phoenix-srun: job 3211349 queued and waiting for resources
phoenix-srun: job 3211349 has been allocated resources
phoenix-srun: Job 3211349 scheduled successfully!
Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
Current PHX_PRIORITY is normal
[2024-06-10 00:33:29,642] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:29,644] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:30,062] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:30,068] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,362] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,364] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,377] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-10 00:33:31,377] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
[2024-06-10 00:33:59,053] [INFO] [comm.py:637:init_distributed] cdb=None
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
[2024-06-10 00:33:59,142] [INFO] [comm.py:637:init_distributed] cdb=None
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm
Replace train sampler!!
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! [2024-06-10 00:33:59,313] [INFO] [comm.py:637:init_distributed] cdb=None [2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None [2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None [2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None [2024-06-10 00:33:59,325] [INFO] [comm.py:637:init_distributed] cdb=None [2024-06-10 00:33:59,325] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl [2024-06-10 00:33:59,326] [INFO] [comm.py:637:init_distributed] cdb=None 06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False 06/10/2024 00:34:00 - INFO - __main__ - Training/evaluation parameters TrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=zero_stage1_config.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=32, 
gradient_checkpointing=False, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=True, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=4e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/runs/Jun10_00-34-00_SH-IDC1-10-140-37-3, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lr_scheduler_kwargs={}, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1.0, optim=adamw_torch, optim_args=None, output_dir=work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=200, save_strategy=steps, save_total_limit=3, seed=42, skip_memory_metrics=True, split_batches=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.03, warmup_steps=0, 
weight_decay=0.01,
)
06/10/2024 00:34:00 - INFO - __main__ - Loading Tokenizer: /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file ./tokenizer.model
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-10 00:34:00,540 >> loading file tokenizer.json
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: False
06/10/2024 00:34:00 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: False
[WARNING|logging.py:314] 2024-06-10 00:34:00,722 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path)
[WARNING|logging.py:314] 2024-06-10 00:34:00,867 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,932 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,933 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,933 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,934 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,934 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-06-10 00:34:00,936 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path) [TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path) [TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path) [TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path) [TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path) [TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path) [TCSLoader] config_path: ~/petreloss.conf --> before Client(conf_path) --> after Client(conf_path) --> after Client(conf_path) --> after Client(conf_path) --> after Client(conf_path) --> after Client(conf_path) --> after Client(conf_path) --> after Client(conf_path) --> after Client(conf_path) 06/10/2024 00:35:20 - INFO - __main__ - Loading InternVLChatModel... [INFO|configuration_utils.py:727] 2024-06-10 00:35:20,584 >> loading configuration file /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5/config.json [INFO|configuration_utils.py:792] 2024-06-10 00:35:20,585 >> Model config InternVLChatConfig { "_commit_hash": null, "_name_or_path": "OpenGVLab/Mini-InternVL-Chat-2B-V1-5", "architectures": [ "InternVLChatModel" ], "auto_map": { "AutoConfig": "configuration_internvl_chat.InternVLChatConfig", "AutoModel": "modeling_internvl_chat.InternVLChatModel", "AutoModelForCausalLM": "modeling_internvl_chat.InternVLChatModel" }, "downsample_ratio": 0.5, "dynamic_image_size": true, "force_image_size": 448, "llm_config": { "_name_or_path": "pretrained/internlm2-chat-1_8b", "add_cross_attention": false, "architectures": [ "InternLM2ForCausalLM" ], "attn_implementation": "flash_attention_2", "auto_map": { "AutoConfig": "configuration_internlm2.InternLM2Config", "AutoModel": "modeling_internlm2.InternLM2ForCausalLM", "AutoModelForCausalLM": "modeling_internlm2.InternLM2ForCausalLM" }, "bad_words_ids": null, "begin_suppress_tokens": null, "bias": false, "bos_token_id": 1, "chunk_size_feed_forward": 0, "cross_attention_hidden_size": 
null, "decoder_start_token_id": null, "diversity_penalty": 0.0, "do_sample": false, "early_stopping": false, "encoder_no_repeat_ngram_size": 0, "eos_token_id": 2, "exponential_decay_length_penalty": null, "finetuning_task": null, "forced_bos_token_id": null, "forced_eos_token_id": null, "hidden_act": "silu", "hidden_size": 2048, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "intermediate_size": 8192, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "length_penalty": 1.0, "max_length": 20, "max_position_embeddings": 32768, "min_length": 0, "model_type": "internlm2", "no_repeat_ngram_size": 0, "num_attention_heads": 16, "num_beam_groups": 1, "num_beams": 1, "num_hidden_layers": 24, "num_key_value_heads": 8, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_scores": false, "pad_token_id": 2, "prefix": null, "problem_type": null, "pruned_heads": {}, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 3.0, "type": "dynamic" }, "rope_theta": 1000000, "sep_token_id": null, "suppress_tokens": null, "task_specific_params": null, "temperature": 1.0, "tf_legacy_loss": false, "tie_encoder_decoder": false, "tie_word_embeddings": false, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": "bfloat16", "torchscript": false, "transformers_version": "4.37.2", "typical_p": 1.0, "use_bfloat16": false, "use_cache": true, "vocab_size": 92553 }, "max_dynamic_patch": 12, "min_dynamic_patch": 1, "model_type": "internvl_chat", "pad2square": false, "ps_version": "v2", "select_layer": -1, "template": "internlm2-chat", "torch_dtype": "bfloat16", "transformers_version": null, "use_backbone_lora": 0, "use_llm_lora": 0, "use_thumbnail": true, "vision_config": { "_name_or_path": "OpenGVLab/InternViT-300M-448px", "add_cross_attention": false, 
"architectures": [ "InternVisionModel" ], "attention_dropout": 0.0, "auto_map": { "AutoConfig": "configuration_intern_vit.InternVisionConfig", "AutoModel": "modeling_intern_vit.InternVisionModel" }, "bad_words_ids": null, "begin_suppress_tokens": null, "bos_token_id": null, "chunk_size_feed_forward": 0, "cross_attention_hidden_size": null, "decoder_start_token_id": null, "diversity_penalty": 0.0, "do_sample": false, "drop_path_rate": 0.1, "dropout": 0.0, "early_stopping": false, "encoder_no_repeat_ngram_size": 0, "eos_token_id": null, 06/10/2024 00:35:20 - INFO - __main__ - Using flash_attention_2 for InternLM "exponential_decay_length_penalty": null, "finetuning_task": null, "forced_bos_token_id": null, "forced_eos_token_id": null, "hidden_act": "gelu", "hidden_size": 1024, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "image_size": 448, "initializer_factor": 1.0, "initializer_range": 0.02, "intermediate_size": 4096, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_eps": 1e-06, "length_penalty": 1.0, "max_length": 20, "min_length": 0, "model_type": "intern_vit_6b", "no_repeat_ngram_size": 0, "norm_type": "layer_norm", "num_attention_heads": 16, "num_beam_groups": 1, "num_beams": 1, "num_channels": 3, "num_hidden_layers": 24, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_scores": false, "pad_token_id": null, "patch_size": 14, "prefix": null, "problem_type": null, "pruned_heads": {}, "qk_normalization": false, "qkv_bias": true, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "sep_token_id": null, "suppress_tokens": null, "task_specific_params": null, "temperature": 1.0, "tf_legacy_loss": false, "tie_encoder_decoder": false, "tie_word_embeddings": true, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": "bfloat16", "torchscript": false, "transformers_version": "4.37.2", 
"typical_p": 1.0, "use_bfloat16": true, "use_flash_attn": true } } [INFO|modeling_utils.py:3473] 2024-06-10 00:35:20,675 >> loading weights file /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5/model.safetensors [INFO|modeling_utils.py:1426] 2024-06-10 00:35:20,708 >> Instantiating InternVLChatModel model under default dtype torch.bfloat16. [INFO|configuration_utils.py:826] 2024-06-10 00:35:20,710 >> Generate config GenerationConfig {} [INFO|configuration_utils.py:826] 2024-06-10 00:35:20,796 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 2 } [INFO|modeling_utils.py:4350] 2024-06-10 00:35:36,690 >> All model checkpoint weights were used when initializing InternVLChatModel. [INFO|modeling_utils.py:4358] 2024-06-10 00:35:36,691 >> All the weights of InternVLChatModel were initialized from the model checkpoint at /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5. If your task is similar to the task the model of the checkpoint was trained on, you can already use InternVLChatModel for predictions without further training. 
[INFO|configuration_utils.py:779] 2024-06-10 00:35:36,698 >> loading configuration file /mnt/petrelfs/share_data/wangwenhai/internvl/release/Mini-InternVL-Chat-2B-V1-5/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-10 00:35:36,698 >> Generate config GenerationConfig {}
06/10/2024 00:35:36 - INFO - __main__ - Finished
06/10/2024 00:35:36 - INFO - __main__ - model.config.force_image_size: 448
06/10/2024 00:35:36 - INFO - __main__ - data_args.force_image_size: 448
06/10/2024 00:35:36 - INFO - __main__ - model.config.vision_config.image_size: 448
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:35:36 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:35:36 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:35:41 - INFO - __main__ - Add dataset:sharegpt4v_instruct_gpt4-vision_cap100k_0 with length: 102025
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:35:41 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:35:41 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:35:45 - INFO - __main__ - Add dataset:llava_instruct_150k_zh_0 with length: 157712
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:35:45 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:35:45 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:34 - INFO - __main__ - Add dataset:sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k_0 with length: 665058
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:34 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:34 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:43 - INFO - __main__ - Add dataset:dvqa_train_200k_0 with length: 200000
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:43 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:43 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:44 - INFO - __main__ - Add dataset:chartqa_train_18k_0 with length: 18317
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:44 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:44 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:45 - INFO - __main__ - Add dataset:ai2d_train_12k_0 with length: 12413
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:45 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:45 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:50 - INFO - __main__ - Add dataset:docvqa_train_10k_0 with length: 10211
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:50 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:50 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:51 - INFO - __main__ - Add dataset:geoqa+_0 with length: 72318
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:51 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:51 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:53 - INFO - __main__ - Add dataset:synthdog_en_0 with length: 29765
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] num_image_token: 256
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] dynamic_image_size: True
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] use_thumbnail: True
06/10/2024 00:36:53 - INFO - __main__ - [Dataset] min_dynamic_patch: 1, max_dynamic_patch: 12
06/10/2024 00:36:53 - INFO - __main__ - Formatting inputs...Skip in lazy mode
06/10/2024 00:36:59 - INFO - __main__ - Add dataset:medical_sft_sample500k_0 with length: 499712
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.class_embedding
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.position_embedding
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.patch_embedding.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.embeddings.patch_embedding.bias
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.ls1
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.ls2
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.qkv.weight
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.0.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm2.weight 
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.1.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.2.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.mlp.fc2.weight 06/10/2024 00:36:59 - 
INFO - __main__ - vision_model.encoder.layers.3.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.3.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.4.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - 
vision_model.encoder.layers.5.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.5.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.6.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.ls1 06/10/2024 00:36:59 - INFO - __main__ - 
vision_model.encoder.layers.7.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.7.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - 
vision_model.encoder.layers.8.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.8.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.9.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - 
vision_model.encoder.layers.10.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.10.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.11.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - 
vision_model.encoder.layers.12.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.12.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.13.norm2.weight 06/10/2024 00:36:59 - 
INFO - __main__ - vision_model.encoder.layers.13.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.14.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.mlp.fc2.weight 06/10/2024 00:36:59 
- INFO - __main__ - vision_model.encoder.layers.15.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.15.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.16.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.attn.proj.weight 06/10/2024 00:36:59 - 
INFO - __main__ - vision_model.encoder.layers.17.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.17.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.18.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.ls1 06/10/2024 00:36:59 
- INFO - __main__ - vision_model.encoder.layers.19.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.19.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm1.weight 
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.20.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.21.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc1.weight 
06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.22.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.ls1 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.ls2 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.qkv.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.qkv.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.proj.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.attn.proj.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.mlp.fc2.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm1.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm1.bias 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm2.weight 06/10/2024 00:36:59 - INFO - __main__ - vision_model.encoder.layers.23.norm2.bias 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.tok_embeddings.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - 
language_model.model.layers.0.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.0.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.1.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.2.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.attention.wo.weight 06/10/2024 00:36:59 - INFO - 
__main__ - language_model.model.layers.3.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.3.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.4.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.5.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.feed_forward.w1.weight 06/10/2024 
00:36:59 - INFO - __main__ - language_model.model.layers.6.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.6.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.7.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.8.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.feed_forward.w3.weight 
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.9.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.10.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.11.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - 
language_model.model.layers.12.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.12.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.13.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.14.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.feed_forward.w2.weight 
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.15.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.16.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.17.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - 
language_model.model.layers.18.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.18.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.19.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.20.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.attention_norm.weight 
06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.21.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.22.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.attention.wqkv.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.attention.wo.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.feed_forward.w1.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.feed_forward.w3.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.feed_forward.w2.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.attention_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.layers.23.ffn_norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.model.norm.weight 06/10/2024 00:36:59 - INFO - __main__ - language_model.output.weight 06/10/2024 00:36:59 - INFO - __main__ - mlp1.0.weight 06/10/2024 00:36:59 - INFO - __main__ - mlp1.0.bias 06/10/2024 00:36:59 - INFO - __main__ - mlp1.1.weight 06/10/2024 00:36:59 - INFO - __main__ - mlp1.1.bias 06/10/2024 00:36:59 - INFO - __main__ - mlp1.3.weight 06/10/2024 00:36:59 - INFO - __main__ - mlp1.3.bias 06/10/2024 00:36:59 - WARNING - accelerate.utils.other - Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the 
process to hang. It is recommended to upgrade the kernel to the minimum version or higher. [INFO|trainer.py:571] 2024-06-10 00:36:59,654 >> Using auto half precision backend [2024-06-10 00:37:00,502] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.13.5, git-hash=unknown, git-branch=unknown [2024-06-10 00:37:07,652] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Using /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /mnt/petrelfs/wangwenhai/.cache/torch_extensions/py39_cu118/fused_adam/build.ninja... Building extension module fused_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_adam... Time to load fused_adam op: 1.6104459762573242 seconds Loading extension module fused_adam... Time to load fused_adam op: 1.6218023300170898 seconds Loading extension module fused_adam... Loading extension module fused_adam... Time to load fused_adam op: 1.6295955181121826 seconds Time to load fused_adam op: 1.6301324367523193 seconds Loading extension module fused_adam... 
Time to load fused_adam op: 1.6191380023956299 seconds Loading extension module fused_adam... Time to load fused_adam op: 1.6189420223236084 seconds [2024-06-10 00:37:10,267] [INFO] [logging.py:96:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer [2024-06-10 00:37:10,267] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2024-06-10 00:37:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam [2024-06-10 00:37:10,303] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2024-06-10 00:37:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 1 optimizer [2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 1000000000 [2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 1000000000 [2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False [2024-06-10 00:37:10,303] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False Loading extension module fused_adam... Loading extension module fused_adam... 
Time to load fused_adam op: 1.625352382659912 seconds Time to load fused_adam op: 1.6189563274383545 seconds [2024-06-10 00:37:22,009] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states [2024-06-10 00:37:22,011] [INFO] [utils.py:801:see_memory_usage] MA 5.51 GB Max_MA 6.03 GB CA 6.36 GB Max_CA 6 GB [2024-06-10 00:37:22,012] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 111.06 GB, percent = 11.0% [2024-06-10 00:37:22,506] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states [2024-06-10 00:37:22,507] [INFO] [utils.py:801:see_memory_usage] MA 5.51 GB Max_MA 6.54 GB CA 7.38 GB Max_CA 7 GB [2024-06-10 00:37:22,507] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 112.8 GB, percent = 11.2% [2024-06-10 00:37:22,507] [INFO] [stage_1_and_2.py:539:__init__] optimizer state initialized [2024-06-10 00:37:23,097] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer [2024-06-10 00:37:23,098] [INFO] [utils.py:801:see_memory_usage] MA 5.51 GB Max_MA 5.51 GB CA 7.38 GB Max_CA 7 GB [2024-06-10 00:37:23,099] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory: used = 114.43 GB, percent = 11.4% [2024-06-10 00:37:23,109] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2024-06-10 00:37:23,109] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client callable to create LR scheduler [2024-06-10 00:37:23,110] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2024-06-10 00:37:23,110] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[[0.9, 0.999]] [2024-06-10 00:37:23,111] [INFO] [config.py:996:print] DeepSpeedEngine configuration: [2024-06-10 00:37:23,111] [INFO] [config.py:1000:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } 
[2024-06-10 00:37:23,111] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2024-06-10 00:37:23,111] [INFO] [config.py:1000:print] amp_enabled .................. False [2024-06-10 00:37:23,111] [INFO] [config.py:1000:print] amp_params ................... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] bfloat16_enabled ............. True [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] comms_config ................. [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] communication_data_type ...... None [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={} [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] dataloader_drop_last ......... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] disable_allgather ............ False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] dump_state ................... 
False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... None [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1 [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0 [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100 [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06 [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01 [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] elasticity_enabled ........... False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] fp16_auto_cast ............... None [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] fp16_enabled ................. False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] global_rank .................. 0 [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] grad_accum_dtype ............. None [2024-06-10 00:37:23,112] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 32 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] gradient_clipping ............ 1.0 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] graph_harvesting ............. 
False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 1 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] load_universal_checkpoint .... False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] loss_scale ................... 1.0 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] memory_breakdown ............. False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] mics_shard_size .............. -1 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] optimizer_name ............... adamw [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] optimizer_params ............. {'lr': 4e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.01} [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] pld_enabled .................. False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] pld_params ................... False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] prescale_gradients ........... False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] scheduler_name ............... None [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] scheduler_params ............. None [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] sparse_attention ............. None [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] steps_per_print .............. inf [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] train_batch_size ............. 1024 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 4 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] use_node_local_storage ....... False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] wall_clock_breakdown ......... True [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] weight_quantization_config ... None [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] world_size ................... 8 [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] zero_allow_untested_optimizer False [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] zero_config .................. 
stage=1 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=1000000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=1000000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] zero_enabled ................. True [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. True [2024-06-10 00:37:23,113] [INFO] [config.py:1000:print] zero_optimization_stage ...... 
1 [2024-06-10 00:37:23,114] [INFO] [config.py:986:print_user_config] json = { "zero_optimization": { "stage": 1, "allgather_partitions": true, "allgather_bucket_size": 1.000000e+09, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 1.000000e+09, "contiguous_gradients": true }, "fp16": { "enabled": false, "auto_cast": true, "loss_scale": 0, "initial_scale_power": 32, "loss_scale_window": 1000, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": true }, "optimizer": { "type": "AdamW", "params": { "lr": 4e-05, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0.01 } }, "gradient_accumulation_steps": 32, "gradient_clipping": 1.0, "steps_per_print": inf, "train_batch_size": 1.024000e+03, "train_micro_batch_size_per_gpu": 4, "wall_clock_breakdown": true } [INFO|trainer.py:1721] 2024-06-10 00:37:23,114 >> ***** Running training ***** [INFO|trainer.py:1722] 2024-06-10 00:37:23,114 >> Num examples = 1,767,531 [INFO|trainer.py:1723] 2024-06-10 00:37:23,114 >> Num Epochs = 1 [INFO|trainer.py:1724] 2024-06-10 00:37:23,114 >> Instantaneous batch size per device = 4 [INFO|trainer.py:1727] 2024-06-10 00:37:23,114 >> Total train batch size (w. 
parallel, distributed & accumulation) = 1,024 [INFO|trainer.py:1728] 2024-06-10 00:37:23,114 >> Gradient Accumulation steps = 32 [INFO|trainer.py:1729] 2024-06-10 00:37:23,114 >> Total optimization steps = 1,726 [INFO|trainer.py:1730] 2024-06-10 00:37:23,116 >> Number of trainable parameters = 2,205,754,368 [2024-06-10 00:37:37,731] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:37:37,732] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:37:39,682] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:37:39,775] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:37:39,776] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:37:40,095] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:37:40,384] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:37:41,285] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:38:32,397] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:38:32,403] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:38:40,348] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:38:40,484] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:38:40,487] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:38:40,491] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) 
[2024-06-10 00:38:40,491] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:38:40,522] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:37,186] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:37,191] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:37,637] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:43,054] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:43,056] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:44,430] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:44,431] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:39:44,431] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:45,784] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:50,760] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:50,762] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:54,757] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:54,759] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:54,759] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:56,597] [INFO] [real_accelerator.py:191:get_accelerator] 
Setting ds_accelerator to cuda (auto detect) [2024-06-10 00:40:56,608] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1929 [2024-06-10 00:43:21,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 3239.39 | bwd_microstep: 983.10 | bwd_inner_microstep: 982.88 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899 [2024-06-10 00:43:23,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.59 | bwd_microstep: 1571.79 | bwd_inner_microstep: 1571.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2242 [2024-06-10 00:43:25,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.47 | bwd_microstep: 910.24 | bwd_inner_microstep: 910.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805 [2024-06-10 00:43:27,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1539.10 | bwd_inner_microstep: 1539.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1061 [2024-06-10 00:43:27,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 171.74 | bwd_microstep: 447.24 | bwd_inner_microstep: 447.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 00:43:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1312.76 | bwd_inner_microstep: 1312.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717 [2024-06-10 
00:43:31,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.86 | bwd_microstep: 1620.97 | bwd_inner_microstep: 1620.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-10 00:43:34,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.33 | bwd_microstep: 1522.22 | bwd_inner_microstep: 1522.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3502 [2024-06-10 00:43:36,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1505.43 | bwd_inner_microstep: 1505.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 00:43:37,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.68 | bwd_microstep: 1275.00 | bwd_inner_microstep: 1274.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 00:43:39,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1338.27 | bwd_inner_microstep: 1338.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3515 [2024-06-10 00:43:41,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1532.36 | bwd_inner_microstep: 1532.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.02 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3660 [2024-06-10 00:43:44,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.44 | bwd_microstep: 1682.01 | bwd_inner_microstep: 1681.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 26, images per sample: 6.5, 
dynamic token length: 3459 [2024-06-10 00:43:46,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.90 | bwd_microstep: 1337.72 | bwd_inner_microstep: 1337.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-10 00:43:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.91 | bwd_microstep: 1491.10 | bwd_inner_microstep: 1491.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570 [2024-06-10 00:43:50,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.68 | bwd_microstep: 1499.31 | bwd_inner_microstep: 1499.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3662 [2024-06-10 00:43:52,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.87 | bwd_microstep: 1654.00 | bwd_inner_microstep: 1653.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1975 [2024-06-10 00:43:53,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.84 | bwd_microstep: 732.94 | bwd_inner_microstep: 732.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 00:43:55,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1292.30 | bwd_inner_microstep: 1292.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-10 00:43:57,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1552.97 | bwd_inner_microstep: 1552.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-10 00:43:59,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1497.64 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626 [2024-06-10 00:44:01,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.26 | bwd_microstep: 1603.77 | bwd_inner_microstep: 1603.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1123 [2024-06-10 00:44:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 166.92 | bwd_microstep: 426.97 | bwd_inner_microstep: 426.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909 [2024-06-10 00:44:03,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.18 | bwd_microstep: 778.16 | bwd_inner_microstep: 778.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 00:44:05,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.12 | bwd_microstep: 1375.88 | bwd_inner_microstep: 1375.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 00:44:07,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612 [2024-06-10 00:44:09,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1340.17 | bwd_inner_microstep: 1340.14 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 00:44:10,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1285.42 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2426 [2024-06-10 00:44:12,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.53 | bwd_microstep: 1069.19 | bwd_inner_microstep: 1069.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564 [2024-06-10 00:44:14,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.32 | bwd_microstep: 1331.67 | bwd_inner_microstep: 1331.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-10 00:44:16,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.40 | bwd_microstep: 1603.49 | bwd_inner_microstep: 1603.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-10 00:44:22,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.51 | optimizer_step: 7.76 [2024-06-10 00:44:22,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.46 | bwd_microstep: 5122.37 | bwd_inner_microstep: 1868.17 | bwd_allreduce_microstep: 3254.14 | step_microstep: 41.16 [2024-06-10 00:44:22,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 18620.45 | bwd: 45632.92 | bwd_inner: 42377.68 | bwd_allreduce: 3254.48 | step: 41.65 {'loss': 1.4159, 'learning_rate': 7.692307692307694e-07, 'epoch': 0.0} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 00:44:26,788] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1486.45 | bwd_inner_microstep: 1486.31 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 00:44:28,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.97 | bwd_microstep: 1243.54 | bwd_inner_microstep: 1243.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 00:44:30,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.71 | bwd_microstep: 1342.08 | bwd_inner_microstep: 1342.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 00:44:32,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.51 | bwd_microstep: 1381.06 | bwd_inner_microstep: 1381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-10 00:44:34,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1298.53 | bwd_inner_microstep: 1298.33 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4061 [2024-06-10 00:44:36,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.19 | bwd_microstep: 1718.99 | bwd_inner_microstep: 1718.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 00:44:38,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.05 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
1953 [2024-06-10 00:44:39,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 823.46 | bwd_inner_microstep: 823.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074 [2024-06-10 00:44:40,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.53 | bwd_microstep: 820.65 | bwd_inner_microstep: 820.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3494 [2024-06-10 00:44:42,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.52 | bwd_microstep: 1442.17 | bwd_inner_microstep: 1442.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2444 [2024-06-10 00:44:43,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.66 | bwd_microstep: 947.81 | bwd_inner_microstep: 947.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3411 [2024-06-10 00:44:45,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.04 | bwd_microstep: 1387.98 | bwd_inner_microstep: 1387.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 00:44:47,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1290.39 | bwd_inner_microstep: 1290.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3633 [2024-06-10 00:44:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1445.69 | bwd_inner_microstep: 1445.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3838 [2024-06-10 00:44:51,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1560.10 | bwd_inner_microstep: 1560.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640 [2024-06-10 00:44:53,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.53 | bwd_microstep: 1317.28 | bwd_inner_microstep: 1317.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 00:44:55,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1287.68 | bwd_inner_microstep: 1287.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 00:44:57,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649 [2024-06-10 00:44:59,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.87 | bwd_microstep: 1416.23 | bwd_inner_microstep: 1416.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 00:45:01,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1397.65 | bwd_inner_microstep: 1397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525 [2024-06-10 00:45:03,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.89 | bwd_microstep: 1424.23 | bwd_inner_microstep: 1424.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619 [2024-06-10 00:45:05,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.76 | bwd_microstep: 1610.14 | bwd_inner_microstep: 1610.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 00:45:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1394.40 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462 [2024-06-10 00:45:09,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.88 | bwd_microstep: 1348.07 | bwd_inner_microstep: 1348.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2042 [2024-06-10 00:45:10,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.61 | bwd_microstep: 718.88 | bwd_inner_microstep: 718.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 00:45:12,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1560.20 | bwd_inner_microstep: 1560.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 00:45:14,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.44 | bwd_microstep: 1661.78 | bwd_inner_microstep: 1661.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3503 [2024-06-10 00:45:16,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.81 | bwd_microstep: 1340.11 | bwd_inner_microstep: 1340.08 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3595 [2024-06-10 00:45:18,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.65 | bwd_microstep: 1441.71 | bwd_inner_microstep: 1441.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.02 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 00:45:20,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.92 | bwd_microstep: 1444.31 | bwd_inner_microstep: 1444.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2257 [2024-06-10 00:45:21,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.83 | bwd_microstep: 972.29 | bwd_inner_microstep: 972.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.01 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 00:45:24,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.17 | optimizer_step: 6.58 [2024-06-10 00:45:24,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.97 | bwd_microstep: 2043.23 | bwd_inner_microstep: 1878.38 | bwd_allreduce_microstep: 164.78 | step_microstep: 38.91 [2024-06-10 00:45:24,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.10 | bwd: 43349.19 | bwd_inner: 43183.16 | bwd_allreduce: 165.16 | step: 39.50 {'loss': 1.4336, 'learning_rate': 1.5384615384615387e-06, 'epoch': 0.0} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544 [2024-06-10 00:45:26,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.03 | bwd_microstep: 1292.66 | bwd_inner_microstep: 1292.49 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4038 [2024-06-10 
00:45:28,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1623.48 | bwd_inner_microstep: 1623.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3870 [2024-06-10 00:45:30,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.92 | bwd_microstep: 1526.73 | bwd_inner_microstep: 1526.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 00:45:32,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1385.56 | bwd_inner_microstep: 1385.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 00:45:34,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1353.31 | bwd_inner_microstep: 1353.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 00:45:35,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 794.18 | bwd_inner_microstep: 794.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 00:45:37,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.43 | bwd_microstep: 1292.33 | bwd_inner_microstep: 1292.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683 [2024-06-10 00:45:39,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1423.11 | bwd_inner_microstep: 1423.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3498 [2024-06-10 00:45:41,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3505 [2024-06-10 00:45:43,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.81 | bwd_microstep: 1548.14 | bwd_inner_microstep: 1548.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 00:45:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.14 | bwd_microstep: 1510.60 | bwd_inner_microstep: 1510.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 00:45:47,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.97 | bwd_microstep: 1488.94 | bwd_inner_microstep: 1488.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-10 00:45:49,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.71 | bwd_microstep: 1439.88 | bwd_inner_microstep: 1439.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659 [2024-06-10 00:45:51,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.59 | bwd_microstep: 1718.04 | bwd_inner_microstep: 1718.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3718 [2024-06-10 00:45:53,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 | bwd_microstep: 1475.02 | bwd_inner_microstep: 1474.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 00:45:55,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.61 | bwd_microstep: 1490.16 | bwd_inner_microstep: 1490.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3840 [2024-06-10 00:45:57,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.25 | bwd_microstep: 1456.18 | bwd_inner_microstep: 1456.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484 [2024-06-10 00:45:59,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.44 | bwd_microstep: 1222.26 | bwd_inner_microstep: 1222.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 00:46:01,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.71 | bwd_microstep: 1261.06 | bwd_inner_microstep: 1261.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 00:46:03,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.27 | bwd_microstep: 1466.57 | bwd_inner_microstep: 1466.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2088 [2024-06-10 00:46:04,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.06 | bwd_microstep: 760.02 | bwd_inner_microstep: 759.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 00:46:06,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.32 | bwd_microstep: 1548.40 | bwd_inner_microstep: 1548.37 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 00:46:08,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.12 | bwd_microstep: 1560.77 | bwd_inner_microstep: 1560.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736 [2024-06-10 00:46:10,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.11 | bwd_microstep: 1639.42 | bwd_inner_microstep: 1639.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 00:46:13,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.50 | bwd_microstep: 1660.93 | bwd_inner_microstep: 1660.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037 [2024-06-10 00:46:14,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.19 | bwd_microstep: 812.68 | bwd_inner_microstep: 812.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465 [2024-06-10 00:46:16,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.26 | bwd_microstep: 1216.70 | bwd_inner_microstep: 1216.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3774 [2024-06-10 00:46:18,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1500.81 | bwd_inner_microstep: 1500.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 00:46:20,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.11 | bwd_microstep: 1550.47 | bwd_inner_microstep: 
1550.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3391 [2024-06-10 00:46:22,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.41 | bwd_microstep: 1277.66 | bwd_inner_microstep: 1277.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 00:46:24,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1551.78 | bwd_inner_microstep: 1551.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 00:46:26,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.65 [2024-06-10 00:46:26,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1549.40 | bwd_inner_microstep: 1541.62 | bwd_allreduce_microstep: 7.73 | step_microstep: 38.51 [2024-06-10 00:46:26,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16677.84 | bwd: 44685.92 | bwd_inner: 44677.13 | bwd_allreduce: 8.01 | step: 40.47 {'loss': 1.4715, 'learning_rate': 2.307692307692308e-06, 'epoch': 0.0} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 00:46:28,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1286.74 | bwd_inner_microstep: 1286.54 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 00:46:30,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.28 | bwd_microstep: 1461.68 | bwd_inner_microstep: 1461.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3853 
[2024-06-10 00:46:32,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.79 | bwd_microstep: 1365.31 | bwd_inner_microstep: 1365.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3752 [2024-06-10 00:46:33,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1404.16 | bwd_inner_microstep: 1404.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 00:46:35,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.93 | bwd_microstep: 1395.40 | bwd_inner_microstep: 1395.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 00:46:37,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 00:46:39,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.83 | bwd_microstep: 1527.48 | bwd_inner_microstep: 1527.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 00:46:41,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 1279.71 | bwd_inner_microstep: 1279.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3405 [2024-06-10 00:46:43,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.28 | bwd_microstep: 1439.14 | bwd_inner_microstep: 1439.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per 
sample: 7.75, dynamic token length: 3399 [2024-06-10 00:46:45,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1393.30 | bwd_inner_microstep: 1393.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1894 [2024-06-10 00:46:46,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.29 | bwd_microstep: 777.37 | bwd_inner_microstep: 777.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3656 [2024-06-10 00:46:49,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.40 | bwd_microstep: 1721.57 | bwd_inner_microstep: 1721.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646 [2024-06-10 00:46:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.39 | bwd_microstep: 1422.19 | bwd_inner_microstep: 1422.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 00:46:52,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.07 | bwd_microstep: 1256.97 | bwd_inner_microstep: 1256.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.23 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446 [2024-06-10 00:46:54,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.59 | bwd_microstep: 1380.44 | bwd_inner_microstep: 1380.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647 [2024-06-10 00:46:56,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.42 | bwd_microstep: 1621.47 | bwd_inner_microstep: 1621.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-10 00:46:57,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.53 | bwd_microstep: 796.34 | bwd_inner_microstep: 796.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 00:46:59,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.29 | bwd_microstep: 1258.08 | bwd_inner_microstep: 1258.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 00:47:01,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1391.20 | bwd_inner_microstep: 1391.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3846 [2024-06-10 00:47:03,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1529.30 | bwd_inner_microstep: 1529.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3787 [2024-06-10 00:47:05,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.32 | bwd_microstep: 1481.67 | bwd_inner_microstep: 1481.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-10 00:47:07,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.45 | bwd_microstep: 1601.44 | bwd_inner_microstep: 1601.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 00:47:10,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1561.95 | bwd_inner_microstep: 1561.92 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 00:47:12,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.85 | bwd_microstep: 1513.23 | bwd_inner_microstep: 1513.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 00:47:14,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.43 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2278 [2024-06-10 00:47:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.17 | bwd_microstep: 942.41 | bwd_inner_microstep: 942.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-10 00:47:16,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.61 | bwd_microstep: 976.38 | bwd_inner_microstep: 976.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 00:47:18,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.24 | bwd_microstep: 1356.07 | bwd_inner_microstep: 1356.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2107 [2024-06-10 00:47:19,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.77 | bwd_microstep: 1019.33 | bwd_inner_microstep: 1019.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3581 [2024-06-10 00:47:22,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.37 | bwd_microstep: 1535.80 
| bwd_inner_microstep: 1535.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769 [2024-06-10 00:47:24,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.57 | bwd_microstep: 1641.69 | bwd_inner_microstep: 1641.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410 [2024-06-10 00:47:28,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.33 | optimizer_step: 6.60 [2024-06-10 00:47:28,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 4001.37 | bwd_inner_microstep: 1642.00 | bwd_allreduce_microstep: 2359.29 | step_microstep: 39.67 [2024-06-10 00:47:28,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16206.56 | bwd: 45972.47 | bwd_inner: 43610.74 | bwd_allreduce: 2359.61 | step: 41.82 {'loss': 1.3975, 'learning_rate': 3.0769230769230774e-06, 'epoch': 0.0} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 00:47:30,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1370.76 | bwd_inner_microstep: 1370.66 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3938 [2024-06-10 00:47:32,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1398.45 | bwd_inner_microstep: 1398.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3890 [2024-06-10 00:47:35,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.19 | bwd_microstep: 1683.78 | bwd_inner_microstep: 1683.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3839 [2024-06-10 00:47:37,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2297 [2024-06-10 00:47:38,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.78 | bwd_microstep: 939.37 | bwd_inner_microstep: 939.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 00:47:39,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.48 | bwd_microstep: 791.81 | bwd_inner_microstep: 791.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 00:47:41,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 00:47:43,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1288.59 | bwd_inner_microstep: 1288.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 00:47:45,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897 [2024-06-10 00:47:46,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.10 | bwd_microstep: 685.90 | bwd_inner_microstep: 685.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 36, images per sample: 9.0, dynamic token length: 3737 [2024-06-10 00:47:48,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.00 | bwd_microstep: 1565.85 | bwd_inner_microstep: 1565.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703 [2024-06-10 00:47:50,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.35 | bwd_microstep: 1623.05 | bwd_inner_microstep: 1623.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 00:47:52,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3517 [2024-06-10 00:47:54,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.32 | bwd_microstep: 1655.75 | bwd_inner_microstep: 1655.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521 [2024-06-10 00:47:56,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1596.00 | bwd_inner_microstep: 1595.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3681 [2024-06-10 00:47:58,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.72 | bwd_microstep: 1569.60 | bwd_inner_microstep: 1569.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 00:48:00,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.91 | bwd_microstep: 1557.51 | bwd_inner_microstep: 1557.49 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 00:48:02,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1343.83 | bwd_inner_microstep: 1343.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102 [2024-06-10 00:48:03,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.70 | bwd_microstep: 825.48 | bwd_inner_microstep: 825.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 00:48:06,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.79 | bwd_microstep: 1572.03 | bwd_inner_microstep: 1572.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 00:48:08,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.17 | bwd_microstep: 1377.53 | bwd_inner_microstep: 1377.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930 [2024-06-10 00:48:09,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.19 | bwd_microstep: 761.27 | bwd_inner_microstep: 761.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 00:48:10,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.42 | bwd_microstep: 1396.84 | bwd_inner_microstep: 1396.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 00:48:13,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.93 | bwd_microstep: 1559.87 | bwd_inner_microstep: 
1559.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 00:48:15,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.35 | bwd_microstep: 1498.73 | bwd_inner_microstep: 1498.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 00:48:17,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.84 | bwd_microstep: 1501.03 | bwd_inner_microstep: 1501.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605 [2024-06-10 00:48:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1407.93 | bwd_inner_microstep: 1407.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 00:48:21,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.79 | bwd_microstep: 1404.81 | bwd_inner_microstep: 1404.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480 [2024-06-10 00:48:23,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.79 | bwd_microstep: 1345.90 | bwd_inner_microstep: 1345.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808 [2024-06-10 00:48:24,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.35 | bwd_microstep: 1358.75 | bwd_inner_microstep: 1358.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 00:48:26,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.96 | 
bwd_microstep: 1453.43 | bwd_inner_microstep: 1453.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 00:48:29,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.17 | optimizer_step: 6.63 [2024-06-10 00:48:29,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.29 | bwd_microstep: 1642.18 | bwd_inner_microstep: 1634.32 | bwd_allreduce_microstep: 7.80 | step_microstep: 38.70 [2024-06-10 00:48:29,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16305.44 | bwd: 43625.39 | bwd_inner: 43616.59 | bwd_allreduce: 8.09 | step: 40.86 {'loss': 1.3749, 'learning_rate': 3.846153846153847e-06, 'epoch': 0.0} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 00:48:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1481.18 | bwd_inner_microstep: 1481.05 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4464 [2024-06-10 00:48:34,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 1306.31 | bwd_microstep: 1636.77 | bwd_inner_microstep: 1636.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3885 [2024-06-10 00:48:36,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1397.32 | bwd_inner_microstep: 1397.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486 [2024-06-10 00:48:38,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.30 | bwd_microstep: 1349.89 | bwd_inner_microstep: 1349.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3788 [2024-06-10 00:48:40,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.28 | bwd_microstep: 1551.64 | bwd_inner_microstep: 1551.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 00:48:41,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.71 | bwd_microstep: 1249.95 | bwd_inner_microstep: 1249.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 00:48:43,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.79 | bwd_microstep: 1288.66 | bwd_inner_microstep: 1288.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 00:48:45,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1304.94 | bwd_inner_microstep: 1304.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-10 00:48:47,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.44 | bwd_microstep: 1657.28 | bwd_inner_microstep: 1657.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-10 00:48:49,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.92 | bwd_microstep: 1418.71 | bwd_inner_microstep: 1418.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3217 [2024-06-10 00:48:51,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1274.03 | bwd_inner_microstep: 1274.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684 [2024-06-10 00:48:53,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.80 | bwd_microstep: 1619.71 | bwd_inner_microstep: 1619.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3672 [2024-06-10 00:48:55,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.69 | bwd_microstep: 1436.25 | bwd_inner_microstep: 1436.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423 [2024-06-10 00:48:57,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.69 | bwd_microstep: 1451.38 | bwd_inner_microstep: 1451.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942 [2024-06-10 00:48:58,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.62 | bwd_microstep: 891.15 | bwd_inner_microstep: 891.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1916 [2024-06-10 00:49:00,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.90 | bwd_microstep: 844.34 | bwd_inner_microstep: 844.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651 [2024-06-10 00:49:02,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.92 | bwd_microstep: 1528.23 | bwd_inner_microstep: 1528.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 00:49:04,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.79 | bwd_microstep: 1385.88 | bwd_inner_microstep: 1385.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3556 [2024-06-10 00:49:06,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.23 | bwd_microstep: 1699.45 | bwd_inner_microstep: 1699.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 00:49:08,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.00 | bwd_microstep: 1395.29 | bwd_inner_microstep: 1395.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 00:49:10,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.33 | bwd_microstep: 1356.30 | bwd_inner_microstep: 1356.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-10 00:49:11,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.63 | bwd_microstep: 817.62 | bwd_inner_microstep: 817.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1138 [2024-06-10 00:49:12,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 176.61 | bwd_microstep: 462.15 | bwd_inner_microstep: 462.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3722 [2024-06-10 00:49:13,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.21 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836 [2024-06-10 00:49:15,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1469.20 
| bwd_inner_microstep: 1469.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-10 00:49:17,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.92 | bwd_microstep: 1419.12 | bwd_inner_microstep: 1419.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 00:49:19,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.83 | bwd_microstep: 1405.23 | bwd_inner_microstep: 1405.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 00:49:21,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3831 [2024-06-10 00:49:23,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.44 | bwd_microstep: 1723.20 | bwd_inner_microstep: 1723.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-10 00:49:26,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.00 | bwd_microstep: 1537.84 | bwd_inner_microstep: 1537.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3581 [2024-06-10 00:49:28,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.05 | bwd_microstep: 1435.24 | bwd_inner_microstep: 1435.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3812 [2024-06-10 00:49:30,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 00:49:30,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.45 | bwd_microstep: 1705.02 | bwd_inner_microstep: 1697.23 | bwd_allreduce_microstep: 7.74 | step_microstep: 39.99 [2024-06-10 00:49:30,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17045.92 | bwd: 43723.27 | bwd_inner: 43714.52 | bwd_allreduce: 8.01 | step: 41.89 {'loss': 1.3896, 'learning_rate': 4.615384615384616e-06, 'epoch': 0.0} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-10 00:49:32,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.25 | bwd_microstep: 1347.86 | bwd_inner_microstep: 1347.64 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907 [2024-06-10 00:49:34,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.22 | bwd_microstep: 1592.10 | bwd_inner_microstep: 1592.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564 [2024-06-10 00:49:36,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.74 | bwd_microstep: 1503.11 | bwd_inner_microstep: 1503.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 00:49:38,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.36 | bwd_microstep: 1652.65 | bwd_inner_microstep: 1652.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4188 [2024-06-10 00:49:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.36 | bwd_microstep: 1593.59 | bwd_inner_microstep: 1593.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3796 [2024-06-10 00:49:43,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1553.15 | bwd_inner_microstep: 1553.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-10 00:49:44,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.11 | bwd_microstep: 1148.55 | bwd_inner_microstep: 1148.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-10 00:49:45,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.17 | bwd_microstep: 700.47 | bwd_inner_microstep: 700.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3489 [2024-06-10 00:49:47,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.37 | bwd_microstep: 1353.44 | bwd_inner_microstep: 1353.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3690 [2024-06-10 00:49:49,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.54 | bwd_microstep: 1592.27 | bwd_inner_microstep: 1592.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3718 [2024-06-10 00:49:52,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.69 | bwd_microstep: 1730.90 | bwd_inner_microstep: 1730.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 00:49:54,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.21 | bwd_microstep: 1388.09 | bwd_inner_microstep: 1388.06 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2027 [2024-06-10 00:49:55,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.91 | bwd_microstep: 843.61 | bwd_inner_microstep: 843.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693 [2024-06-10 00:49:57,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.46 | bwd_microstep: 1520.99 | bwd_inner_microstep: 1520.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681 [2024-06-10 00:49:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.79 | bwd_microstep: 1336.15 | bwd_inner_microstep: 1336.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2437 [2024-06-10 00:50:00,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.32 | bwd_microstep: 950.11 | bwd_inner_microstep: 950.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-10 00:50:02,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.98 | bwd_microstep: 1490.70 | bwd_inner_microstep: 1490.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176 [2024-06-10 00:50:03,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.21 | bwd_microstep: 861.16 | bwd_inner_microstep: 861.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 00:50:05,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.90 | bwd_microstep: 1289.96 | bwd_inner_microstep: 
1289.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 00:50:07,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.54 | bwd_microstep: 1459.73 | bwd_inner_microstep: 1459.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2179 [2024-06-10 00:50:08,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.29 | bwd_microstep: 860.30 | bwd_inner_microstep: 860.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 00:50:10,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.13 | bwd_microstep: 1291.51 | bwd_inner_microstep: 1291.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 00:50:12,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.88 | bwd_microstep: 1286.31 | bwd_inner_microstep: 1286.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-10 00:50:13,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.27 | bwd_microstep: 804.03 | bwd_inner_microstep: 804.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3527 [2024-06-10 00:50:15,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.39 | bwd_microstep: 1557.42 | bwd_inner_microstep: 1557.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814 [2024-06-10 00:50:17,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.43 | 
bwd_microstep: 1585.63 | bwd_inner_microstep: 1585.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2090 [2024-06-10 00:50:19,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.32 | bwd_microstep: 953.83 | bwd_inner_microstep: 953.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3428 [2024-06-10 00:50:21,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.57 | bwd_microstep: 1397.49 | bwd_inner_microstep: 1397.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2265 [2024-06-10 00:50:22,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.85 | bwd_microstep: 1070.62 | bwd_inner_microstep: 1070.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2036 [2024-06-10 00:50:23,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.86 | bwd_microstep: 911.50 | bwd_inner_microstep: 911.22 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462 [2024-06-10 00:50:25,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.82 | bwd_microstep: 1567.55 | bwd_inner_microstep: 1567.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-10 00:50:33,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.35 | optimizer_step: 6.60 [2024-06-10 00:50:33,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 6465.99 | bwd_inner_microstep: 1753.19 | bwd_allreduce_microstep: 4712.72 | 
step_microstep: 39.88 [2024-06-10 00:50:33,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15632.28 | bwd: 46660.78 | bwd_inner: 41946.81 | bwd_allreduce: 4713.14 | step: 42.22 0%| | 0/1726 [00:00> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200 [INFO|configuration_utils.py:473] 2024-06-10 04:09:31,270 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/config.json [INFO|configuration_utils.py:594] 2024-06-10 04:09:31,272 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-10 04:09:40,155 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-10 04:09:40,183 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-10 04:09:40,185 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-10 04:09:40,185 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/added_tokens.json [2024-06-10 04:09:40,442] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step200 is about to be saved! 
[2024-06-10 04:09:40,452] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/mp_rank_00_model_states.pt [2024-06-10 04:09:40,453] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/mp_rank_00_model_states.pt... [2024-06-10 04:09:49,901] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/mp_rank_00_model_states.pt. [2024-06-10 04:09:49,912] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-10 04:10:02,919] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-06-10 04:10:02,926] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-200/global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-10 04:10:02,926] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step200 is ready now! 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 04:10:05,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.97 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.18 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 04:10:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.14 | bwd_microstep: 1272.03 | bwd_inner_microstep: 1272.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3853 [2024-06-10 04:10:08,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.74 | bwd_microstep: 1387.38 | bwd_inner_microstep: 1387.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-10 04:10:10,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.05 | bwd_microstep: 1544.19 | bwd_inner_microstep: 1544.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805 [2024-06-10 04:10:12,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.32 | bwd_microstep: 1454.32 | bwd_inner_microstep: 1454.16 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 04:10:14,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.27 | bwd_microstep: 1243.25 | bwd_inner_microstep: 1243.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 04:10:16,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.34 | bwd_microstep: 1249.24 | bwd_inner_microstep: 1249.21 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 04:10:17,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.02 | bwd_microstep: 790.22 | bwd_inner_microstep: 790.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1888 [2024-06-10 04:10:18,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.62 | bwd_microstep: 714.37 | bwd_inner_microstep: 714.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 04:10:20,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.14 | bwd_microstep: 1390.53 | bwd_inner_microstep: 1390.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509 [2024-06-10 04:10:22,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 1445.12 | bwd_inner_microstep: 1445.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 04:10:24,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1479.95 | bwd_inner_microstep: 1479.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-10 04:10:26,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1447.29 | bwd_inner_microstep: 1447.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3460 [2024-06-10 04:10:28,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.63 | bwd_microstep: 1434.05 
| bwd_inner_microstep: 1434.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3386 [2024-06-10 04:10:30,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1432.28 | bwd_inner_microstep: 1432.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-10 04:10:32,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.62 | bwd_microstep: 1582.91 | bwd_inner_microstep: 1582.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2038 [2024-06-10 04:10:33,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.92 | bwd_microstep: 717.66 | bwd_inner_microstep: 717.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3636 [2024-06-10 04:10:35,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.46 | bwd_microstep: 1468.68 | bwd_inner_microstep: 1468.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651 [2024-06-10 04:10:37,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1425.99 | bwd_inner_microstep: 1425.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828 [2024-06-10 04:10:39,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.74 | bwd_microstep: 1586.81 | bwd_inner_microstep: 1586.36 | bwd_allreduce_microstep: 0.23 | step_microstep: 0.33 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3551 [2024-06-10 04:10:41,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 509.51 | bwd_microstep: 1336.69 | bwd_inner_microstep: 1336.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696 [2024-06-10 04:10:43,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1535.25 | bwd_inner_microstep: 1535.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971 [2024-06-10 04:10:44,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.30 | bwd_microstep: 705.35 | bwd_inner_microstep: 705.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957 [2024-06-10 04:10:45,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.91 | bwd_microstep: 703.12 | bwd_inner_microstep: 703.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-10 04:10:47,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1419.01 | bwd_inner_microstep: 1418.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2028 [2024-06-10 04:10:48,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.56 | bwd_microstep: 842.34 | bwd_inner_microstep: 842.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 04:10:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.13 | bwd_microstep: 1567.15 | bwd_inner_microstep: 1567.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016 [2024-06-10 04:10:51,928] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.95 | bwd_microstep: 712.29 | bwd_inner_microstep: 712.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811 [2024-06-10 04:10:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.84 | bwd_microstep: 1752.79 | bwd_inner_microstep: 1752.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-10 04:10:56,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.46 | bwd_microstep: 1601.34 | bwd_inner_microstep: 1601.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2261 [2024-06-10 04:10:58,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.32 | bwd_microstep: 1070.21 | bwd_inner_microstep: 1070.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2233 [2024-06-10 04:11:04,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.28 | optimizer_step: 6.61 [2024-06-10 04:11:04,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.89 | bwd_microstep: 5621.63 | bwd_inner_microstep: 1207.06 | bwd_allreduce_microstep: 4414.50 | step_microstep: 39.45 [2024-06-10 04:11:04,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15335.62 | bwd: 45416.79 | bwd_inner: 41000.77 | bwd_allreduce: 4415.12 | step: 41.57 {'loss': 1.3679, 'learning_rate': 3.9223163412160784e-05, 'epoch': 0.12} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1862 [2024-06-10 04:11:05,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.84 | bwd_microstep: 762.63 | bwd_inner_microstep: 762.47 | 
bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867 [2024-06-10 04:11:06,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.71 | bwd_microstep: 676.86 | bwd_inner_microstep: 676.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473 [2024-06-10 04:11:08,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.36 | bwd_microstep: 1408.27 | bwd_inner_microstep: 1408.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764 [2024-06-10 04:11:10,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1436.28 | bwd_inner_microstep: 1436.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398 [2024-06-10 04:11:11,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 1372.50 | bwd_inner_microstep: 1372.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-10 04:11:13,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.02 | bwd_microstep: 1400.43 | bwd_inner_microstep: 1400.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 04:11:15,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.40 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 04:11:17,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.23 | bwd_microstep: 
1388.82 | bwd_inner_microstep: 1388.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 04:11:19,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.31 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 04:11:21,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.81 | bwd_microstep: 1396.81 | bwd_inner_microstep: 1396.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500 [2024-06-10 04:11:23,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 1325.18 | bwd_inner_microstep: 1325.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3683 [2024-06-10 04:11:25,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.28 | bwd_microstep: 1374.87 | bwd_inner_microstep: 1374.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1943 [2024-06-10 04:11:26,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.61 | bwd_microstep: 824.23 | bwd_inner_microstep: 824.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1907 [2024-06-10 04:11:27,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.78 | bwd_microstep: 813.16 | bwd_inner_microstep: 813.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3522 [2024-06-10 04:11:29,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 547.64 | bwd_microstep: 1456.89 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 04:11:31,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1379.94 | bwd_inner_microstep: 1379.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3823 [2024-06-10 04:11:33,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.39 | bwd_microstep: 1505.61 | bwd_inner_microstep: 1505.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 04:11:35,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.23 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 04:11:37,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.07 | bwd_microstep: 1348.46 | bwd_inner_microstep: 1348.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 04:11:39,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.18 | bwd_microstep: 1490.73 | bwd_inner_microstep: 1490.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 04:11:41,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.53 | bwd_microstep: 1451.95 | bwd_inner_microstep: 1451.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126 [2024-06-10 04:11:42,592] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.17 | bwd_microstep: 929.71 | bwd_inner_microstep: 929.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3597 [2024-06-10 04:11:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.90 | bwd_microstep: 1574.04 | bwd_inner_microstep: 1574.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-10 04:11:46,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.79 | bwd_microstep: 1162.32 | bwd_inner_microstep: 1162.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513 [2024-06-10 04:11:48,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1321.78 | bwd_inner_microstep: 1321.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 04:11:50,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1354.24 | bwd_inner_microstep: 1354.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899 [2024-06-10 04:11:51,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.14 | bwd_microstep: 715.14 | bwd_inner_microstep: 715.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595 [2024-06-10 04:11:52,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.51 | bwd_microstep: 1309.86 | bwd_inner_microstep: 1309.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3729 
[2024-06-10 04:11:54,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1370.21 | bwd_inner_microstep: 1370.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 04:11:56,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1279.36 | bwd_inner_microstep: 1279.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448 [2024-06-10 04:11:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.86 | bwd_microstep: 1449.50 | bwd_inner_microstep: 1449.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 04:12:05,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.61 [2024-06-10 04:12:05,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.56 | bwd_microstep: 6623.09 | bwd_inner_microstep: 1559.45 | bwd_allreduce_microstep: 5063.57 | step_microstep: 39.91 [2024-06-10 04:12:05,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15350.63 | bwd: 45972.87 | bwd_inner: 40908.23 | bwd_allreduce: 5063.88 | step: 41.81 {'loss': 1.3516, 'learning_rate': 3.921277026263959e-05, 'epoch': 0.12} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 04:12:07,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.08 | bwd_microstep: 1465.62 | bwd_inner_microstep: 1465.47 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2325 [2024-06-10 04:12:09,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.09 | bwd_microstep: 916.72 | 
bwd_inner_microstep: 916.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451 [2024-06-10 04:12:11,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.22 | bwd_microstep: 1412.11 | bwd_inner_microstep: 1412.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 04:12:12,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.36 | bwd_microstep: 1384.36 | bwd_inner_microstep: 1384.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775 [2024-06-10 04:12:15,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.76 | bwd_microstep: 1642.73 | bwd_inner_microstep: 1642.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 04:12:16,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.04 | bwd_microstep: 1282.17 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2888 [2024-06-10 04:12:18,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.93 | bwd_microstep: 1058.39 | bwd_inner_microstep: 1058.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707 [2024-06-10 04:12:20,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1528.27 | bwd_inner_microstep: 1528.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 04:12:22,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 497.02 | bwd_microstep: 1298.86 | bwd_inner_microstep: 1298.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3607 [2024-06-10 04:12:24,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.89 | bwd_microstep: 1489.09 | bwd_inner_microstep: 1489.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672 [2024-06-10 04:12:26,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.05 | bwd_microstep: 1619.31 | bwd_inner_microstep: 1619.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 04:12:28,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1294.66 | bwd_inner_microstep: 1294.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2372 [2024-06-10 04:12:29,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.10 | bwd_microstep: 1095.27 | bwd_inner_microstep: 1095.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637 [2024-06-10 04:12:31,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.24 | bwd_microstep: 1318.37 | bwd_inner_microstep: 1318.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3645 [2024-06-10 04:12:33,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.79 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1281.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-10 04:12:35,623] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1509.48 | bwd_inner_microstep: 1509.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3670 [2024-06-10 04:12:37,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.36 | bwd_microstep: 1477.00 | bwd_inner_microstep: 1476.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832 [2024-06-10 04:12:39,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.74 | bwd_microstep: 1389.67 | bwd_inner_microstep: 1389.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 04:12:41,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.61 | bwd_microstep: 1388.29 | bwd_inner_microstep: 1388.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450 [2024-06-10 04:12:43,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.41 | bwd_microstep: 1161.22 | bwd_inner_microstep: 1161.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2295 [2024-06-10 04:12:44,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.59 | bwd_microstep: 817.71 | bwd_inner_microstep: 817.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 04:12:46,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1286.18 | bwd_inner_microstep: 1286.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 
1981 [2024-06-10 04:12:47,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.56 | bwd_microstep: 705.97 | bwd_inner_microstep: 705.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068 [2024-06-10 04:12:48,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.43 | bwd_microstep: 917.08 | bwd_inner_microstep: 917.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 04:12:50,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.16 | bwd_microstep: 1461.24 | bwd_inner_microstep: 1461.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452 [2024-06-10 04:12:52,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1453.05 | bwd_inner_microstep: 1453.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3825 [2024-06-10 04:12:54,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.57 | bwd_microstep: 1619.14 | bwd_inner_microstep: 1619.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2219 [2024-06-10 04:12:55,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.34 | bwd_microstep: 865.39 | bwd_inner_microstep: 865.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-10 04:12:57,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.67 | bwd_microstep: 1602.33 | bwd_inner_microstep: 1602.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1922 [2024-06-10 04:12:59,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.60 | bwd_microstep: 790.26 | bwd_inner_microstep: 790.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801 [2024-06-10 04:13:01,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.78 | bwd_microstep: 1644.48 | bwd_inner_microstep: 1644.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423 [2024-06-10 04:13:06,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 04:13:06,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 4319.28 | bwd_inner_microstep: 1639.48 | bwd_allreduce_microstep: 2679.75 | step_microstep: 39.65 [2024-06-10 04:13:06,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15627.53 | bwd: 44495.73 | bwd_inner: 41814.95 | bwd_allreduce: 2680.05 | step: 41.35 {'loss': 1.3206, 'learning_rate': 3.920230944584141e-05, 'epoch': 0.12} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 04:13:08,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.36 | bwd_microstep: 1331.13 | bwd_inner_microstep: 1331.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3979 [2024-06-10 04:13:10,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.24 | bwd_microstep: 1502.37 | bwd_inner_microstep: 1502.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 04:13:11,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | 
bwd_microstep: 1242.74 | bwd_inner_microstep: 1242.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 04:13:13,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1451.85 | bwd_inner_microstep: 1451.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4180 [2024-06-10 04:13:16,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.74 | bwd_microstep: 1551.01 | bwd_inner_microstep: 1550.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-10 04:13:17,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.58 | bwd_microstep: 1186.99 | bwd_inner_microstep: 1186.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 04:13:19,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.39 | bwd_microstep: 1301.92 | bwd_inner_microstep: 1301.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708 [2024-06-10 04:13:21,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.08 | bwd_microstep: 1433.00 | bwd_inner_microstep: 1432.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488 [2024-06-10 04:13:23,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.13 | bwd_microstep: 1187.96 | bwd_inner_microstep: 1187.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-10 04:13:24,735] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 441.68 | bwd_microstep: 1153.26 | bwd_inner_microstep: 1153.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 04:13:26,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.55 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 04:13:28,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.50 | bwd_inner_microstep: 1253.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 04:13:30,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.85 | bwd_microstep: 1521.24 | bwd_inner_microstep: 1521.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 04:13:32,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.54 | bwd_microstep: 1289.05 | bwd_inner_microstep: 1289.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1264 [2024-06-10 04:13:32,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 197.35 | bwd_microstep: 519.17 | bwd_inner_microstep: 519.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2603 [2024-06-10 04:13:34,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.39 | bwd_microstep: 1060.20 | bwd_inner_microstep: 1060.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-10 04:13:36,510] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.97 | bwd_microstep: 1528.57 | bwd_inner_microstep: 1528.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 04:13:38,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.75 | bwd_microstep: 1398.87 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925 [2024-06-10 04:13:39,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.45 | bwd_microstep: 697.80 | bwd_inner_microstep: 697.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 04:13:41,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.52 | bwd_microstep: 1663.24 | bwd_inner_microstep: 1663.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-10 04:13:43,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.18 | bwd_microstep: 1184.07 | bwd_inner_microstep: 1184.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142 [2024-06-10 04:13:44,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.00 | bwd_microstep: 835.36 | bwd_inner_microstep: 835.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 04:13:46,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.72 | bwd_microstep: 1662.47 | bwd_inner_microstep: 1662.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3599 [2024-06-10 04:13:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1310.76 | bwd_inner_microstep: 1310.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 4012 [2024-06-10 04:13:51,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 677.16 | bwd_microstep: 1857.79 | bwd_inner_microstep: 1857.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3430 [2024-06-10 04:13:53,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.93 | bwd_microstep: 1377.47 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 04:13:54,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.22 | bwd_microstep: 1257.66 | bwd_inner_microstep: 1257.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 04:13:57,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.23 | bwd_microstep: 1656.24 | bwd_inner_microstep: 1656.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3433 [2024-06-10 04:13:59,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1393.85 | bwd_inner_microstep: 1393.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3780 [2024-06-10 04:14:01,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1495.77 | bwd_inner_microstep: 1495.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
37, images per sample: 9.25, dynamic token length: 3543 [2024-06-10 04:14:03,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.68 | bwd_microstep: 1537.64 | bwd_inner_microstep: 1537.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3413 [2024-06-10 04:14:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 04:14:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.59 | bwd_microstep: 3623.60 | bwd_inner_microstep: 1717.10 | bwd_allreduce_microstep: 1906.44 | step_microstep: 38.79 [2024-06-10 04:14:07,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16041.89 | bwd: 44810.04 | bwd_inner: 42902.67 | bwd_allreduce: 1906.67 | step: 40.48 {'loss': 1.3946, 'learning_rate': 3.919178099860918e-05, 'epoch': 0.12} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 04:14:09,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.43 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2399 [2024-06-10 04:14:10,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.66 | bwd_microstep: 1000.37 | bwd_inner_microstep: 1000.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3855 [2024-06-10 04:14:12,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.19 | bwd_microstep: 1457.47 | bwd_inner_microstep: 1457.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 04:14:14,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 521.66 | bwd_microstep: 1378.84 | bwd_inner_microstep: 1378.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 04:14:16,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.83 | bwd_microstep: 1252.60 | bwd_inner_microstep: 1252.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 04:14:18,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1384.57 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 04:14:20,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1241.23 | bwd_inner_microstep: 1241.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-10 04:14:21,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 792.08 | bwd_inner_microstep: 792.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 04:14:23,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.70 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 04:14:24,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1396.56 | bwd_inner_microstep: 1396.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922 [2024-06-10 04:14:25,948] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.34 | bwd_microstep: 698.97 | bwd_inner_microstep: 698.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470 [2024-06-10 04:14:27,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.03 | bwd_microstep: 1313.14 | bwd_inner_microstep: 1313.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3834 [2024-06-10 04:14:29,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.07 | bwd_microstep: 1522.02 | bwd_inner_microstep: 1522.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3274 [2024-06-10 04:14:31,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.26 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1892 [2024-06-10 04:14:32,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 804.69 | bwd_inner_microstep: 804.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3634 [2024-06-10 04:14:35,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1572.79 | bwd_inner_microstep: 1572.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425 [2024-06-10 04:14:36,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.68 | bwd_microstep: 1405.13 | bwd_inner_microstep: 1405.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2301 
[2024-06-10 04:14:38,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.25 | bwd_microstep: 882.83 | bwd_inner_microstep: 882.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556 [2024-06-10 04:14:40,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.51 | bwd_microstep: 1429.57 | bwd_inner_microstep: 1429.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2389 [2024-06-10 04:14:41,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.06 | bwd_microstep: 845.24 | bwd_inner_microstep: 845.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 04:14:43,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.02 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959 [2024-06-10 04:14:44,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.36 | bwd_microstep: 706.60 | bwd_inner_microstep: 706.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 04:14:46,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.23 | bwd_microstep: 1657.29 | bwd_inner_microstep: 1657.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 04:14:48,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1493.77 | bwd_inner_microstep: 1493.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3565 [2024-06-10 04:14:50,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1495.41 | bwd_inner_microstep: 1495.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 04:14:52,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1461.69 | bwd_inner_microstep: 1461.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 04:14:55,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1630.49 | bwd_inner_microstep: 1630.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590 [2024-06-10 04:14:57,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1604.49 | bwd_inner_microstep: 1604.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-10 04:14:59,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.53 | bwd_microstep: 1600.93 | bwd_inner_microstep: 1600.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022 [2024-06-10 04:15:00,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.48 | bwd_microstep: 810.26 | bwd_inner_microstep: 810.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3610 [2024-06-10 04:15:02,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.13 | bwd_microstep: 1436.52 | bwd_inner_microstep: 1436.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 04:15:08,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.31 | optimizer_step: 6.61 [2024-06-10 04:15:08,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.12 | bwd_microstep: 5502.22 | bwd_inner_microstep: 1575.42 | bwd_allreduce_microstep: 3926.73 | step_microstep: 39.37 [2024-06-10 04:15:08,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15481.47 | bwd: 45373.28 | bwd_inner: 41445.63 | bwd_allreduce: 3926.97 | step: 41.00 {'loss': 1.2976, 'learning_rate': 3.9181184958024045e-05, 'epoch': 0.12} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3445 [2024-06-10 04:15:10,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.29 | bwd_microstep: 1303.01 | bwd_inner_microstep: 1302.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2928 [2024-06-10 04:15:12,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.70 | bwd_microstep: 1133.38 | bwd_inner_microstep: 1133.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753 [2024-06-10 04:15:14,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.47 | bwd_microstep: 1639.25 | bwd_inner_microstep: 1639.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 04:15:15,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 04:15:18,115] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1537.15 | bwd_inner_microstep: 1537.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 04:15:19,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.99 | bwd_microstep: 793.75 | bwd_inner_microstep: 793.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3757 [2024-06-10 04:15:21,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1404.75 | bwd_inner_microstep: 1404.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-10 04:15:23,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.20 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 04:15:25,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.43 | bwd_microstep: 1393.82 | bwd_inner_microstep: 1393.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893 [2024-06-10 04:15:26,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.26 | bwd_microstep: 713.75 | bwd_inner_microstep: 713.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3491 [2024-06-10 04:15:28,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.59 | bwd_microstep: 1442.28 | bwd_inner_microstep: 1442.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 04:15:30,118] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1500.65 | bwd_inner_microstep: 1500.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2090 [2024-06-10 04:15:31,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.61 | bwd_microstep: 944.86 | bwd_inner_microstep: 944.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392 [2024-06-10 04:15:33,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.08 | bwd_microstep: 1338.71 | bwd_inner_microstep: 1338.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1951 [2024-06-10 04:15:34,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.77 | bwd_microstep: 825.12 | bwd_inner_microstep: 825.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 04:15:36,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.55 | bwd_microstep: 1289.38 | bwd_inner_microstep: 1289.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 04:15:37,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.29 | bwd_microstep: 1257.52 | bwd_inner_microstep: 1257.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1993 [2024-06-10 04:15:38,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.20 | bwd_microstep: 737.94 | bwd_inner_microstep: 737.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 
[2024-06-10 04:15:40,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1397.38 | bwd_inner_microstep: 1397.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457 [2024-06-10 04:15:42,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.14 | bwd_microstep: 1309.46 | bwd_inner_microstep: 1309.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077 [2024-06-10 04:15:43,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.22 | bwd_microstep: 915.45 | bwd_inner_microstep: 915.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 04:15:45,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1388.15 | bwd_inner_microstep: 1388.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707 [2024-06-10 04:15:47,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.92 | bwd_microstep: 1432.02 | bwd_inner_microstep: 1432.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481 [2024-06-10 04:15:49,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.67 | bwd_microstep: 1315.93 | bwd_inner_microstep: 1315.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2020 [2024-06-10 04:15:50,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.93 | bwd_microstep: 853.64 | bwd_inner_microstep: 853.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3525 [2024-06-10 04:15:52,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.93 | bwd_microstep: 1498.13 | bwd_inner_microstep: 1498.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2060 [2024-06-10 04:15:54,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.94 | bwd_microstep: 847.42 | bwd_inner_microstep: 847.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 04:15:56,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812 [2024-06-10 04:15:58,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.37 | bwd_microstep: 1620.22 | bwd_inner_microstep: 1620.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2270 [2024-06-10 04:15:59,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.56 | bwd_microstep: 935.20 | bwd_inner_microstep: 935.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391 [2024-06-10 04:16:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.54 | bwd_microstep: 1244.10 | bwd_inner_microstep: 1244.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801 [2024-06-10 04:16:10,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.35 | optimizer_step: 6.58 [2024-06-10 04:16:10,057] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 8140.05 | bwd_inner_microstep: 1753.47 | bwd_allreduce_microstep: 6386.51 | step_microstep: 39.65 [2024-06-10 04:16:10,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14870.37 | bwd: 46218.56 | bwd_inner: 39831.08 | bwd_allreduce: 6386.78 | step: 41.30 {'loss': 1.2942, 'learning_rate': 3.9170521361405206e-05, 'epoch': 0.12} 12%|█▏ | 201/1726 [3:33:40<30:24:55, 71.80s/it] 12%|█▏ | 202/1726 [3:34:42<29:06:37, 68.76s/it] 12%|█▏ | 203/1726 [3:35:42<28:02:19, 66.28s/it] 12%|█▏ | 204/1726 [3:36:44<27:22:36, 64.75s/it] 12%|█▏ | 205/1726 [3:37:45<26:54:30, 63.69s/it] 12%|█▏ | 206/1726 [3:38:46<26:36:17, 63.01s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 04:16:11,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1342.14 | bwd_inner_microstep: 1342.05 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2403 [2024-06-10 04:16:13,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.33 | bwd_microstep: 998.92 | bwd_inner_microstep: 998.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3915 [2024-06-10 04:16:15,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1494.89 | bwd_inner_microstep: 1494.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2260 [2024-06-10 04:16:16,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.29 | 
bwd_microstep: 871.55 | bwd_inner_microstep: 871.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 04:16:18,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 1541.75 | bwd_inner_microstep: 1541.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763 [2024-06-10 04:16:20,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.27 | bwd_microstep: 1640.38 | bwd_inner_microstep: 1640.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 04:16:22,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1249.35 | bwd_inner_microstep: 1249.25 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 04:16:24,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.38 | bwd_microstep: 1252.44 | bwd_inner_microstep: 1252.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 04:16:26,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 04:16:28,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.33 | bwd_microstep: 1382.44 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3494 [2024-06-10 04:16:30,079] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 542.56 | bwd_microstep: 1466.69 | bwd_inner_microstep: 1466.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489 [2024-06-10 04:16:31,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1316.44 | bwd_inner_microstep: 1316.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 04:16:33,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1350.21 | bwd_inner_microstep: 1350.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3681 [2024-06-10 04:16:36,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.60 | bwd_microstep: 1667.47 | bwd_inner_microstep: 1667.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3387 [2024-06-10 04:16:37,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.62 | bwd_microstep: 1177.34 | bwd_inner_microstep: 1177.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3493 [2024-06-10 04:16:39,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1440.90 | bwd_inner_microstep: 1440.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-10 04:16:41,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.39 | bwd_microstep: 1289.25 | bwd_inner_microstep: 1289.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 
04:16:43,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1492.23 | bwd_inner_microstep: 1492.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3617 [2024-06-10 04:16:46,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.19 | bwd_microstep: 1812.94 | bwd_inner_microstep: 1812.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3530 [2024-06-10 04:16:47,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1373.18 | bwd_inner_microstep: 1373.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 04:16:49,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.45 | bwd_microstep: 1281.71 | bwd_inner_microstep: 1281.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605 [2024-06-10 04:16:51,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1411.06 | bwd_inner_microstep: 1411.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 04:16:53,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.04 | bwd_microstep: 1458.53 | bwd_inner_microstep: 1458.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997 [2024-06-10 04:16:54,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.60 | bwd_microstep: 737.63 | bwd_inner_microstep: 737.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3458 [2024-06-10 04:16:56,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.67 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3453 [2024-06-10 04:16:58,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1416.32 | bwd_inner_microstep: 1416.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-10 04:17:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1405.00 | bwd_inner_microstep: 1404.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443 [2024-06-10 04:17:01,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.38 | bwd_microstep: 1156.64 | bwd_inner_microstep: 1156.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 04:17:04,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1555.24 | bwd_inner_microstep: 1555.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 04:17:06,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1398.68 | bwd_inner_microstep: 1398.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-10 04:17:08,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.61 | bwd_microstep: 1502.15 | bwd_inner_microstep: 1502.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 10, images per sample: 2.5, dynamic token length: 1924 [2024-06-10 04:17:11,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 04:17:11,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.45 | bwd_microstep: 2987.42 | bwd_inner_microstep: 892.05 | bwd_allreduce_microstep: 2095.32 | step_microstep: 40.86 [2024-06-10 04:17:11,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16010.21 | bwd: 44996.33 | bwd_inner: 42899.95 | bwd_allreduce: 2095.64 | step: 42.57 {'loss': 1.3435, 'learning_rate': 3.915979024630978e-05, 'epoch': 0.12} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2428 [2024-06-10 04:17:12,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.85 | bwd_microstep: 996.47 | bwd_inner_microstep: 996.36 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3903 [2024-06-10 04:17:15,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1653.72 | bwd_inner_microstep: 1653.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 04:17:16,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1382.84 | bwd_inner_microstep: 1382.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-10 04:17:19,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.58 | bwd_microstep: 1653.79 | bwd_inner_microstep: 1653.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3418 [2024-06-10 04:17:21,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 492.49 | bwd_microstep: 1309.19 | bwd_inner_microstep: 1309.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2900 [2024-06-10 04:17:22,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.29 | bwd_microstep: 1093.90 | bwd_inner_microstep: 1093.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951 [2024-06-10 04:17:23,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.28 | bwd_microstep: 730.54 | bwd_inner_microstep: 730.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1872 [2024-06-10 04:17:24,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.67 | bwd_microstep: 741.94 | bwd_inner_microstep: 741.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 04:17:26,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1387.98 | bwd_inner_microstep: 1387.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3681 [2024-06-10 04:17:28,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.79 | bwd_microstep: 1288.86 | bwd_inner_microstep: 1288.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-10 04:17:30,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1279.83 | bwd_inner_microstep: 1279.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3626 [2024-06-10 04:17:32,143] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.29 | bwd_microstep: 1463.23 | bwd_inner_microstep: 1463.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505 [2024-06-10 04:17:33,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.08 | bwd_microstep: 1317.39 | bwd_inner_microstep: 1317.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975 [2024-06-10 04:17:35,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.32 | bwd_microstep: 855.18 | bwd_inner_microstep: 855.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 04:17:37,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2127 [2024-06-10 04:17:38,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.55 | bwd_microstep: 926.05 | bwd_inner_microstep: 926.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3647 [2024-06-10 04:17:40,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.13 | bwd_microstep: 1536.39 | bwd_inner_microstep: 1536.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639 [2024-06-10 04:17:42,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.44 | bwd_microstep: 1607.87 | bwd_inner_microstep: 1607.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3531 [2024-06-10 04:17:44,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.81 | bwd_microstep: 1290.23 | bwd_inner_microstep: 1290.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839 [2024-06-10 04:17:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.83 | bwd_microstep: 1456.83 | bwd_inner_microstep: 1456.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822 [2024-06-10 04:17:48,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.57 | bwd_microstep: 1488.31 | bwd_inner_microstep: 1488.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2130 [2024-06-10 04:17:49,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.45 | bwd_microstep: 928.28 | bwd_inner_microstep: 928.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 04:17:51,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1489.30 | bwd_inner_microstep: 1489.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3847 [2024-06-10 04:17:54,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.92 | bwd_microstep: 1563.69 | bwd_inner_microstep: 1563.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 04:17:55,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1395.10 | bwd_inner_microstep: 1395.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2185 [2024-06-10 04:17:57,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.98 | bwd_microstep: 953.25 | bwd_inner_microstep: 953.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-10 04:17:59,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.86 | bwd_microstep: 1459.49 | bwd_inner_microstep: 1459.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444 [2024-06-10 04:18:00,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.81 | bwd_microstep: 1216.99 | bwd_inner_microstep: 1216.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3566 [2024-06-10 04:18:03,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.23 | bwd_microstep: 1456.58 | bwd_inner_microstep: 1456.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-10 04:18:05,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.31 | bwd_microstep: 1538.21 | bwd_inner_microstep: 1538.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3474 [2024-06-10 04:18:07,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1508.60 | bwd_inner_microstep: 1508.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 04:18:12,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.27 | optimizer_step: 6.56 [2024-06-10 04:18:12,213] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 4400.73 | bwd_inner_microstep: 1680.83 | bwd_allreduce_microstep: 2719.85 | step_microstep: 40.83 [2024-06-10 04:18:12,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15682.91 | bwd: 44761.56 | bwd_inner: 42040.69 | bwd_allreduce: 2720.13 | step: 42.55 {'loss': 1.3507, 'learning_rate': 3.914899165053272e-05, 'epoch': 0.12} dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868 [2024-06-10 04:18:13,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.41 | bwd_microstep: 699.82 | bwd_inner_microstep: 699.68 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 04:18:15,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1391.36 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4296 [2024-06-10 04:18:17,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.74 | bwd_microstep: 1583.43 | bwd_inner_microstep: 1583.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444 [2024-06-10 04:18:19,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.42 | bwd_microstep: 1452.73 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 04:18:21,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1551.15 | bwd_inner_microstep: 1551.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948 [2024-06-10 04:18:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 299.28 | bwd_microstep: 790.73 | bwd_inner_microstep: 790.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881 [2024-06-10 04:18:23,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.63 | bwd_microstep: 681.01 | bwd_inner_microstep: 680.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 04:18:25,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.21 | bwd_microstep: 1284.81 | bwd_inner_microstep: 1284.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3720 [2024-06-10 04:18:27,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.42 | bwd_microstep: 1364.67 | bwd_inner_microstep: 1364.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-10 04:18:28,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.25 | bwd_microstep: 799.59 | bwd_inner_microstep: 799.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2120 [2024-06-10 04:18:29,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.24 | bwd_microstep: 767.07 | bwd_inner_microstep: 767.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 04:18:31,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.11 | bwd_microstep: 1382.67 | bwd_inner_microstep: 1382.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-10 04:18:33,106] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.13 | bwd_microstep: 1336.86 | bwd_inner_microstep: 1336.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689 [2024-06-10 04:18:35,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.04 | bwd_microstep: 1625.61 | bwd_inner_microstep: 1625.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-10 04:18:37,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1519.59 | bwd_inner_microstep: 1519.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903 [2024-06-10 04:18:38,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.59 | bwd_microstep: 686.34 | bwd_inner_microstep: 686.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521 [2024-06-10 04:18:40,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.56 | bwd_microstep: 1425.04 | bwd_inner_microstep: 1425.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3532 [2024-06-10 04:18:42,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1228.63 | bwd_inner_microstep: 1228.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 04:18:43,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1245.89 | bwd_inner_microstep: 1245.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2005 [2024-06-10 04:18:45,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.60 | bwd_microstep: 897.38 | bwd_inner_microstep: 897.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 04:18:46,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1276.41 | bwd_inner_microstep: 1276.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3690 [2024-06-10 04:18:48,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.25 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2282 [2024-06-10 04:18:50,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.04 | bwd_microstep: 1036.62 | bwd_inner_microstep: 1036.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-10 04:18:52,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 1501.40 | bwd_inner_microstep: 1501.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 04:18:54,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1425.51 | bwd_inner_microstep: 1425.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 04:18:55,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.73 | bwd_microstep: 1252.08 | bwd_inner_microstep: 1252.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3564 [2024-06-10 04:18:57,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1405.20 | bwd_inner_microstep: 1405.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 04:18:59,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1396.45 | bwd_inner_microstep: 1396.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-10 04:19:00,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.49 | bwd_microstep: 804.02 | bwd_inner_microstep: 803.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 04:19:02,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.02 | bwd_microstep: 1384.23 | bwd_inner_microstep: 1384.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 04:19:04,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1452.97 | bwd_inner_microstep: 1452.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280 [2024-06-10 04:19:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.39 | optimizer_step: 6.58 [2024-06-10 04:19:15,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.31 | bwd_microstep: 9938.67 | bwd_inner_microstep: 1139.51 | bwd_allreduce_microstep: 8799.09 | step_microstep: 39.73 [2024-06-10 04:19:15,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14689.21 | bwd: 47978.17 | bwd_inner: 39178.03 | 
bwd_allreduce: 8799.38 | step: 41.35 {'loss': 1.2879, 'learning_rate': 3.91381256121066e-05, 'epoch': 0.12} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 04:19:17,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.92 | bwd_microstep: 1465.54 | bwd_inner_microstep: 1465.46 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3504 [2024-06-10 04:19:18,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.99 | bwd_microstep: 1186.78 | bwd_inner_microstep: 1186.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899 [2024-06-10 04:19:21,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.53 | bwd_microstep: 1580.88 | bwd_inner_microstep: 1580.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 04:19:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1375.23 | bwd_inner_microstep: 1375.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 461 [2024-06-10 04:19:23,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 91.34 | bwd_microstep: 226.67 | bwd_inner_microstep: 226.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2226 [2024-06-10 04:19:24,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.00 | bwd_microstep: 894.16 | bwd_inner_microstep: 894.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2627 [2024-06-10 04:19:25,833] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 353.90 | bwd_microstep: 921.42 | bwd_inner_microstep: 921.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975 [2024-06-10 04:19:27,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.25 | bwd_microstep: 857.33 | bwd_inner_microstep: 857.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 04:19:28,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.87 | bwd_microstep: 1253.89 | bwd_inner_microstep: 1253.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 04:19:29,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.23 | bwd_microstep: 797.47 | bwd_inner_microstep: 797.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2444 [2024-06-10 04:19:31,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.66 | bwd_microstep: 1017.71 | bwd_inner_microstep: 1017.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 04:19:33,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1390.89 | bwd_inner_microstep: 1390.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4055 [2024-06-10 04:19:35,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.32 | bwd_microstep: 1655.97 | bwd_inner_microstep: 1655.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3517 [2024-06-10 04:19:37,358] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1366.68 | bwd_inner_microstep: 1366.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2124 [2024-06-10 04:19:38,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.23 | bwd_microstep: 1023.55 | bwd_inner_microstep: 1023.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2509 [2024-06-10 04:19:40,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.85 | bwd_microstep: 995.62 | bwd_inner_microstep: 995.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 04:19:42,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1491.68 | bwd_inner_microstep: 1491.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 04:19:44,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.98 | bwd_microstep: 1384.43 | bwd_inner_microstep: 1384.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-10 04:19:45,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.41 | bwd_microstep: 792.27 | bwd_inner_microstep: 792.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501 [2024-06-10 04:19:46,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.71 | bwd_microstep: 1225.44 | bwd_inner_microstep: 1225.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 
[2024-06-10 04:19:48,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1393.94 | bwd_inner_microstep: 1393.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604 [2024-06-10 04:19:51,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1609.92 | bwd_inner_microstep: 1609.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142 [2024-06-10 04:19:52,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.15 | bwd_microstep: 835.08 | bwd_inner_microstep: 835.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 04:19:54,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1401.48 | bwd_inner_microstep: 1401.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 04:19:56,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1392.59 | bwd_inner_microstep: 1392.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 04:19:58,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.04 | bwd_microstep: 1555.13 | bwd_inner_microstep: 1555.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808 [2024-06-10 04:20:00,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per 
sample: 11.5, dynamic token length: 3824 [2024-06-10 04:20:02,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.84 | bwd_microstep: 1755.05 | bwd_inner_microstep: 1755.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818 [2024-06-10 04:20:04,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.52 | bwd_microstep: 1614.90 | bwd_inner_microstep: 1614.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 04:20:06,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.25 | bwd_microstep: 1551.58 | bwd_inner_microstep: 1551.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-10 04:20:09,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.14 | bwd_microstep: 1603.25 | bwd_inner_microstep: 1603.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3581 [2024-06-10 04:20:15,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.35 | optimizer_step: 6.60 [2024-06-10 04:20:15,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.11 | bwd_microstep: 5787.65 | bwd_inner_microstep: 1493.09 | bwd_allreduce_microstep: 4294.50 | step_microstep: 39.38 [2024-06-10 04:20:15,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15099.05 | bwd: 44791.48 | bwd_inner: 40496.00 | bwd_allreduce: 4294.77 | step: 41.02 {'loss': 1.3586, 'learning_rate': 3.912719216930157e-05, 'epoch': 0.12} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484 [2024-06-10 04:20:17,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
584.17 | bwd_microstep: 1568.06 | bwd_inner_microstep: 1567.98 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4001 [2024-06-10 04:20:19,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 1506.32 | bwd_inner_microstep: 1506.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853 [2024-06-10 04:20:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.93 | bwd_microstep: 1561.62 | bwd_inner_microstep: 1561.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 04:20:23,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1483.24 | bwd_inner_microstep: 1483.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 04:20:25,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.19 | bwd_microstep: 1485.82 | bwd_inner_microstep: 1485.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2904 [2024-06-10 04:20:27,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.63 | bwd_microstep: 1186.58 | bwd_inner_microstep: 1186.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 04:20:29,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1393.24 | bwd_inner_microstep: 1393.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3594 [2024-06-10 04:20:31,264] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.82 | bwd_microstep: 1260.39 | bwd_inner_microstep: 1260.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3451 [2024-06-10 04:20:32,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.77 | bwd_microstep: 1222.69 | bwd_inner_microstep: 1222.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 04:20:34,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 805.09 | bwd_inner_microstep: 805.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3504 [2024-06-10 04:20:35,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1351.38 | bwd_inner_microstep: 1351.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-10 04:20:38,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.24 | bwd_microstep: 1524.94 | bwd_inner_microstep: 1524.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424 [2024-06-10 04:20:40,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.00 | bwd_microstep: 1447.82 | bwd_inner_microstep: 1447.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3421 [2024-06-10 04:20:41,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.25 | bwd_microstep: 1311.35 | bwd_inner_microstep: 1311.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3434 [2024-06-10 04:20:43,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1347.57 | bwd_inner_microstep: 1347.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3684 [2024-06-10 04:20:45,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.92 | bwd_microstep: 1487.10 | bwd_inner_microstep: 1487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 04:20:47,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3791 [2024-06-10 04:20:50,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.64 | bwd_microstep: 1604.54 | bwd_inner_microstep: 1604.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3647 [2024-06-10 04:20:52,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.47 | bwd_microstep: 1578.87 | bwd_inner_microstep: 1578.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1982 [2024-06-10 04:20:53,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.12 | bwd_microstep: 738.66 | bwd_inner_microstep: 738.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459 [2024-06-10 04:20:55,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.39 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3819 [2024-06-10 04:20:57,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.73 | bwd_microstep: 1558.28 | bwd_inner_microstep: 1558.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 04:20:58,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 800.60 | bwd_inner_microstep: 800.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 04:21:00,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1563.11 | bwd_inner_microstep: 1563.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 04:21:02,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1284.69 | bwd_inner_microstep: 1284.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 04:21:04,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.36 | bwd_microstep: 1406.54 | bwd_inner_microstep: 1406.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2244 [2024-06-10 04:21:05,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.46 | bwd_microstep: 968.90 | bwd_inner_microstep: 968.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3613 [2024-06-10 04:21:07,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.16 | bwd_microstep: 1648.88 | bwd_inner_microstep: 1648.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3768 [2024-06-10 04:21:09,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.95 | bwd_microstep: 1518.14 | bwd_inner_microstep: 1518.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-10 04:21:11,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.15 | bwd_microstep: 1367.45 | bwd_inner_microstep: 1367.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587 [2024-06-10 04:21:13,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.38 | bwd_microstep: 1502.31 | bwd_inner_microstep: 1502.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 04:21:17,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 04:21:17,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.52 | bwd_microstep: 2780.79 | bwd_inner_microstep: 2020.79 | bwd_allreduce_microstep: 759.95 | step_microstep: 38.63 [2024-06-10 04:21:17,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16378.43 | bwd: 45059.52 | bwd_inner: 44298.60 | bwd_allreduce: 760.22 | step: 40.24 {'loss': 1.2737, 'learning_rate': 3.911619136062515e-05, 'epoch': 0.12} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 04:21:19,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.90 | bwd_microstep: 1473.27 | bwd_inner_microstep: 1473.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 04:21:21,020] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1250.20 | bwd_inner_microstep: 1250.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 04:21:23,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 1491.38 | bwd_inner_microstep: 1491.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2256 [2024-06-10 04:21:24,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.99 | bwd_microstep: 834.63 | bwd_inner_microstep: 834.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 04:21:25,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.81 | bwd_microstep: 1148.72 | bwd_inner_microstep: 1148.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 04:21:27,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1413.26 | bwd_inner_microstep: 1413.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 04:21:29,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.03 | bwd_microstep: 1288.74 | bwd_inner_microstep: 1288.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498 [2024-06-10 04:21:31,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.15 | bwd_microstep: 1189.94 | bwd_inner_microstep: 1189.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1951 [2024-06-10 04:21:32,367] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.18 | bwd_microstep: 823.84 | bwd_inner_microstep: 823.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 04:21:34,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1380.65 | bwd_inner_microstep: 1380.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 04:21:36,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1446.81 | bwd_inner_microstep: 1446.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 04:21:38,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.12 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3399 [2024-06-10 04:21:40,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.70 | bwd_microstep: 1390.96 | bwd_inner_microstep: 1390.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644 [2024-06-10 04:21:42,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.70 | bwd_microstep: 1610.95 | bwd_inner_microstep: 1610.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-10 04:21:44,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.56 | bwd_microstep: 1185.88 | bwd_inner_microstep: 1185.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3504 [2024-06-10 04:21:46,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1392.41 | bwd_inner_microstep: 1392.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 04:21:47,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.86 | bwd_microstep: 1399.91 | bwd_inner_microstep: 1399.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 04:21:50,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.48 | bwd_microstep: 1516.61 | bwd_inner_microstep: 1516.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 04:21:52,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.07 | bwd_microstep: 1558.42 | bwd_inner_microstep: 1558.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-10 04:21:54,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.59 | bwd_microstep: 1616.23 | bwd_inner_microstep: 1616.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 04:21:56,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.02 | bwd_microstep: 1556.15 | bwd_inner_microstep: 1556.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 04:21:58,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.59 | bwd_microstep: 1553.31 | bwd_inner_microstep: 1553.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, 
images per sample: 3.0, dynamic token length: 1998 [2024-06-10 04:21:59,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.98 | bwd_microstep: 740.15 | bwd_inner_microstep: 740.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3541 [2024-06-10 04:22:01,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.53 | bwd_microstep: 1231.97 | bwd_inner_microstep: 1231.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2239 [2024-06-10 04:22:02,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.73 | bwd_microstep: 896.72 | bwd_inner_microstep: 896.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826 [2024-06-10 04:22:04,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.70 | bwd_microstep: 1582.67 | bwd_inner_microstep: 1582.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 04:22:06,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1493.10 | bwd_inner_microstep: 1493.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2279 [2024-06-10 04:22:08,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.78 | bwd_microstep: 1072.67 | bwd_inner_microstep: 1072.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 04:22:10,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.84 | bwd_microstep: 1278.90 | bwd_inner_microstep: 1278.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3580 [2024-06-10 04:22:12,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.65 | bwd_microstep: 1699.86 | bwd_inner_microstep: 1699.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3596 [2024-06-10 04:22:14,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.61 | bwd_microstep: 1569.57 | bwd_inner_microstep: 1569.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3764 [2024-06-10 04:22:18,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.59 [2024-06-10 04:22:18,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 3460.33 | bwd_inner_microstep: 1803.16 | bwd_allreduce_microstep: 1657.12 | step_microstep: 38.78 [2024-06-10 04:22:18,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16171.54 | bwd: 45029.67 | bwd_inner: 43371.60 | bwd_allreduce: 1657.35 | step: 40.39 206/1726 [3:38:46<26:36:17, 63.01s/it] 12%|█▏ | 207/1726 [3:39:48<26:22:40, 62.51s/it] 12%|█▏ | 208/1726 [3:40:48<26:08:35, 62.00s/it] 12%|█▏ | 209/1726 [3:41:51<26:15:11, 62.30s/it] 12%|█▏ | 210/1726 [3:42:52<25:58:27, 61.68s/it] 12%|█▏ | 211/1726 [3:43:53<25:58:16, 61.71s/it] 12%|█▏ | 212/1726 [3:44:55<25:55:59, 61.66s/it] {'loss': 1.3365, 'learning_rate': 3.9105123224822143e-05, 'epoch': 0.12} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3412 [2024-06-10 04:22:20,684] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 512.44 | bwd_microstep: 1363.77 | bwd_inner_microstep: 1363.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3960 [2024-06-10 04:22:22,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.65 | bwd_microstep: 1595.09 | bwd_inner_microstep: 1595.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 04:22:24,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1387.95 | bwd_inner_microstep: 1387.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3846 [2024-06-10 04:22:27,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.29 | bwd_microstep: 1662.46 | bwd_inner_microstep: 1662.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469 [2024-06-10 04:22:29,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1441.94 | bwd_inner_microstep: 1441.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 04:22:30,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1348.85 | bwd_inner_microstep: 1348.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732 [2024-06-10 04:22:32,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1438.36 | bwd_inner_microstep: 1438.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883 [2024-06-10 04:22:33,920] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.16 | bwd_microstep: 711.49 | bwd_inner_microstep: 711.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 04:22:36,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.88 | bwd_microstep: 1640.93 | bwd_inner_microstep: 1640.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499 [2024-06-10 04:22:38,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.91 | bwd_microstep: 1428.32 | bwd_inner_microstep: 1428.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663 [2024-06-10 04:22:40,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.06 | bwd_microstep: 1615.75 | bwd_inner_microstep: 1615.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3425 [2024-06-10 04:22:42,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1395.04 | bwd_inner_microstep: 1395.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3497 [2024-06-10 04:22:44,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.94 | bwd_microstep: 1443.02 | bwd_inner_microstep: 1443.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 04:22:46,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.92 | bwd_microstep: 1488.14 | bwd_inner_microstep: 1488.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic 
token length: 2661 [2024-06-10 04:22:48,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.44 | bwd_microstep: 1214.77 | bwd_inner_microstep: 1214.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654 [2024-06-10 04:22:50,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.30 | bwd_microstep: 1618.16 | bwd_inner_microstep: 1618.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 04:22:52,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1400.19 | bwd_inner_microstep: 1400.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2176 [2024-06-10 04:22:53,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.06 | bwd_microstep: 763.78 | bwd_inner_microstep: 763.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 04:22:55,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1385.07 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 04:22:57,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1386.81 | bwd_inner_microstep: 1386.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-10 04:22:58,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.39 | bwd_microstep: 1302.01 | bwd_inner_microstep: 1301.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
10, images per sample: 2.5, dynamic token length: 2019 [2024-06-10 04:22:59,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.06 | bwd_microstep: 714.23 | bwd_inner_microstep: 714.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 04:23:01,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1495.71 | bwd_inner_microstep: 1495.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2148 [2024-06-10 04:23:03,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.74 | bwd_microstep: 948.77 | bwd_inner_microstep: 948.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566 [2024-06-10 04:23:04,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.79 | bwd_microstep: 1206.29 | bwd_inner_microstep: 1206.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2427 [2024-06-10 04:23:06,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.99 | bwd_microstep: 940.53 | bwd_inner_microstep: 940.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 04:23:07,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1254.19 | bwd_inner_microstep: 1254.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2072 [2024-06-10 04:23:09,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.42 | bwd_microstep: 850.84 | bwd_inner_microstep: 850.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2042 [2024-06-10 04:23:10,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.06 | bwd_microstep: 842.82 | bwd_inner_microstep: 842.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773 [2024-06-10 04:23:12,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.59 | bwd_microstep: 1444.58 | bwd_inner_microstep: 1444.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3812 [2024-06-10 04:23:14,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1384.50 | bwd_inner_microstep: 1384.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3442 [2024-06-10 04:23:20,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.26 | optimizer_step: 6.60 [2024-06-10 04:23:20,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 5833.15 | bwd_inner_microstep: 1762.40 | bwd_allreduce_microstep: 4070.69 | step_microstep: 38.87 [2024-06-10 04:23:20,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15600.59 | bwd: 45947.51 | bwd_inner: 41875.92 | bwd_allreduce: 4070.92 | step: 40.50 {'loss': 1.3109, 'learning_rate': 3.909398780087445e-05, 'epoch': 0.12} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453 [2024-06-10 04:23:22,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1281.16 | bwd_inner_microstep: 1281.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 04:23:24,289] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 500.67 | bwd_microstep: 1308.70 | bwd_inner_microstep: 1308.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 04:23:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.53 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795 [2024-06-10 04:23:28,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.06 | bwd_microstep: 1648.35 | bwd_inner_microstep: 1648.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 04:23:30,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.10 | bwd_microstep: 1275.88 | bwd_inner_microstep: 1275.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752 [2024-06-10 04:23:32,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.70 | bwd_microstep: 1638.43 | bwd_inner_microstep: 1638.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3478 [2024-06-10 04:23:34,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.52 | bwd_microstep: 1215.65 | bwd_inner_microstep: 1215.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1890 [2024-06-10 04:23:35,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.76 | bwd_microstep: 745.47 | bwd_inner_microstep: 745.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2960 [2024-06-10 
04:23:36,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.33 | bwd_microstep: 1070.97 | bwd_inner_microstep: 1070.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3683 [2024-06-10 04:23:38,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.01 | bwd_microstep: 1457.28 | bwd_inner_microstep: 1457.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3671 [2024-06-10 04:23:40,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.44 | bwd_microstep: 1612.36 | bwd_inner_microstep: 1612.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-10 04:23:41,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.36 | bwd_microstep: 797.59 | bwd_inner_microstep: 797.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3493 [2024-06-10 04:23:44,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.03 | bwd_microstep: 1549.41 | bwd_inner_microstep: 1549.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3515 [2024-06-10 04:23:45,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.19 | bwd_microstep: 1253.06 | bwd_inner_microstep: 1253.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-10 04:23:47,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.64 | bwd_microstep: 1511.73 | bwd_inner_microstep: 1511.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3524 [2024-06-10 04:23:49,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1494.12 | bwd_inner_microstep: 1494.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 04:23:51,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.13 | bwd_microstep: 1396.29 | bwd_inner_microstep: 1396.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-10 04:23:53,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1404.57 | bwd_inner_microstep: 1404.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2740 [2024-06-10 04:23:55,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.24 | bwd_microstep: 1043.29 | bwd_inner_microstep: 1043.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 04:23:57,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1496.48 | bwd_inner_microstep: 1496.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 04:23:59,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1398.78 | bwd_inner_microstep: 1398.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2284 [2024-06-10 04:24:00,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.80 | bwd_microstep: 881.21 | bwd_inner_microstep: 881.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-10 04:24:02,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.62 | bwd_microstep: 1611.17 | bwd_inner_microstep: 1611.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3820 [2024-06-10 04:24:04,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.16 | bwd_microstep: 1262.59 | bwd_inner_microstep: 1262.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-10 04:24:06,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.24 | bwd_microstep: 1488.39 | bwd_inner_microstep: 1488.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 04:24:08,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1554.92 | bwd_inner_microstep: 1554.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-10 04:24:09,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.85 | bwd_microstep: 988.18 | bwd_inner_microstep: 988.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801 [2024-06-10 04:24:12,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.00 | bwd_microstep: 1450.02 | bwd_inner_microstep: 1449.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-10 04:24:14,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1555.72 | bwd_inner_microstep: 1555.69 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 04:24:15,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.89 | bwd_microstep: 972.35 | bwd_inner_microstep: 972.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 04:24:17,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1277.89 | bwd_inner_microstep: 1277.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3601 [2024-06-10 04:24:23,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-10 04:24:23,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.51 | bwd_microstep: 6063.38 | bwd_inner_microstep: 1791.65 | bwd_allreduce_microstep: 4271.68 | step_microstep: 38.71 [2024-06-10 04:24:23,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15940.05 | bwd: 46985.63 | bwd_inner: 42713.04 | bwd_allreduce: 4271.91 | step: 40.32 {'loss': 1.3205, 'learning_rate': 3.908278512800098e-05, 'epoch': 0.12} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 04:24:25,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.21 | bwd_microstep: 1234.84 | bwd_inner_microstep: 1234.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-10 04:24:26,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.72 | bwd_microstep: 696.73 | bwd_inner_microstep: 696.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 04:24:28,380] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 1245.76 | bwd_inner_microstep: 1245.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 04:24:30,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.34 | bwd_microstep: 1247.95 | bwd_inner_microstep: 1247.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 04:24:32,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1379.89 | bwd_inner_microstep: 1379.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477 [2024-06-10 04:24:33,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.25 | bwd_microstep: 1215.58 | bwd_inner_microstep: 1215.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 04:24:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.29 | bwd_microstep: 1387.14 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-10 04:24:36,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.34 | bwd_microstep: 788.56 | bwd_inner_microstep: 788.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3549 [2024-06-10 04:24:38,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.81 | bwd_microstep: 1449.68 | bwd_inner_microstep: 1449.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
3503 [2024-06-10 04:24:40,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.08 | bwd_microstep: 1224.21 | bwd_inner_microstep: 1224.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506 [2024-06-10 04:24:42,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1351.13 | bwd_inner_microstep: 1351.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497 [2024-06-10 04:24:44,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 1540.59 | bwd_inner_microstep: 1540.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 04:24:46,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 04:24:48,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.54 | bwd_microstep: 1486.40 | bwd_inner_microstep: 1486.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-10 04:24:50,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1418.68 | bwd_inner_microstep: 1418.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-10 04:24:52,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.01 | bwd_microstep: 1613.56 | bwd_inner_microstep: 1613.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 3828 [2024-06-10 04:24:54,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1390.43 | bwd_inner_microstep: 1390.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2947 [2024-06-10 04:24:56,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.04 | bwd_microstep: 1200.35 | bwd_inner_microstep: 1200.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666 [2024-06-10 04:24:58,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1326.03 | bwd_inner_microstep: 1326.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 04:24:59,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1279.36 | bwd_inner_microstep: 1279.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 04:25:01,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1400.52 | bwd_inner_microstep: 1400.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 04:25:03,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1556.12 | bwd_inner_microstep: 1556.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 04:25:05,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1291.51 | bwd_inner_microstep: 1291.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 04:25:07,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1398.58 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819 [2024-06-10 04:25:10,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.71 | bwd_microstep: 1755.70 | bwd_inner_microstep: 1755.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3381 [2024-06-10 04:25:12,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.37 | bwd_microstep: 1439.09 | bwd_inner_microstep: 1439.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818 [2024-06-10 04:25:13,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1358.15 | bwd_inner_microstep: 1358.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3830 [2024-06-10 04:25:15,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.60 | bwd_microstep: 1265.34 | bwd_inner_microstep: 1265.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3567 [2024-06-10 04:25:17,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.87 | bwd_microstep: 1531.46 | bwd_inner_microstep: 1531.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3570 [2024-06-10 04:25:19,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.16 | bwd_microstep: 1562.90 | bwd_inner_microstep: 1562.87 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-10 04:25:21,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.05 | bwd_microstep: 1340.38 | bwd_inner_microstep: 1340.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 04:25:24,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.16 | optimizer_step: 6.63 [2024-06-10 04:25:24,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 2356.42 | bwd_inner_microstep: 1565.74 | bwd_allreduce_microstep: 790.62 | step_microstep: 38.19 [2024-06-10 04:25:24,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16224.58 | bwd: 44181.74 | bwd_inner: 43390.19 | bwd_allreduce: 790.85 | step: 39.88 {'loss': 1.3079, 'learning_rate': 3.907151524565749e-05, 'epoch': 0.12} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 04:25:26,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1282.37 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 04:25:28,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1248.36 | bwd_inner_microstep: 1248.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 04:25:30,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1479.78 | bwd_inner_microstep: 1479.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573 [2024-06-10 
04:25:32,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.97 | bwd_microstep: 1502.06 | bwd_inner_microstep: 1502.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4212 [2024-06-10 04:25:34,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.65 | bwd_microstep: 1658.91 | bwd_inner_microstep: 1658.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4085 [2024-06-10 04:25:36,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.38 | bwd_microstep: 1628.57 | bwd_inner_microstep: 1628.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 04:25:38,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1247.35 | bwd_inner_microstep: 1247.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728 [2024-06-10 04:25:40,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.68 | bwd_microstep: 1633.40 | bwd_inner_microstep: 1633.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734 [2024-06-10 04:25:43,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.62 | bwd_microstep: 1635.02 | bwd_inner_microstep: 1634.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674 [2024-06-10 04:25:45,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.73 | bwd_microstep: 1427.83 | bwd_inner_microstep: 1427.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3494 [2024-06-10 04:25:46,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1287.36 | bwd_inner_microstep: 1287.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017 [2024-06-10 04:25:48,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.26 | bwd_microstep: 835.60 | bwd_inner_microstep: 835.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 04:25:49,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1286.86 | bwd_inner_microstep: 1286.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3493 [2024-06-10 04:25:51,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1446.85 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2652 [2024-06-10 04:25:53,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.29 | bwd_microstep: 1023.14 | bwd_inner_microstep: 1023.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3445 [2024-06-10 04:25:55,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1412.59 | bwd_inner_microstep: 1412.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 04:25:57,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1339.71 | bwd_inner_microstep: 1339.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 24, images per sample: 6.0, dynamic token length: 3522 [2024-06-10 04:25:58,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1324.16 | bwd_inner_microstep: 1324.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137 [2024-06-10 04:26:00,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.76 | bwd_microstep: 834.12 | bwd_inner_microstep: 834.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291 [2024-06-10 04:26:01,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.23 | bwd_microstep: 975.14 | bwd_inner_microstep: 975.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 04:26:03,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 04:26:05,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.45 | bwd_microstep: 1459.27 | bwd_inner_microstep: 1459.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 04:26:07,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.41 | bwd_microstep: 1461.26 | bwd_inner_microstep: 1461.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 04:26:09,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.88 | bwd_microstep: 1461.29 | bwd_inner_microstep: 1461.26 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683 [2024-06-10 04:26:11,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1432.87 | bwd_inner_microstep: 1432.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694 [2024-06-10 04:26:13,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.66 | bwd_microstep: 1433.70 | bwd_inner_microstep: 1433.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821 [2024-06-10 04:26:15,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.14 | bwd_microstep: 1585.28 | bwd_inner_microstep: 1585.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596 [2024-06-10 04:26:17,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 1434.16 | bwd_inner_microstep: 1434.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2353 [2024-06-10 04:26:18,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.46 | bwd_microstep: 1025.73 | bwd_inner_microstep: 1025.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3738 [2024-06-10 04:26:21,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1559.73 | bwd_inner_microstep: 1559.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570 [2024-06-10 04:26:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.49 | bwd_microstep: 1597.65 | bwd_inner_microstep: 
1597.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 04:26:25,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.62 [2024-06-10 04:26:25,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1529.89 | bwd_inner_microstep: 1522.19 | bwd_allreduce_microstep: 7.65 | step_microstep: 38.22 [2024-06-10 04:26:25,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16417.57 | bwd: 43880.13 | bwd_inner: 43871.56 | bwd_allreduce: 7.88 | step: 39.86 {'loss': 1.3158, 'learning_rate': 3.906017819353645e-05, 'epoch': 0.13} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3384 [2024-06-10 04:26:27,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.77 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4044 [2024-06-10 04:26:29,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.76 | bwd_microstep: 1719.26 | bwd_inner_microstep: 1719.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 04:26:31,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.12 | bwd_microstep: 1378.66 | bwd_inner_microstep: 1378.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060 [2024-06-10 04:26:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.91 | bwd_microstep: 815.12 | bwd_inner_microstep: 815.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 
[2024-06-10 04:26:35,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.01 | bwd_microstep: 1654.44 | bwd_inner_microstep: 1654.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050 [2024-06-10 04:26:36,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.97 | bwd_microstep: 819.93 | bwd_inner_microstep: 819.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 04:26:37,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.68 | bwd_microstep: 1151.58 | bwd_inner_microstep: 1151.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2461 [2024-06-10 04:26:39,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.91 | bwd_microstep: 950.25 | bwd_inner_microstep: 950.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 04:26:41,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1388.04 | bwd_inner_microstep: 1388.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515 [2024-06-10 04:26:42,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.36 | bwd_microstep: 1193.65 | bwd_inner_microstep: 1193.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 04:26:44,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1250.52 | bwd_inner_microstep: 1250.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per 
sample: 8.0, dynamic token length: 3503 [2024-06-10 04:26:46,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1438.48 | bwd_inner_microstep: 1438.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509 [2024-06-10 04:26:48,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.77 | bwd_microstep: 1317.96 | bwd_inner_microstep: 1317.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1941 [2024-06-10 04:26:49,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.48 | bwd_microstep: 886.32 | bwd_inner_microstep: 886.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633 [2024-06-10 04:26:51,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.94 | bwd_microstep: 1708.48 | bwd_inner_microstep: 1708.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381 [2024-06-10 04:26:53,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1242.00 | bwd_inner_microstep: 1241.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960 [2024-06-10 04:26:54,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.16 | bwd_microstep: 824.53 | bwd_inner_microstep: 824.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832 [2024-06-10 04:26:56,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1388.88 | bwd_inner_microstep: 1388.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466 [2024-06-10 04:26:58,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1310.96 | bwd_inner_microstep: 1310.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2099 [2024-06-10 04:26:59,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.89 | bwd_microstep: 730.59 | bwd_inner_microstep: 730.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 04:27:01,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1252.48 | bwd_inner_microstep: 1252.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2984 [2024-06-10 04:27:02,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.94 | bwd_microstep: 1297.52 | bwd_inner_microstep: 1297.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2000 [2024-06-10 04:27:04,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.16 | bwd_microstep: 831.65 | bwd_inner_microstep: 831.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3626 [2024-06-10 04:27:06,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.87 | bwd_microstep: 1576.82 | bwd_inner_microstep: 1576.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2287 [2024-06-10 04:27:07,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.38 | bwd_microstep: 815.12 | bwd_inner_microstep: 815.10 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457 [2024-06-10 04:27:09,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 1309.76 | bwd_inner_microstep: 1309.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055 [2024-06-10 04:27:10,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.23 | bwd_microstep: 815.89 | bwd_inner_microstep: 815.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 04:27:12,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1381.61 | bwd_inner_microstep: 1381.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 04:27:14,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.64 | bwd_microstep: 1505.13 | bwd_inner_microstep: 1505.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2240 [2024-06-10 04:27:15,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.86 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595 [2024-06-10 04:27:17,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.77 | bwd_microstep: 1305.69 | bwd_inner_microstep: 1305.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581 [2024-06-10 04:27:24,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | 
optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 04:27:24,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 7095.15 | bwd_inner_microstep: 1463.54 | bwd_allreduce_microstep: 5631.55 | step_microstep: 38.69 [2024-06-10 04:27:24,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14608.84 | bwd: 44634.63 | bwd_inner: 39002.12 | bwd_allreduce: 5631.84 | step: 40.37 {'loss': 1.3252, 'learning_rate': 3.9048774011566906e-05, 'epoch': 0.13} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 04:27:26,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1367.25 | bwd_inner_microstep: 1367.15 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4414 [2024-06-10 04:27:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.07 | bwd_microstep: 1814.42 | bwd_inner_microstep: 1814.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294 [2024-06-10 04:27:30,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.61 | bwd_microstep: 973.35 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 04:27:32,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.77 | bwd_microstep: 1493.86 | bwd_inner_microstep: 1493.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475 [2024-06-10 04:27:34,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.81 | bwd_microstep: 1341.59 | bwd_inner_microstep: 1341.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3487 [2024-06-10 04:27:36,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.74 | bwd_microstep: 1281.15 | bwd_inner_microstep: 1281.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2259 [2024-06-10 04:27:37,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.90 | bwd_microstep: 872.42 | bwd_inner_microstep: 872.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678 [2024-06-10 04:27:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1426.34 | bwd_inner_microstep: 1426.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-10 04:27:41,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.13 | bwd_microstep: 1518.69 | bwd_inner_microstep: 1518.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 04:27:43,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1387.91 | bwd_inner_microstep: 1387.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3417 [2024-06-10 04:27:45,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1313.53 | bwd_inner_microstep: 1313.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516 [2024-06-10 04:27:47,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.49 | bwd_microstep: 1587.46 | bwd_inner_microstep: 1587.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2439 [2024-06-10 04:27:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.48 | bwd_microstep: 950.16 | bwd_inner_microstep: 950.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637 [2024-06-10 04:27:51,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.30 | bwd_microstep: 1546.03 | bwd_inner_microstep: 1546.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 04:27:52,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.62 | bwd_microstep: 1286.23 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3890 [2024-06-10 04:27:55,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1587.18 | bwd_inner_microstep: 1587.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 04:27:57,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.90 | bwd_microstep: 1487.98 | bwd_inner_microstep: 1487.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098 [2024-06-10 04:27:58,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.37 | bwd_microstep: 919.19 | bwd_inner_microstep: 919.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709 [2024-06-10 04:28:00,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1338.49 | bwd_inner_microstep: 1338.47 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071 [2024-06-10 04:28:01,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.24 | bwd_microstep: 916.55 | bwd_inner_microstep: 916.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617 [2024-06-10 04:28:03,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1512.20 | bwd_inner_microstep: 1512.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-10 04:28:04,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 805.23 | bwd_inner_microstep: 805.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825 [2024-06-10 04:28:06,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.76 | bwd_microstep: 1362.47 | bwd_inner_microstep: 1362.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454 [2024-06-10 04:28:08,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.91 | bwd_microstep: 1193.22 | bwd_inner_microstep: 1193.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2047 [2024-06-10 04:28:09,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.60 | bwd_microstep: 938.45 | bwd_inner_microstep: 938.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430 [2024-06-10 04:28:11,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1287.02 | 
bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2184 [2024-06-10 04:28:12,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.81 | bwd_microstep: 796.97 | bwd_inner_microstep: 796.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2062 [2024-06-10 04:28:13,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.93 | bwd_microstep: 914.69 | bwd_inner_microstep: 914.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 04:28:15,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1561.00 | bwd_inner_microstep: 1560.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2269 [2024-06-10 04:28:17,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.90 | bwd_microstep: 1068.00 | bwd_inner_microstep: 1067.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2236 [2024-06-10 04:28:18,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.30 | bwd_microstep: 963.54 | bwd_inner_microstep: 963.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3801 [2024-06-10 04:28:25,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 04:28:25,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.32 | bwd_microstep: 6451.15 | bwd_inner_microstep: 1923.05 | bwd_allreduce_microstep: 4528.05 | step_microstep: 38.79 [2024-06-10 
04:28:25,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15164.67 | bwd: 45263.75 | bwd_inner: 40734.71 | bwd_allreduce: 4528.33 | step: 40.43 12%|█▏ | 212/1726 [3:44:55<25:55:59, 61.66s/it] 12%|█▏ | 213/1726 [3:45:57<25:56:43, 61.73s/it] 12%|█▏ | 214/1726 [3:47:00<26:07:19, 62.20s/it] 12%|█▏ | 215/1726 [3:48:01<25:55:21, 61.76s/it] 13%|█▎ | 216/1726 [3:49:02<25:45:53, 61.43s/it] 13%|█▎ | 217/1726 [3:50:01<25:31:01, 60.88s/it] 13%|█▎ | 218/1726 [3:51:02<25:29:13, 60.84s/it] {'loss': 1.3438, 'learning_rate': 3.9037302739914306e-05, 'epoch': 0.13} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 04:28:27,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.64 | bwd_microstep: 1375.92 | bwd_inner_microstep: 1375.85 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3869 [2024-06-10 04:28:29,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.19 | bwd_microstep: 1655.73 | bwd_inner_microstep: 1655.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-10 04:28:31,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.97 | bwd_microstep: 1471.81 | bwd_inner_microstep: 1471.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977 [2024-06-10 04:28:33,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.88 | bwd_microstep: 829.96 | bwd_inner_microstep: 829.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 2215 [2024-06-10 04:28:34,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.94 | bwd_microstep: 955.98 | bwd_inner_microstep: 955.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 04:28:36,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1250.11 | bwd_inner_microstep: 1250.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 964 [2024-06-10 04:28:36,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 161.30 | bwd_microstep: 418.36 | bwd_inner_microstep: 418.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 04:28:38,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.60 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-10 04:28:40,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.48 | bwd_microstep: 1154.15 | bwd_inner_microstep: 1154.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501 [2024-06-10 04:28:41,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.95 | bwd_microstep: 1220.26 | bwd_inner_microstep: 1220.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454 [2024-06-10 04:28:43,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.31 | bwd_microstep: 1315.74 | bwd_inner_microstep: 1315.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 04:28:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.80 | bwd_microstep: 1345.50 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002 [2024-06-10 04:28:46,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.26 | bwd_microstep: 900.35 | bwd_inner_microstep: 900.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669 [2024-06-10 04:28:48,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.17 | bwd_microstep: 1620.93 | bwd_inner_microstep: 1620.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950 [2024-06-10 04:28:49,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.83 | bwd_microstep: 700.54 | bwd_inner_microstep: 700.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841 [2024-06-10 04:28:51,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1459.11 | bwd_inner_microstep: 1459.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523 [2024-06-10 04:28:54,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.88 | bwd_microstep: 1489.57 | bwd_inner_microstep: 1489.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702 [2024-06-10 04:28:55,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1331.99 | bwd_inner_microstep: 1331.96 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 04:28:57,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1413.17 | bwd_inner_microstep: 1413.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461 [2024-06-10 04:28:59,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.47 | bwd_microstep: 1217.96 | bwd_inner_microstep: 1217.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-10 04:29:01,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1408.38 | bwd_inner_microstep: 1408.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 04:29:03,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.30 | bwd_microstep: 1258.14 | bwd_inner_microstep: 1258.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640 [2024-06-10 04:29:05,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.64 | bwd_microstep: 1613.41 | bwd_inner_microstep: 1613.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3726 [2024-06-10 04:29:07,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.39 | bwd_microstep: 1242.64 | bwd_inner_microstep: 1242.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615 [2024-06-10 04:29:09,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 
1416.29 | bwd_inner_microstep: 1416.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3555 [2024-06-10 04:29:11,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1444.94 | bwd_inner_microstep: 1444.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809 [2024-06-10 04:29:13,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 1581.03 | bwd_inner_microstep: 1581.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3599 [2024-06-10 04:29:14,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.29 | bwd_microstep: 1213.59 | bwd_inner_microstep: 1213.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3814 [2024-06-10 04:29:17,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.07 | bwd_microstep: 1850.41 | bwd_inner_microstep: 1850.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 04:29:19,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.55 | bwd_microstep: 1505.97 | bwd_inner_microstep: 1505.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 04:29:21,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.22 | bwd_microstep: 1503.10 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584 [2024-06-10 04:29:28,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| optimizer_allgather: 16.68 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 04:29:28,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.59 | bwd_microstep: 6117.96 | bwd_inner_microstep: 1804.80 | bwd_allreduce_microstep: 4313.10 | step_microstep: 38.84 [2024-06-10 04:29:28,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15769.53 | bwd: 46566.28 | bwd_inner: 42252.22 | bwd_allreduce: 4313.36 | step: 40.46 {'loss': 1.3661, 'learning_rate': 3.9025764418980426e-05, 'epoch': 0.13} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 04:29:30,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1247.82 | bwd_inner_microstep: 1247.73 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4062 [2024-06-10 04:29:32,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.21 | bwd_microstep: 1714.96 | bwd_inner_microstep: 1714.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2388 [2024-06-10 04:29:33,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.16 | bwd_microstep: 999.41 | bwd_inner_microstep: 999.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816 [2024-06-10 04:29:35,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.63 | bwd_microstep: 1355.79 | bwd_inner_microstep: 1355.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 04:29:37,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1378.92 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-10 04:29:38,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.98 | bwd_microstep: 791.34 | bwd_inner_microstep: 791.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 04:29:40,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.44 | bwd_microstep: 1285.51 | bwd_inner_microstep: 1285.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067 [2024-06-10 04:29:41,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.86 | bwd_microstep: 818.13 | bwd_inner_microstep: 818.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991 [2024-06-10 04:29:42,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.80 | bwd_microstep: 803.93 | bwd_inner_microstep: 803.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2085 [2024-06-10 04:29:43,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.42 | bwd_microstep: 825.35 | bwd_inner_microstep: 825.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3478 [2024-06-10 04:29:45,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.95 | bwd_microstep: 1341.36 | bwd_inner_microstep: 1341.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2114 [2024-06-10 04:29:47,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.80 | bwd_microstep: 926.18 | bwd_inner_microstep: 926.15 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3705 [2024-06-10 04:29:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.24 | bwd_microstep: 1487.51 | bwd_inner_microstep: 1487.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-10 04:29:51,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.21 | bwd_microstep: 1455.19 | bwd_inner_microstep: 1455.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957 [2024-06-10 04:29:52,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.52 | bwd_microstep: 890.20 | bwd_inner_microstep: 890.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 04:29:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-10 04:29:56,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.26 | bwd_microstep: 1524.06 | bwd_inner_microstep: 1524.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633 [2024-06-10 04:29:58,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.99 | bwd_microstep: 1706.60 | bwd_inner_microstep: 1706.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919 [2024-06-10 04:29:59,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.01 | bwd_microstep: 719.15 | bwd_inner_microstep: 
719.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3629 [2024-06-10 04:30:01,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.55 | bwd_microstep: 1346.97 | bwd_inner_microstep: 1346.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448 [2024-06-10 04:30:03,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1318.48 | bwd_inner_microstep: 1318.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-10 04:30:05,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.33 | bwd_microstep: 1452.05 | bwd_inner_microstep: 1452.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725 [2024-06-10 04:30:07,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.95 | bwd_microstep: 1631.11 | bwd_inner_microstep: 1631.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760 [2024-06-10 04:30:10,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.20 | bwd_microstep: 1645.49 | bwd_inner_microstep: 1645.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737 [2024-06-10 04:30:12,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.68 | bwd_microstep: 1541.01 | bwd_inner_microstep: 1540.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-10 04:30:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | 
bwd_microstep: 1552.61 | bwd_inner_microstep: 1552.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4067 [2024-06-10 04:30:16,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.15 | bwd_microstep: 1658.01 | bwd_inner_microstep: 1657.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602 [2024-06-10 04:30:18,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1406.99 | bwd_inner_microstep: 1406.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3833 [2024-06-10 04:30:20,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.33 | bwd_microstep: 1727.04 | bwd_inner_microstep: 1727.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 04:30:23,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.60 | bwd_microstep: 1531.04 | bwd_inner_microstep: 1531.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607 [2024-06-10 04:30:24,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.46 | bwd_microstep: 1313.82 | bwd_inner_microstep: 1313.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3587 [2024-06-10 04:30:30,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.46 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 04:30:30,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 5247.90 | bwd_inner_microstep: 1536.23 | bwd_allreduce_microstep: 3711.61 | 
step_microstep: 39.42 [2024-06-10 04:30:30,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15768.84 | bwd: 46124.96 | bwd_inner: 42412.37 | bwd_allreduce: 3711.88 | step: 41.02 {'loss': 1.3115, 'learning_rate': 3.9014159089403167e-05, 'epoch': 0.13} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392 [2024-06-10 04:30:32,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.41 | bwd_microstep: 1327.54 | bwd_inner_microstep: 1327.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 04:30:34,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3891 [2024-06-10 04:30:36,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.58 | bwd_microstep: 1583.53 | bwd_inner_microstep: 1583.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503 [2024-06-10 04:30:38,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.75 | bwd_microstep: 1219.96 | bwd_inner_microstep: 1219.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3798 [2024-06-10 04:30:40,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.48 | bwd_microstep: 1314.99 | bwd_inner_microstep: 1314.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 04:30:41,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1347.35 | bwd_inner_microstep: 1347.33 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 04:30:43,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.33 | bwd_microstep: 1279.98 | bwd_inner_microstep: 1279.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895 [2024-06-10 04:30:44,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.70 | bwd_microstep: 778.33 | bwd_inner_microstep: 778.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410 [2024-06-10 04:30:46,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 04:30:48,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1382.43 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 04:30:50,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.83 | bwd_microstep: 1347.15 | bwd_inner_microstep: 1347.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 04:30:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1383.20 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002 [2024-06-10 04:30:53,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.43 | bwd_microstep: 901.05 | bwd_inner_microstep: 901.03 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450 [2024-06-10 04:30:55,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.97 | bwd_microstep: 1282.32 | bwd_inner_microstep: 1282.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2154 [2024-06-10 04:30:56,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.83 | bwd_microstep: 786.49 | bwd_inner_microstep: 786.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 04:30:58,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1389.14 | bwd_inner_microstep: 1389.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 04:31:00,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1296.32 | bwd_inner_microstep: 1296.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 04:31:01,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.31 | bwd_microstep: 1187.95 | bwd_inner_microstep: 1187.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 614 [2024-06-10 04:31:02,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.94 | bwd_microstep: 261.53 | bwd_inner_microstep: 261.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3622 [2024-06-10 04:31:03,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.95 | bwd_microstep: 1216.61 
| bwd_inner_microstep: 1216.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2136 [2024-06-10 04:31:04,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.97 | bwd_microstep: 834.75 | bwd_inner_microstep: 834.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3544 [2024-06-10 04:31:06,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.49 | bwd_microstep: 1231.26 | bwd_inner_microstep: 1231.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817 [2024-06-10 04:31:08,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.77 | bwd_microstep: 1584.89 | bwd_inner_microstep: 1584.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 04:31:10,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1347.80 | bwd_inner_microstep: 1347.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 04:31:12,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.27 | bwd_microstep: 1556.71 | bwd_inner_microstep: 1556.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1975 [2024-06-10 04:31:13,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.58 | bwd_microstep: 766.93 | bwd_inner_microstep: 766.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4032 [2024-06-10 04:31:15,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 547.89 | bwd_microstep: 1453.40 | bwd_inner_microstep: 1453.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637 [2024-06-10 04:31:18,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.50 | bwd_microstep: 1540.10 | bwd_inner_microstep: 1540.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 04:31:20,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.49 | bwd_microstep: 1401.73 | bwd_inner_microstep: 1401.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804 [2024-06-10 04:31:22,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.66 | bwd_microstep: 1758.32 | bwd_inner_microstep: 1758.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3384 [2024-06-10 04:31:24,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.75 | bwd_microstep: 1273.99 | bwd_inner_microstep: 1273.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608 [2024-06-10 04:31:30,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 04:31:30,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.82 | bwd_microstep: 5731.03 | bwd_inner_microstep: 1700.77 | bwd_allreduce_microstep: 4030.21 | step_microstep: 38.64 [2024-06-10 04:31:30,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15122.12 | bwd: 44432.96 | bwd_inner: 40401.81 | bwd_allreduce: 4030.45 | step: 40.32 {'loss': 1.3138, 'learning_rate': 3.900248679205644e-05, 'epoch': 0.13} dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3380 [2024-06-10 04:31:32,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.18 | bwd_microstep: 1236.03 | bwd_inner_microstep: 1236.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 04:31:33,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.26 | bwd_microstep: 788.58 | bwd_inner_microstep: 788.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3980 [2024-06-10 04:31:35,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1608.12 | bwd_inner_microstep: 1608.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 04:31:37,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1555.68 | bwd_inner_microstep: 1555.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-10 04:31:38,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.81 | bwd_microstep: 874.62 | bwd_inner_microstep: 874.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3413 [2024-06-10 04:31:40,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1394.38 | bwd_inner_microstep: 1394.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 04:31:42,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.50 | bwd_microstep: 1253.35 | bwd_inner_microstep: 1253.32 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 04:31:43,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 792.14 | bwd_inner_microstep: 792.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 04:31:45,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1527.54 | bwd_inner_microstep: 1527.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-10 04:31:46,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.72 | bwd_microstep: 809.85 | bwd_inner_microstep: 809.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 04:31:48,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.77 | bwd_microstep: 1399.62 | bwd_inner_microstep: 1399.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939 [2024-06-10 04:31:49,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.89 | bwd_microstep: 727.83 | bwd_inner_microstep: 727.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 04:31:51,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1345.39 | bwd_inner_microstep: 1345.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 04:31:53,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.32 | bwd_microstep: 1352.42 | bwd_inner_microstep: 
1352.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640 [2024-06-10 04:31:55,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.44 | bwd_microstep: 1575.22 | bwd_inner_microstep: 1575.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 04:31:57,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1349.56 | bwd_inner_microstep: 1349.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 04:31:59,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.53 | bwd_microstep: 1391.04 | bwd_inner_microstep: 1391.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 04:32:01,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.11 | bwd_microstep: 1391.63 | bwd_inner_microstep: 1391.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-10 04:32:03,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.93 | bwd_microstep: 1160.64 | bwd_inner_microstep: 1160.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 04:32:04,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.97 | bwd_microstep: 1392.64 | bwd_inner_microstep: 1392.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 04:32:06,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.27 | 
bwd_microstep: 1288.58 | bwd_inner_microstep: 1288.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1978 [2024-06-10 04:32:07,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.71 | bwd_microstep: 708.16 | bwd_inner_microstep: 708.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 04:32:09,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1253.83 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 04:32:11,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1490.57 | bwd_inner_microstep: 1490.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 04:32:13,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.09 | bwd_microstep: 1548.67 | bwd_inner_microstep: 1548.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-10 04:32:14,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.51 | bwd_microstep: 805.24 | bwd_inner_microstep: 805.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603 [2024-06-10 04:32:16,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.12 | bwd_microstep: 1469.05 | bwd_inner_microstep: 1469.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 04:32:18,810] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1434.87 | bwd_inner_microstep: 1434.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815 [2024-06-10 04:32:20,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.98 | bwd_microstep: 1515.26 | bwd_inner_microstep: 1515.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 04:32:23,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.18 | bwd_microstep: 1553.22 | bwd_inner_microstep: 1553.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 04:32:25,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1501.48 | bwd_inner_microstep: 1501.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-10 04:32:32,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 04:32:32,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.59 | bwd_microstep: 7174.71 | bwd_inner_microstep: 1857.92 | bwd_allreduce_microstep: 5316.73 | step_microstep: 38.65 [2024-06-10 04:32:32,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15392.29 | bwd: 46669.94 | bwd_inner: 41352.28 | bwd_allreduce: 5316.96 | step: 40.22 {'loss': 1.3859, 'learning_rate': 3.8990747568050016e-05, 'epoch': 0.13} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400 [2024-06-10 04:32:34,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.79 | bwd_microstep: 1303.33 | bwd_inner_microstep: 1303.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 04:32:36,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.17 | bwd_microstep: 1398.05 | bwd_inner_microstep: 1398.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 04:32:38,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.90 | bwd_microstep: 1243.66 | bwd_inner_microstep: 1243.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238 [2024-06-10 04:32:39,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.40 | bwd_microstep: 962.11 | bwd_inner_microstep: 962.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 04:32:41,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.33 | bwd_microstep: 1287.95 | bwd_inner_microstep: 1287.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 04:32:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.42 | bwd_microstep: 1385.95 | bwd_inner_microstep: 1385.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 04:32:44,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.99 | bwd_microstep: 794.98 | bwd_inner_microstep: 794.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881 [2024-06-10 04:32:45,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.91 | bwd_microstep: 680.11 | bwd_inner_microstep: 680.08 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401 [2024-06-10 04:32:47,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.14 | bwd_microstep: 1185.12 | bwd_inner_microstep: 1185.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3409 [2024-06-10 04:32:48,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.05 | bwd_microstep: 1295.99 | bwd_inner_microstep: 1295.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-10 04:32:50,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1346.69 | bwd_inner_microstep: 1346.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-10 04:32:52,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1495.07 | bwd_inner_microstep: 1495.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003 [2024-06-10 04:32:54,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.52 | bwd_microstep: 898.18 | bwd_inner_microstep: 898.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3650 [2024-06-10 04:32:56,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.46 | bwd_microstep: 1472.26 | bwd_inner_microstep: 1472.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 04:32:58,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.52 | bwd_microstep: 
1477.43 | bwd_inner_microstep: 1477.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3638 [2024-06-10 04:33:00,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.36 | bwd_microstep: 1553.52 | bwd_inner_microstep: 1553.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2414 [2024-06-10 04:33:01,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.32 | bwd_microstep: 936.19 | bwd_inner_microstep: 936.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3526 [2024-06-10 04:33:03,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.17 | bwd_microstep: 1559.80 | bwd_inner_microstep: 1559.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 04:33:05,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1558.59 | bwd_inner_microstep: 1558.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2166 [2024-06-10 04:33:07,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.58 | bwd_microstep: 951.99 | bwd_inner_microstep: 951.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640 [2024-06-10 04:33:09,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1418.61 | bwd_inner_microstep: 1418.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 04:33:10,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 302.23 | bwd_microstep: 807.64 | bwd_inner_microstep: 807.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 04:33:12,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1554.86 | bwd_inner_microstep: 1554.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2420 [2024-06-10 04:33:13,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.94 | bwd_microstep: 968.47 | bwd_inner_microstep: 968.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688 [2024-06-10 04:33:15,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1433.42 | bwd_inner_microstep: 1433.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3610 [2024-06-10 04:33:18,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.89 | bwd_microstep: 1656.38 | bwd_inner_microstep: 1656.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2279 [2024-06-10 04:33:19,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.87 | bwd_microstep: 1008.87 | bwd_inner_microstep: 1008.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 04:33:21,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1350.92 | bwd_inner_microstep: 1350.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429 [2024-06-10 04:33:22,891] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.41 | bwd_microstep: 1156.32 | bwd_inner_microstep: 1156.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 04:33:24,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 04:33:26,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.75 | bwd_microstep: 1280.44 | bwd_inner_microstep: 1280.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-10 04:33:35,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.26 | optimizer_step: 6.57 [2024-06-10 04:33:35,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.97 | bwd_microstep: 7850.80 | bwd_inner_microstep: 1691.04 | bwd_allreduce_microstep: 6159.70 | step_microstep: 38.81 [2024-06-10 04:33:35,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15109.70 | bwd: 46672.46 | bwd_inner: 40511.81 | bwd_allreduce: 6159.95 | step: 40.43 {'loss': 1.309, 'learning_rate': 3.897894145872939e-05, 'epoch': 0.13} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 04:33:36,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 784.68 | bwd_inner_microstep: 784.53 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3896 [2024-06-10 04:33:38,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.95 | bwd_microstep: 1385.85 | bwd_inner_microstep: 1385.82 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 04:33:39,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1379.29 | bwd_inner_microstep: 1379.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3775 [2024-06-10 04:33:41,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.51 | bwd_microstep: 1402.35 | bwd_inner_microstep: 1402.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3233 [2024-06-10 04:33:43,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.78 | bwd_microstep: 1209.25 | bwd_inner_microstep: 1209.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 04:33:45,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1383.54 | bwd_inner_microstep: 1383.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 04:33:47,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.89 | bwd_microstep: 1280.85 | bwd_inner_microstep: 1280.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 04:33:49,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1288.00 | bwd_inner_microstep: 1287.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 04:33:51,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 
1478.21 | bwd_inner_microstep: 1478.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 04:33:52,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.89 | bwd_microstep: 1348.32 | bwd_inner_microstep: 1348.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1874 [2024-06-10 04:33:54,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.36 | bwd_microstep: 771.04 | bwd_inner_microstep: 771.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398 [2024-06-10 04:33:56,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.75 | bwd_microstep: 1434.67 | bwd_inner_microstep: 1434.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 04:33:57,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1340.52 | bwd_inner_microstep: 1340.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 04:33:59,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3323 [2024-06-10 04:34:01,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1294.83 | bwd_inner_microstep: 1294.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3651 [2024-06-10 04:34:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 584.91 | bwd_microstep: 1574.96 | bwd_inner_microstep: 1574.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-10 04:34:05,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1280.87 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437 [2024-06-10 04:34:07,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.14 | bwd_microstep: 1313.90 | bwd_inner_microstep: 1313.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-10 04:34:09,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1460.69 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-10 04:34:11,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.91 | bwd_microstep: 1494.64 | bwd_inner_microstep: 1494.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176 [2024-06-10 04:34:12,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.10 | bwd_microstep: 856.33 | bwd_inner_microstep: 856.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3546 [2024-06-10 04:34:14,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.77 | bwd_microstep: 1356.67 | bwd_inner_microstep: 1356.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494 [2024-06-10 04:34:16,124] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.89 | bwd_microstep: 1317.59 | bwd_inner_microstep: 1317.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 04:34:18,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.47 | bwd_microstep: 1405.77 | bwd_inner_microstep: 1405.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-10 04:34:20,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1397.53 | bwd_inner_microstep: 1397.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782 [2024-06-10 04:34:21,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.44 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 04:34:23,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.36 | bwd_microstep: 1285.61 | bwd_inner_microstep: 1285.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565 [2024-06-10 04:34:25,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1300.34 | bwd_inner_microstep: 1300.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033 [2024-06-10 04:34:26,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.32 | bwd_microstep: 839.51 | bwd_inner_microstep: 839.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 
3566 [2024-06-10 04:34:28,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1545.82 | bwd_inner_microstep: 1545.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2309 [2024-06-10 04:34:30,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.13 | bwd_microstep: 884.37 | bwd_inner_microstep: 884.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 04:34:37,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.58 [2024-06-10 04:34:37,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 6615.26 | bwd_inner_microstep: 1692.59 | bwd_allreduce_microstep: 4922.62 | step_microstep: 38.69 [2024-06-10 04:34:37,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15513.18 | bwd: 46309.56 | bwd_inner: 41385.92 | bwd_allreduce: 4922.90 | step: 40.40 13%|█▎ | 218/1726 [3:51:02<25:29:13, 60.84s/it] 13%|█▎ | 219/1726 [3:52:05<25:42:05, 61.40s/it] 13%|█▎ | 220/1726 [3:53:07<25:47:25, 61.65s/it] 13%|█▎ | 221/1726 [3:54:07<25:33:10, 61.12s/it] 13%|█▎ | 222/1726 [3:55:09<25:41:46, 61.51s/it] 13%|█▎ | 223/1726 [3:56:11<25:45:22, 61.69s/it] 13%|█▎ | 224{'loss': 1.2389, 'learning_rate': 3.8967068505675594e-05, 'epoch': 0.13} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1905 [2024-06-10 04:34:38,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.34 | bwd_microstep: 803.81 | bwd_inner_microstep: 803.66 | 
bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3588 [2024-06-10 04:34:40,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.47 | bwd_microstep: 1455.91 | bwd_inner_microstep: 1455.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741 [2024-06-10 04:34:42,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.57 | bwd_microstep: 1640.73 | bwd_inner_microstep: 1640.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878 [2024-06-10 04:34:43,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.58 | bwd_microstep: 681.29 | bwd_inner_microstep: 681.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932 [2024-06-10 04:34:44,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.56 | bwd_microstep: 700.39 | bwd_inner_microstep: 700.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400 [2024-06-10 04:34:46,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.39 | bwd_microstep: 1305.95 | bwd_inner_microstep: 1305.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 04:34:48,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1280.96 | bwd_inner_microstep: 1280.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 04:34:49,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.55 | bwd_microstep: 
801.93 | bwd_inner_microstep: 801.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3125 [2024-06-10 04:34:50,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.68 | bwd_microstep: 1152.61 | bwd_inner_microstep: 1152.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494 [2024-06-10 04:34:52,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.62 | bwd_microstep: 1416.88 | bwd_inner_microstep: 1416.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936 [2024-06-10 04:34:53,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.44 | bwd_microstep: 726.66 | bwd_inner_microstep: 726.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 04:34:55,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.76 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 04:34:57,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.88 | bwd_microstep: 1248.19 | bwd_inner_microstep: 1248.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1852 [2024-06-10 04:34:58,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.17 | bwd_microstep: 673.33 | bwd_inner_microstep: 673.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648 [2024-06-10 04:35:00,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 562.43 | bwd_microstep: 1515.75 | bwd_inner_microstep: 1515.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 04:35:02,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510 [2024-06-10 04:35:03,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.59 | bwd_microstep: 1223.29 | bwd_inner_microstep: 1223.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 04:35:05,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.19 | bwd_microstep: 1288.30 | bwd_inner_microstep: 1288.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3453 [2024-06-10 04:35:07,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.31 | bwd_microstep: 1319.55 | bwd_inner_microstep: 1319.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912 [2024-06-10 04:35:08,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.74 | bwd_microstep: 687.87 | bwd_inner_microstep: 687.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3444 [2024-06-10 04:35:10,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.09 | bwd_microstep: 1318.88 | bwd_inner_microstep: 1318.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 04:35:12,177] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.05 | bwd_microstep: 1296.97 | bwd_inner_microstep: 1296.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3724 [2024-06-10 04:35:14,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.64 | bwd_microstep: 1338.23 | bwd_inner_microstep: 1338.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534 [2024-06-10 04:35:15,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1328.35 | bwd_inner_microstep: 1328.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 04:35:17,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.10 | bwd_microstep: 1463.68 | bwd_inner_microstep: 1463.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3450 [2024-06-10 04:35:19,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1316.54 | bwd_inner_microstep: 1316.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2245 [2024-06-10 04:35:20,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.52 | bwd_microstep: 873.27 | bwd_inner_microstep: 873.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3748 [2024-06-10 04:35:23,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.28 | bwd_microstep: 1675.37 | bwd_inner_microstep: 1675.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 
3693 [2024-06-10 04:35:25,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.83 | bwd_microstep: 1606.16 | bwd_inner_microstep: 1606.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592 [2024-06-10 04:35:27,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1411.48 | bwd_inner_microstep: 1411.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 04:35:29,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.29 | bwd_microstep: 1507.13 | bwd_inner_microstep: 1507.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3461 [2024-06-10 04:35:38,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-10 04:35:38,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 8432.23 | bwd_inner_microstep: 1607.91 | bwd_allreduce_microstep: 6824.27 | step_microstep: 38.81 [2024-06-10 04:35:38,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14743.52 | bwd: 46175.84 | bwd_inner: 39350.55 | bwd_allreduce: 6824.55 | step: 40.39 {'loss': 1.3663, 'learning_rate': 3.895512875070513e-05, 'epoch': 0.13} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2977 [2024-06-10 04:35:40,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.32 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3391 [2024-06-10 04:35:41,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.05 | bwd_microstep: 1144.73 | 
bwd_inner_microstep: 1144.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 04:35:43,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.46 | bwd_microstep: 1375.51 | bwd_inner_microstep: 1375.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868 [2024-06-10 04:35:44,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.97 | bwd_microstep: 677.97 | bwd_inner_microstep: 677.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3825 [2024-06-10 04:35:46,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.03 | bwd_microstep: 1385.13 | bwd_inner_microstep: 1385.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2246 [2024-06-10 04:35:47,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.88 | bwd_microstep: 899.66 | bwd_inner_microstep: 899.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746 [2024-06-10 04:35:49,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.62 | bwd_microstep: 1537.99 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 04:35:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.08 | bwd_microstep: 1397.50 | bwd_inner_microstep: 1397.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1388 [2024-06-10 04:35:52,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
214.89 | bwd_microstep: 558.02 | bwd_inner_microstep: 558.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945 [2024-06-10 04:35:53,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.85 | bwd_microstep: 698.28 | bwd_inner_microstep: 698.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 04:35:55,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.85 | bwd_microstep: 1387.63 | bwd_inner_microstep: 1387.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1950 [2024-06-10 04:35:56,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.78 | bwd_microstep: 762.80 | bwd_inner_microstep: 762.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2932 [2024-06-10 04:35:58,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.66 | bwd_microstep: 1160.26 | bwd_inner_microstep: 1160.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3426 [2024-06-10 04:35:59,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.57 | bwd_microstep: 1218.80 | bwd_inner_microstep: 1218.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 04:36:01,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1379.24 | bwd_inner_microstep: 1379.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 04:36:03,677] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 520.78 | bwd_microstep: 1389.86 | bwd_inner_microstep: 1389.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691 [2024-06-10 04:36:05,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.28 | bwd_microstep: 1529.87 | bwd_inner_microstep: 1529.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647 [2024-06-10 04:36:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1517.75 | bwd_inner_microstep: 1517.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293 [2024-06-10 04:36:09,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.71 | bwd_microstep: 979.34 | bwd_inner_microstep: 979.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 04:36:11,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1399.84 | bwd_inner_microstep: 1399.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503 [2024-06-10 04:36:12,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.35 | bwd_microstep: 1191.54 | bwd_inner_microstep: 1191.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 04:36:14,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.35 | bwd_microstep: 1301.87 | bwd_inner_microstep: 1301.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485 [2024-06-10 
04:36:16,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.03 | bwd_microstep: 1350.30 | bwd_inner_microstep: 1350.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450 [2024-06-10 04:36:18,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.90 | bwd_microstep: 1158.43 | bwd_inner_microstep: 1158.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465 [2024-06-10 04:36:19,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.26 | bwd_microstep: 1185.13 | bwd_inner_microstep: 1185.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 04:36:21,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.92 | bwd_microstep: 1400.12 | bwd_inner_microstep: 1400.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 04:36:23,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510 [2024-06-10 04:36:25,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.27 | bwd_microstep: 1195.96 | bwd_inner_microstep: 1195.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3585 [2024-06-10 04:36:27,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.75 | bwd_microstep: 1574.10 | bwd_inner_microstep: 1574.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3770 [2024-06-10 04:36:29,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.58 | bwd_microstep: 1440.33 | bwd_inner_microstep: 1440.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899 [2024-06-10 04:36:30,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.26 | bwd_microstep: 778.13 | bwd_inner_microstep: 778.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-10 04:36:39,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 04:36:39,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.40 | bwd_microstep: 7966.67 | bwd_inner_microstep: 1858.63 | bwd_allreduce_microstep: 6107.99 | step_microstep: 38.74 [2024-06-10 04:36:39,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14783.68 | bwd: 45521.57 | bwd_inner: 39412.66 | bwd_allreduce: 6108.22 | step: 40.38 {'loss': 1.3638, 'learning_rate': 3.894312223586974e-05, 'epoch': 0.13} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1936 [2024-06-10 04:36:40,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.46 | bwd_microstep: 810.05 | bwd_inner_microstep: 809.90 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 04:36:42,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1379.42 | bwd_inner_microstep: 1379.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849 [2024-06-10 04:36:44,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.27 | bwd_microstep: 
1558.17 | bwd_inner_microstep: 1558.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779 [2024-06-10 04:36:46,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1646.03 | bwd_inner_microstep: 1646.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 04:36:48,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.23 | bwd_microstep: 1541.62 | bwd_inner_microstep: 1541.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 04:36:50,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1287.80 | bwd_inner_microstep: 1287.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 04:36:51,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.47 | bwd_microstep: 802.95 | bwd_inner_microstep: 802.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497 [2024-06-10 04:36:53,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.68 | bwd_microstep: 1580.05 | bwd_inner_microstep: 1580.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3745 [2024-06-10 04:36:56,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.70 | bwd_microstep: 1733.57 | bwd_inner_microstep: 1733.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024 [2024-06-10 04:36:57,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 306.48 | bwd_microstep: 807.33 | bwd_inner_microstep: 807.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379 [2024-06-10 04:36:59,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1432.50 | bwd_inner_microstep: 1432.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 04:37:01,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.56 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1486.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 04:37:03,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1388.40 | bwd_inner_microstep: 1388.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3513 [2024-06-10 04:37:05,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.28 | bwd_microstep: 1550.30 | bwd_inner_microstep: 1550.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1997 [2024-06-10 04:37:06,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.57 | bwd_microstep: 831.36 | bwd_inner_microstep: 831.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3541 [2024-06-10 04:37:08,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.75 | bwd_microstep: 1586.15 | bwd_inner_microstep: 1586.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3538 [2024-06-10 04:37:10,697] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.25 | bwd_microstep: 1447.60 | bwd_inner_microstep: 1447.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3570 [2024-06-10 04:37:12,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.31 | bwd_microstep: 1347.92 | bwd_inner_microstep: 1347.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695 [2024-06-10 04:37:14,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.31 | bwd_microstep: 1533.37 | bwd_inner_microstep: 1533.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 04:37:16,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1404.28 | bwd_inner_microstep: 1404.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3916 [2024-06-10 04:37:18,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1489.26 | bwd_inner_microstep: 1489.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-10 04:37:19,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 789.82 | bwd_inner_microstep: 789.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3663 [2024-06-10 04:37:21,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1323.12 | bwd_inner_microstep: 1323.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3489 [2024-06-10 04:37:23,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1289.29 | bwd_inner_microstep: 1289.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497 [2024-06-10 04:37:25,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.74 | bwd_microstep: 1190.08 | bwd_inner_microstep: 1190.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589 [2024-06-10 04:37:27,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.11 | bwd_microstep: 1605.33 | bwd_inner_microstep: 1605.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-10 04:37:29,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.94 | bwd_microstep: 1446.58 | bwd_inner_microstep: 1446.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660 [2024-06-10 04:37:31,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.07 | bwd_microstep: 1454.36 | bwd_inner_microstep: 1454.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 04:37:33,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1275.26 | bwd_inner_microstep: 1275.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-10 04:37:35,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1653.57 | bwd_inner_microstep: 1653.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 2055 [2024-06-10 04:37:36,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 813.89 | bwd_inner_microstep: 813.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 04:37:40,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 04:37:40,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.61 | bwd_microstep: 3077.17 | bwd_inner_microstep: 1639.01 | bwd_allreduce_microstep: 1438.10 | step_microstep: 38.44 [2024-06-10 04:37:40,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16061.19 | bwd: 44563.00 | bwd_inner: 43123.89 | bwd_allreduce: 1438.38 | step: 40.10 {'loss': 1.3437, 'learning_rate': 3.893104900345631e-05, 'epoch': 0.13} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3600 [2024-06-10 04:37:42,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.12 | bwd_microstep: 1465.55 | bwd_inner_microstep: 1465.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 04:37:44,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.53 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 04:37:46,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1478.24 | bwd_inner_microstep: 1478.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068 [2024-06-10 04:37:47,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
308.49 | bwd_microstep: 818.56 | bwd_inner_microstep: 818.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801 [2024-06-10 04:37:49,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.27 | bwd_microstep: 1649.90 | bwd_inner_microstep: 1649.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 04:37:51,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.81 | bwd_microstep: 1482.72 | bwd_inner_microstep: 1482.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 04:37:53,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1388.01 | bwd_inner_microstep: 1387.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 04:37:55,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1398.50 | bwd_inner_microstep: 1398.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 04:37:57,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 04:37:59,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1254.14 | bwd_inner_microstep: 1254.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 04:38:00,905] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1348.74 | bwd_inner_microstep: 1348.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3477 [2024-06-10 04:38:03,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1547.73 | bwd_inner_microstep: 1547.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 04:38:05,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 1478.32 | bwd_inner_microstep: 1478.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429 [2024-06-10 04:38:07,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1449.76 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2347 [2024-06-10 04:38:08,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.94 | bwd_microstep: 989.40 | bwd_inner_microstep: 989.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428 [2024-06-10 04:38:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.82 | bwd_inner_microstep: 1345.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671 [2024-06-10 04:38:12,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.95 | bwd_microstep: 1422.26 | bwd_inner_microstep: 1422.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2027 [2024-06-10 
04:38:13,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.92 | bwd_microstep: 744.99 | bwd_inner_microstep: 744.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-10 04:38:15,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1298.64 | bwd_inner_microstep: 1298.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649 [2024-06-10 04:38:17,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.58 | bwd_microstep: 1518.51 | bwd_inner_microstep: 1518.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 04:38:19,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1415.34 | bwd_inner_microstep: 1415.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905 [2024-06-10 04:38:20,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.68 | bwd_microstep: 716.55 | bwd_inner_microstep: 716.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 04:38:22,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828 [2024-06-10 04:38:23,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.35 | bwd_microstep: 1390.97 | bwd_inner_microstep: 1390.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3821 [2024-06-10 04:38:26,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1558.90 | bwd_inner_microstep: 1558.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 04:38:28,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.31 | bwd_microstep: 1558.56 | bwd_inner_microstep: 1558.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-10 04:38:30,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1510.23 | bwd_inner_microstep: 1510.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816 [2024-06-10 04:38:32,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.14 | bwd_microstep: 1489.05 | bwd_inner_microstep: 1489.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717 [2024-06-10 04:38:34,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1366.29 | bwd_inner_microstep: 1366.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-10 04:38:36,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1470.00 | bwd_inner_microstep: 1469.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3468 [2024-06-10 04:38:38,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.67 | bwd_microstep: 1426.15 | bwd_inner_microstep: 1426.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 2903 [2024-06-10 04:38:40,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-10 04:38:40,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.30 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1248.70 | bwd_allreduce_microstep: 7.59 | step_microstep: 38.30 [2024-06-10 04:38:40,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16233.50 | bwd: 43385.42 | bwd_inner: 43376.90 | bwd_allreduce: 7.82 | step: 39.84 {'loss': 1.3862, 'learning_rate': 3.8918909095986704e-05, 'epoch': 0.13} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 04:38:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.69 | bwd_microstep: 1448.92 | bwd_inner_microstep: 1448.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 04:38:43,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1344.93 | bwd_inner_microstep: 1344.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-10 04:38:45,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.86 | bwd_microstep: 1290.14 | bwd_inner_microstep: 1290.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-10 04:38:47,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1293.46 | bwd_inner_microstep: 1293.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2254 [2024-06-10 04:38:48,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 362.16 | bwd_microstep: 968.21 | bwd_inner_microstep: 968.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 04:38:50,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1480.58 | bwd_inner_microstep: 1480.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1368 [2024-06-10 04:38:51,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 211.83 | bwd_microstep: 553.71 | bwd_inner_microstep: 553.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-10 04:38:53,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.10 | bwd_microstep: 1527.55 | bwd_inner_microstep: 1527.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 04:38:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1247.76 | bwd_inner_microstep: 1247.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 04:38:57,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1388.98 | bwd_inner_microstep: 1388.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2525 [2024-06-10 04:38:58,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.66 | bwd_microstep: 935.03 | bwd_inner_microstep: 935.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416 [2024-06-10 04:39:00,538] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.10 | bwd_microstep: 1309.05 | bwd_inner_microstep: 1309.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 04:39:02,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.01 | bwd_microstep: 1382.02 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3720 [2024-06-10 04:39:04,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.34 | bwd_microstep: 1835.41 | bwd_inner_microstep: 1835.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 04:39:06,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1392.70 | bwd_inner_microstep: 1392.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3520 [2024-06-10 04:39:08,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.98 | bwd_microstep: 1358.60 | bwd_inner_microstep: 1358.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 04:39:10,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1392.00 | bwd_inner_microstep: 1391.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 04:39:12,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.88 | bwd_microstep: 1261.27 | bwd_inner_microstep: 1261.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3493 [2024-06-10 04:39:14,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.32 | bwd_microstep: 1489.57 | bwd_inner_microstep: 1489.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457 [2024-06-10 04:39:16,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.13 | bwd_microstep: 1314.11 | bwd_inner_microstep: 1314.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 04:39:18,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.50 | bwd_microstep: 1390.48 | bwd_inner_microstep: 1390.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-10 04:39:19,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.87 | bwd_microstep: 978.08 | bwd_inner_microstep: 978.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967 [2024-06-10 04:39:20,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.56 | bwd_microstep: 801.20 | bwd_inner_microstep: 801.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934 [2024-06-10 04:39:21,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.57 | bwd_microstep: 697.63 | bwd_inner_microstep: 697.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-10 04:39:23,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.25 | bwd_microstep: 1635.36 | bwd_inner_microstep: 1635.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per 
sample: 2.0, dynamic token length: 1153 [2024-06-10 04:39:24,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 182.76 | bwd_microstep: 468.56 | bwd_inner_microstep: 468.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3549 [2024-06-10 04:39:26,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.34 | bwd_microstep: 1345.13 | bwd_inner_microstep: 1345.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3420 [2024-06-10 04:39:28,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.75 | bwd_microstep: 1397.00 | bwd_inner_microstep: 1396.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2197 [2024-06-10 04:39:29,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.75 | bwd_microstep: 1020.08 | bwd_inner_microstep: 1020.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 04:39:31,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1312.07 | bwd_inner_microstep: 1312.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574 [2024-06-10 04:39:33,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.50 | bwd_microstep: 1570.24 | bwd_inner_microstep: 1570.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3774 [2024-06-10 04:39:40,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-10 04:39:40,946] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.09 | bwd_microstep: 6512.20 | bwd_inner_microstep: 1902.21 | bwd_allreduce_microstep: 4609.94 | step_microstep: 38.68 [2024-06-10 04:39:40,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15194.13 | bwd: 45342.07 | bwd_inner: 40731.22 | bwd_allreduce: 4610.17 | step: 40.27 {'loss': 1.3597, 'learning_rate': 3.890670255621761e-05, 'epoch': 0.13} 13%|█▎ | 224/1726 [3:57:13<25:47:55, 61.83s/it] 13%|█▎ | 225/1726 [3:58:15<25:42:32, 61.66s/it] 13%|█▎ | 226/1726 [3:59:15<25:33:52, 61.35s/it] 13%|█▎ | 227/1726 [4:00:16<25:30:01, 61.24s/it] 13%|█▎ | 228/1726 [4:01:16<25:19:26, 60.86s/it] 13%|█▎ | 229/1726 [4:02:17<25:18:35, 60.87s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 04:39:42,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2625 [2024-06-10 04:39:44,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.42 | bwd_microstep: 1011.45 | bwd_inner_microstep: 1011.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2309 [2024-06-10 04:39:45,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.10 | bwd_microstep: 981.29 | bwd_inner_microstep: 981.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-10 04:39:47,746] [INFO] [logging.py:96:log_dist] [Rank 0] time
(ms) | fwd_microstep: 571.65 | bwd_microstep: 1539.72 | bwd_inner_microstep: 1539.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 04:39:49,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.44 | bwd_microstep: 1541.17 | bwd_inner_microstep: 1541.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3754 [2024-06-10 04:39:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.51 | bwd_microstep: 1402.18 | bwd_inner_microstep: 1402.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1880 [2024-06-10 04:39:52,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.12 | bwd_microstep: 773.46 | bwd_inner_microstep: 773.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-10 04:39:55,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.93 | bwd_microstep: 1630.38 | bwd_inner_microstep: 1630.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3747 [2024-06-10 04:39:57,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1503.03 | bwd_inner_microstep: 1503.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 04:39:59,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 04:40:00,209] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.66 | bwd_microstep: 798.58 | bwd_inner_microstep: 798.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 04:40:01,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.24 | bwd_microstep: 1282.89 | bwd_inner_microstep: 1282.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3672 [2024-06-10 04:40:04,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.01 | bwd_microstep: 1485.75 | bwd_inner_microstep: 1485.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911 [2024-06-10 04:40:05,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 779.76 | bwd_inner_microstep: 779.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2147 [2024-06-10 04:40:06,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.60 | bwd_microstep: 948.98 | bwd_inner_microstep: 948.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3689 [2024-06-10 04:40:08,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.14 | bwd_microstep: 1829.10 | bwd_inner_microstep: 1829.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1968 [2024-06-10 04:40:10,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.38 | bwd_microstep: 825.11 | bwd_inner_microstep: 825.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3650 
[2024-06-10 04:40:11,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1291.92 | bwd_inner_microstep: 1291.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493 [2024-06-10 04:40:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.92 | bwd_microstep: 1193.23 | bwd_inner_microstep: 1193.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3618 [2024-06-10 04:40:15,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.19 | bwd_microstep: 1267.09 | bwd_inner_microstep: 1267.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-10 04:40:16,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.20 | bwd_microstep: 799.09 | bwd_inner_microstep: 799.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2147 [2024-06-10 04:40:17,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.03 | bwd_microstep: 853.90 | bwd_inner_microstep: 853.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555 [2024-06-10 04:40:19,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.81 | bwd_microstep: 1331.20 | bwd_inner_microstep: 1331.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3641 [2024-06-10 04:40:21,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.72 | bwd_microstep: 1351.51 | bwd_inner_microstep: 1351.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3558 [2024-06-10 04:40:23,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.61 | bwd_microstep: 1502.38 | bwd_inner_microstep: 1502.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 04:40:25,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.46 | bwd_microstep: 1285.54 | bwd_inner_microstep: 1285.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3809 [2024-06-10 04:40:26,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.18 | bwd_microstep: 1291.92 | bwd_inner_microstep: 1291.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2264 [2024-06-10 04:40:28,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.81 | bwd_microstep: 813.55 | bwd_inner_microstep: 813.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3428 [2024-06-10 04:40:29,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.72 | bwd_microstep: 1376.79 | bwd_inner_microstep: 1376.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 04:40:31,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.92 | bwd_microstep: 976.10 | bwd_inner_microstep: 976.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820 [2024-06-10 04:40:33,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1518.77 | bwd_inner_microstep: 1518.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3851 [2024-06-10 04:40:41,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.64 [2024-06-10 04:40:41,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 681.35 | bwd_microstep: 7530.20 | bwd_inner_microstep: 2115.57 | bwd_allreduce_microstep: 5414.57 | step_microstep: 38.53 [2024-06-10 04:40:41,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14926.95 | bwd: 45483.42 | bwd_inner: 40067.88 | bwd_allreduce: 5414.84 | step: 40.21 {'loss': 1.286, 'learning_rate': 3.889442942714041e-05, 'epoch': 0.13} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4586 [2024-06-10 04:40:44,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.72 | bwd_microstep: 1770.38 | bwd_inner_microstep: 1770.29 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 04:40:46,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1393.29 | bwd_inner_microstep: 1393.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 04:40:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1486.38 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 04:40:50,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.70 | bwd_microstep: 1384.41 | bwd_inner_microstep: 1384.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784 [2024-06-10 04:40:52,035] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 541.91 | bwd_microstep: 1444.53 | bwd_inner_microstep: 1444.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 04:40:53,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1281.08 | bwd_inner_microstep: 1281.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 04:40:55,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1387.27 | bwd_inner_microstep: 1387.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 04:40:57,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.49 | bwd_microstep: 1152.42 | bwd_inner_microstep: 1152.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 04:40:59,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.17 | bwd_microstep: 1385.43 | bwd_inner_microstep: 1385.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2205 [2024-06-10 04:41:00,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.01 | bwd_microstep: 956.18 | bwd_inner_microstep: 956.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485 [2024-06-10 04:41:02,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.15 | bwd_microstep: 1315.17 | bwd_inner_microstep: 1315.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 
04:41:04,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1377.49 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482 [2024-06-10 04:41:06,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.07 | bwd_microstep: 1578.85 | bwd_inner_microstep: 1578.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 04:41:08,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1381.96 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424 [2024-06-10 04:41:10,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1446.82 | bwd_inner_microstep: 1446.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389 [2024-06-10 04:41:12,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.76 | bwd_microstep: 1338.01 | bwd_inner_microstep: 1337.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 04:41:13,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.55 | bwd_microstep: 1162.32 | bwd_inner_microstep: 1162.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 04:41:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1487.48 | bwd_inner_microstep: 1487.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3467 [2024-06-10 04:41:17,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1282.39 | bwd_inner_microstep: 1282.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677 [2024-06-10 04:41:19,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.75 | bwd_microstep: 1328.01 | bwd_inner_microstep: 1327.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-10 04:41:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.75 | bwd_microstep: 1406.84 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623 [2024-06-10 04:41:23,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.40 | bwd_microstep: 1312.57 | bwd_inner_microstep: 1312.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 04:41:25,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1397.02 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-10 04:41:27,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1494.55 | bwd_inner_microstep: 1494.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 04:41:29,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1386.01 | bwd_inner_microstep: 1385.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 04:41:31,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1491.78 | bwd_inner_microstep: 1491.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3687 [2024-06-10 04:41:33,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.75 | bwd_microstep: 1489.33 | bwd_inner_microstep: 1489.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643 [2024-06-10 04:41:35,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.91 | bwd_microstep: 1315.47 | bwd_inner_microstep: 1315.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 04:41:37,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.03 | bwd_microstep: 1451.27 | bwd_inner_microstep: 1451.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617 [2024-06-10 04:41:39,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.06 | bwd_microstep: 1605.74 | bwd_inner_microstep: 1605.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3476 [2024-06-10 04:41:41,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.68 | bwd_microstep: 1577.63 | bwd_inner_microstep: 1577.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2014 [2024-06-10 04:41:45,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 04:41:45,173] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.67 | bwd_microstep: 3313.81 | bwd_inner_microstep: 984.54 | bwd_allreduce_microstep: 2329.21 | step_microstep: 38.59 [2024-06-10 04:41:45,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16546.18 | bwd: 46581.96 | bwd_inner: 44251.72 | bwd_allreduce: 2329.49 | step: 40.24 {'loss': 1.3273, 'learning_rate': 3.8882089751980985e-05, 'epoch': 0.13} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877 [2024-06-10 04:41:46,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.19 | bwd_microstep: 678.09 | bwd_inner_microstep: 677.95 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 04:41:47,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.12 | bwd_microstep: 1283.72 | bwd_inner_microstep: 1283.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820 [2024-06-10 04:41:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3754 [2024-06-10 04:41:51,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.08 | bwd_microstep: 1340.68 | bwd_inner_microstep: 1340.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463 [2024-06-10 04:41:53,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.82 | bwd_microstep: 1246.66 | bwd_inner_microstep: 1246.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1372 [2024-06-10 04:41:54,142] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 201.77 | bwd_microstep: 520.87 | bwd_inner_microstep: 520.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3421 [2024-06-10 04:41:55,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.17 | bwd_microstep: 1216.16 | bwd_inner_microstep: 1216.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-10 04:41:58,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.45 | bwd_microstep: 1630.14 | bwd_inner_microstep: 1630.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420 [2024-06-10 04:41:59,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1157.06 | bwd_inner_microstep: 1157.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916 [2024-06-10 04:42:00,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.98 | bwd_microstep: 689.11 | bwd_inner_microstep: 689.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481 [2024-06-10 04:42:02,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1314.15 | bwd_inner_microstep: 1314.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3666 [2024-06-10 04:42:04,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.98 | bwd_microstep: 1671.14 | bwd_inner_microstep: 1671.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3503 [2024-06-10 04:42:06,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 1381.56 | bwd_inner_microstep: 1381.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510 [2024-06-10 04:42:08,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1419.87 | bwd_inner_microstep: 1419.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676 [2024-06-10 04:42:10,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.92 | bwd_microstep: 1722.31 | bwd_inner_microstep: 1722.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2158 [2024-06-10 04:42:12,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.78 | bwd_microstep: 822.91 | bwd_inner_microstep: 822.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 04:42:13,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.00 | bwd_microstep: 1353.79 | bwd_inner_microstep: 1353.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3485 [2024-06-10 04:42:15,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 1405.30 | bwd_inner_microstep: 1405.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 04:42:17,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1409.34 | bwd_inner_microstep: 1409.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3821 [2024-06-10 04:42:19,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.53 | bwd_microstep: 1360.36 | bwd_inner_microstep: 1360.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 04:42:21,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.35 | bwd_microstep: 1297.98 | bwd_inner_microstep: 1297.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3605 [2024-06-10 04:42:23,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1248.84 | bwd_inner_microstep: 1248.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 04:42:25,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.38 | bwd_microstep: 1382.22 | bwd_inner_microstep: 1382.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737 [2024-06-10 04:42:27,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.98 | bwd_microstep: 1541.02 | bwd_inner_microstep: 1541.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-10 04:42:29,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.09 | bwd_microstep: 1405.84 | bwd_inner_microstep: 1405.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 04:42:31,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1459.00 | bwd_inner_microstep: 1458.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591 [2024-06-10 04:42:33,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.77 | bwd_microstep: 1599.10 | bwd_inner_microstep: 1599.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3578 [2024-06-10 04:42:35,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1447.14 | bwd_inner_microstep: 1447.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 04:42:37,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.81 | bwd_microstep: 1600.88 | bwd_inner_microstep: 1600.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 04:42:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1379.31 | bwd_inner_microstep: 1379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615 [2024-06-10 04:42:41,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.94 | bwd_microstep: 1604.21 | bwd_inner_microstep: 1604.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276 [2024-06-10 04:42:45,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.23 | optimizer_step: 6.57 [2024-06-10 04:42:45,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.72 | bwd_microstep: 3131.73 | bwd_inner_microstep: 1099.72 | bwd_allreduce_microstep: 2031.96 | step_microstep: 38.49 [2024-06-10 04:42:45,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
15748.88 | bwd: 44104.39 | bwd_inner: 42071.42 | bwd_allreduce: 2032.24 | step: 40.15 {'loss': 1.3328, 'learning_rate': 3.886968357419961e-05, 'epoch': 0.13} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 04:42:47,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1338.78 | bwd_inner_microstep: 1338.70 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 04:42:48,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1246.31 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3845 [2024-06-10 04:42:50,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1457.74 | bwd_inner_microstep: 1457.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 04:42:52,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 1346.18 | bwd_inner_microstep: 1346.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4061 [2024-06-10 04:42:54,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.63 | bwd_microstep: 1555.01 | bwd_inner_microstep: 1554.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 04:42:56,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1384.55 | bwd_inner_microstep: 1384.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728 [2024-06-10 
04:42:58,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.77 | bwd_microstep: 1464.90 | bwd_inner_microstep: 1464.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 04:43:00,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.34 | bwd_microstep: 1385.09 | bwd_inner_microstep: 1385.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 04:43:02,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.63 | bwd_microstep: 1153.02 | bwd_inner_microstep: 1152.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3585 [2024-06-10 04:43:04,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1239.97 | bwd_inner_microstep: 1239.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 04:43:06,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.39 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 04:43:07,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.55 | bwd_microstep: 1283.41 | bwd_inner_microstep: 1283.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3670 [2024-06-10 04:43:09,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.16 | bwd_microstep: 1548.09 | bwd_inner_microstep: 1548.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, 
dynamic token length: 3505 [2024-06-10 04:43:12,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1498.72 | bwd_inner_microstep: 1498.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 04:43:13,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1247.76 | bwd_inner_microstep: 1247.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 04:43:15,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1850 [2024-06-10 04:43:16,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.29 | bwd_microstep: 672.66 | bwd_inner_microstep: 672.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422 [2024-06-10 04:43:18,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1544.69 | bwd_inner_microstep: 1544.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963 [2024-06-10 04:43:19,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.56 | bwd_microstep: 824.01 | bwd_inner_microstep: 823.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 04:43:21,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.49 | bwd_microstep: 1488.42 | bwd_inner_microstep: 1488.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 2290 [2024-06-10 04:43:23,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.50 | bwd_microstep: 1072.40 | bwd_inner_microstep: 1072.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533 [2024-06-10 04:43:25,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.75 | bwd_microstep: 1331.34 | bwd_inner_microstep: 1331.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430 [2024-06-10 04:43:27,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1410.28 | bwd_inner_microstep: 1410.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 04:43:29,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 1600.69 | bwd_inner_microstep: 1600.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3664 [2024-06-10 04:43:31,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.60 | bwd_microstep: 1355.74 | bwd_inner_microstep: 1355.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 04:43:33,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569 [2024-06-10 04:43:35,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.81 | bwd_microstep: 1603.13 | bwd_inner_microstep: 1603.10 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914 [2024-06-10 04:43:36,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.16 | bwd_microstep: 719.83 | bwd_inner_microstep: 719.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 04:43:38,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1282.94 | bwd_inner_microstep: 1282.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 04:43:40,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.03 | bwd_microstep: 1649.15 | bwd_inner_microstep: 1649.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571 [2024-06-10 04:43:42,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1303.98 | bwd_inner_microstep: 1303.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 04:43:48,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 04:43:48,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 5163.64 | bwd_inner_microstep: 1569.80 | bwd_allreduce_microstep: 3593.79 | step_microstep: 38.59 [2024-06-10 04:43:48,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15976.87 | bwd: 46330.66 | bwd_inner: 42735.91 | bwd_allreduce: 3594.05 | step: 40.16 {'loss': 1.3413, 'learning_rate': 3.885721093749078e-05, 'epoch': 0.13} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3390 [2024-06-10 04:43:49,950] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 04:43:51,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.87 | bwd_microstep: 1243.18 | bwd_inner_microstep: 1243.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3399 [2024-06-10 04:43:53,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1469.39 | bwd_inner_microstep: 1469.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 04:43:54,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.58 | bwd_microstep: 790.76 | bwd_inner_microstep: 790.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 04:43:56,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1277.61 | bwd_inner_microstep: 1277.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698 [2024-06-10 04:43:58,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.56 | bwd_microstep: 1526.72 | bwd_inner_microstep: 1526.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 04:44:00,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.42 | bwd_microstep: 1483.61 | bwd_inner_microstep: 1483.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
4127 [2024-06-10 04:44:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.54 | bwd_microstep: 1639.19 | bwd_inner_microstep: 1639.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 429 [2024-06-10 04:44:03,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 88.67 | bwd_microstep: 217.65 | bwd_inner_microstep: 217.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-10 04:44:05,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1252.27 | bwd_inner_microstep: 1252.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 04:44:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1283.84 | bwd_inner_microstep: 1283.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1920 [2024-06-10 04:44:07,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.87 | bwd_microstep: 843.26 | bwd_inner_microstep: 843.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495 [2024-06-10 04:44:09,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.42 | bwd_microstep: 1317.78 | bwd_inner_microstep: 1317.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 04:44:11,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1481.00 | bwd_inner_microstep: 1480.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per 
sample: 7.25, dynamic token length: 3468 [2024-06-10 04:44:13,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1391.35 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-10 04:44:15,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1255.10 | bwd_inner_microstep: 1255.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2317 [2024-06-10 04:44:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.81 | bwd_microstep: 985.78 | bwd_inner_microstep: 985.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 04:44:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.93 | bwd_microstep: 1289.43 | bwd_inner_microstep: 1289.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 04:44:20,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1493.79 | bwd_inner_microstep: 1493.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436 [2024-06-10 04:44:22,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1380.66 | bwd_inner_microstep: 1380.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 04:44:24,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.96 | bwd_microstep: 1496.68 | bwd_inner_microstep: 1496.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 04:44:26,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.81 | bwd_microstep: 1282.79 | bwd_inner_microstep: 1282.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 04:44:28,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1394.92 | bwd_inner_microstep: 1394.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4127 [2024-06-10 04:44:30,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.50 | bwd_microstep: 1643.07 | bwd_inner_microstep: 1643.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426 [2024-06-10 04:44:32,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.38 | bwd_microstep: 1281.66 | bwd_inner_microstep: 1281.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-10 04:44:34,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1492.94 | bwd_inner_microstep: 1492.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3440 [2024-06-10 04:44:36,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1203.54 | bwd_inner_microstep: 1203.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-10 04:44:38,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1630.72 | bwd_inner_microstep: 1630.70 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3382 [2024-06-10 04:44:40,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1388.43 | bwd_inner_microstep: 1388.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3563 [2024-06-10 04:44:42,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.08 | bwd_microstep: 1445.36 | bwd_inner_microstep: 1445.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2947 [2024-06-10 04:44:44,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.32 | bwd_microstep: 1248.73 | bwd_inner_microstep: 1248.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 04:44:49,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.70 | optimizer_gradients: 4.24 | optimizer_step: 6.57 [2024-06-10 04:44:49,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 5046.29 | bwd_inner_microstep: 1685.05 | bwd_allreduce_microstep: 3361.19 | step_microstep: 39.61 [2024-06-10 04:44:49,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15774.51 | bwd: 45568.89 | bwd_inner: 42206.79 | bwd_allreduce: 3361.42 | step: 41.24 {'loss': 1.3456, 'learning_rate': 3.884467188578306e-05, 'epoch': 0.14} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459 [2024-06-10 04:44:51,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1329.09 | bwd_inner_microstep: 1329.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4623 
[2024-06-10 04:44:54,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 699.63 | bwd_microstep: 1862.74 | bwd_inner_microstep: 1862.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800 [2024-06-10 04:44:56,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.26 | bwd_microstep: 1549.53 | bwd_inner_microstep: 1549.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2346 [2024-06-10 04:44:57,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.98 | bwd_microstep: 985.84 | bwd_inner_microstep: 985.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839 [2024-06-10 04:44:59,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1555.01 | bwd_inner_microstep: 1554.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 04:45:01,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 1479.68 | bwd_inner_microstep: 1479.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 04:45:03,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1380.28 | bwd_inner_microstep: 1380.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3713 [2024-06-10 04:45:05,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.47 | bwd_microstep: 1332.64 | bwd_inner_microstep: 1332.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per 
sample: 3.5, dynamic token length: 1957 [2024-06-10 04:45:06,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.46 | bwd_microstep: 766.11 | bwd_inner_microstep: 766.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3464 [2024-06-10 04:45:08,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.60 | bwd_microstep: 1343.17 | bwd_inner_microstep: 1343.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443 [2024-06-10 04:45:10,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.82 | bwd_microstep: 1158.69 | bwd_inner_microstep: 1158.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 04:45:11,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1256.70 | bwd_inner_microstep: 1256.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1916 [2024-06-10 04:45:12,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.27 | bwd_microstep: 755.25 | bwd_inner_microstep: 755.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 04:45:14,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 04:45:16,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.54 | bwd_microstep: 1448.73 | bwd_inner_microstep: 1448.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3465 [2024-06-10 04:45:18,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1423.69 | bwd_inner_microstep: 1423.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676 [2024-06-10 04:45:21,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.10 | bwd_microstep: 1618.06 | bwd_inner_microstep: 1618.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3435 [2024-06-10 04:45:23,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1375.39 | bwd_inner_microstep: 1375.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3933 [2024-06-10 04:45:24,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.18 | bwd_microstep: 1403.93 | bwd_inner_microstep: 1403.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-10 04:45:26,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.02 | bwd_microstep: 1182.64 | bwd_inner_microstep: 1182.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-10 04:45:27,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.86 | bwd_microstep: 798.54 | bwd_inner_microstep: 798.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3551 [2024-06-10 04:45:29,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 1525.60 | bwd_inner_microstep: 1525.57 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3439 [2024-06-10 04:45:31,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.29 | bwd_microstep: 1378.78 | bwd_inner_microstep: 1378.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 04:45:33,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1412.80 | bwd_inner_microstep: 1412.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2114 [2024-06-10 04:45:35,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.77 | bwd_microstep: 956.63 | bwd_inner_microstep: 956.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594 [2024-06-10 04:45:36,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1312.27 | bwd_inner_microstep: 1312.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462 [2024-06-10 04:45:38,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.24 | bwd_microstep: 1182.12 | bwd_inner_microstep: 1182.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452 [2024-06-10 04:45:40,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.27 | bwd_microstep: 1161.83 | bwd_inner_microstep: 1161.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-10 04:45:42,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.83 | bwd_microstep: 
1544.60 | bwd_inner_microstep: 1544.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 04:45:44,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 | bwd_microstep: 1450.33 | bwd_inner_microstep: 1450.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3806 [2024-06-10 04:45:46,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.78 | bwd_microstep: 1622.00 | bwd_inner_microstep: 1621.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589 [2024-06-10 04:45:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 04:45:50,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 3926.87 | bwd_inner_microstep: 1620.69 | bwd_allreduce_microstep: 2306.13 | step_microstep: 38.55 [2024-06-10 04:45:50,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15954.19 | bwd: 44965.46 | bwd_inner: 42658.42 | bwd_allreduce: 2306.35 | step: 40.13 {'loss': 1.3036, 'learning_rate': 3.883206646323892e-05, 'epoch': 0.14} 13%|█▎ | 230/1726 [4:03:18<25:16:41, 60.83s/it] 13%|█▎ | 231/1726 [4:04:21<25:35:29, 61.62s/it] 13%|█▎ | 232/1726 [4:05:22<25:23:47, 61.20s/it] 13%|█▎ | 233/1726 [4:06:24<25:33:37, 61.63s/it] 14%|█▎ | 234/1726 [4:07:26<25:32:59, 61.65s/it] 14%|█▎ | 235/1726 [4:08:27<25:29:03, 61.53s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 
[2024-06-10 04:45:52,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1265.86 | bwd_inner_microstep: 1265.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3979 [2024-06-10 04:45:54,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.09 | bwd_microstep: 1408.69 | bwd_inner_microstep: 1408.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-10 04:45:56,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 1476.02 | bwd_inner_microstep: 1475.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834 [2024-06-10 04:45:58,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.33 | bwd_microstep: 1557.07 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 04:45:59,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.58 | bwd_microstep: 791.67 | bwd_inner_microstep: 791.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 04:46:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.23 | bwd_microstep: 1384.28 | bwd_inner_microstep: 1384.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3783 [2024-06-10 04:46:03,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3736 [2024-06-10 04:46:06,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1536.62 | bwd_inner_microstep: 1536.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437 [2024-06-10 04:46:07,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.48 | bwd_microstep: 1153.76 | bwd_inner_microstep: 1153.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657 [2024-06-10 04:46:09,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.53 | bwd_microstep: 1622.30 | bwd_inner_microstep: 1622.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691 [2024-06-10 04:46:11,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.57 | bwd_microstep: 1424.13 | bwd_inner_microstep: 1424.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-10 04:46:13,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1315.38 | bwd_inner_microstep: 1315.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420 [2024-06-10 04:46:15,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.60 | bwd_microstep: 1281.12 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426 [2024-06-10 04:46:17,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.87 | bwd_microstep: 1253.13 | bwd_inner_microstep: 1253.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 04:46:19,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 1475.33 | bwd_inner_microstep: 1475.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 04:46:21,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.86 | bwd_microstep: 1382.95 | bwd_inner_microstep: 1382.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 04:46:23,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.62 | bwd_microstep: 1485.26 | bwd_inner_microstep: 1485.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3696 [2024-06-10 04:46:25,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.10 | bwd_microstep: 1631.41 | bwd_inner_microstep: 1631.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831 [2024-06-10 04:46:27,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.25 | bwd_microstep: 1463.35 | bwd_inner_microstep: 1463.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976 [2024-06-10 04:46:28,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.12 | bwd_microstep: 797.11 | bwd_inner_microstep: 797.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 04:46:30,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1297.05 | bwd_inner_microstep: 1297.03 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 04:46:32,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.29 | bwd_microstep: 1522.12 | bwd_inner_microstep: 1522.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 04:46:34,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.12 | bwd_microstep: 1457.19 | bwd_inner_microstep: 1457.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 04:46:36,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.56 | bwd_microstep: 1288.85 | bwd_inner_microstep: 1288.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3721 [2024-06-10 04:46:38,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1563.96 | bwd_inner_microstep: 1563.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 04:46:40,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1558.08 | bwd_inner_microstep: 1558.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951 [2024-06-10 04:46:41,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.18 | bwd_microstep: 700.00 | bwd_inner_microstep: 699.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 04:46:43,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 
1496.29 | bwd_inner_microstep: 1496.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 04:46:45,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.14 | bwd_microstep: 1403.62 | bwd_inner_microstep: 1403.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585 [2024-06-10 04:46:47,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1608.48 | bwd_inner_microstep: 1608.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3618 [2024-06-10 04:46:50,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 1653.55 | bwd_inner_microstep: 1653.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 04:46:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.57 [2024-06-10 04:46:51,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 1083.08 | bwd_inner_microstep: 819.57 | bwd_allreduce_microstep: 263.46 | step_microstep: 38.49 [2024-06-10 04:46:51,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16305.03 | bwd: 43818.17 | bwd_inner: 43553.81 | bwd_allreduce: 263.69 | step: 40.17 {'loss': 1.3891, 'learning_rate': 3.88193947142546e-05, 'epoch': 0.14} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 04:46:53,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.30 | bwd_microstep: 1272.42 | bwd_inner_microstep: 1272.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 
10.0, dynamic token length: 3872 [2024-06-10 04:46:55,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.36 | bwd_microstep: 1667.52 | bwd_inner_microstep: 1667.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 04:46:57,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.60 | bwd_microstep: 1311.09 | bwd_inner_microstep: 1311.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930 [2024-06-10 04:46:58,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 790.25 | bwd_inner_microstep: 790.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761 [2024-06-10 04:47:00,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.12 | bwd_microstep: 1447.03 | bwd_inner_microstep: 1447.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716 [2024-06-10 04:47:02,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.98 | bwd_microstep: 1463.79 | bwd_inner_microstep: 1463.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 04:47:04,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.51 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-10 04:47:05,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.64 | bwd_microstep: 809.48 | bwd_inner_microstep: 809.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 35, images per sample: 8.75, dynamic token length: 3484 [2024-06-10 04:47:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1497.08 | bwd_inner_microstep: 1497.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 04:47:09,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1378.92 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3382 [2024-06-10 04:47:10,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.15 | bwd_microstep: 1177.66 | bwd_inner_microstep: 1177.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 04:47:12,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.81 | bwd_microstep: 1347.73 | bwd_inner_microstep: 1347.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3778 [2024-06-10 04:47:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.69 | bwd_microstep: 1745.87 | bwd_inner_microstep: 1745.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644 [2024-06-10 04:47:17,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1438.68 | bwd_inner_microstep: 1438.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-10 04:47:18,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.43 | bwd_microstep: 1292.48 | bwd_inner_microstep: 1292.45 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619 [2024-06-10 04:47:20,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.72 | bwd_microstep: 1442.79 | bwd_inner_microstep: 1442.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 04:47:22,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.57 | bwd_microstep: 1289.32 | bwd_inner_microstep: 1289.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3517 [2024-06-10 04:47:24,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.40 | bwd_microstep: 1195.14 | bwd_inner_microstep: 1195.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-10 04:47:26,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.28 | bwd_microstep: 1321.95 | bwd_inner_microstep: 1321.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 04:47:28,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.43 | bwd_microstep: 1397.83 | bwd_inner_microstep: 1397.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 04:47:29,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1300.23 | bwd_inner_microstep: 1300.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 04:47:31,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1347.89 | 
bwd_inner_microstep: 1347.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 04:47:33,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.50 | bwd_microstep: 1477.78 | bwd_inner_microstep: 1477.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3554 [2024-06-10 04:47:36,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.18 | bwd_microstep: 1544.24 | bwd_inner_microstep: 1544.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 04:47:37,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.01 | bwd_microstep: 1404.31 | bwd_inner_microstep: 1404.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823 [2024-06-10 04:47:40,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.21 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814 [2024-06-10 04:47:42,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.70 | bwd_microstep: 1755.69 | bwd_inner_microstep: 1755.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 04:47:44,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1384.08 | bwd_inner_microstep: 1384.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270 [2024-06-10 04:47:45,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 364.58 | bwd_microstep: 977.87 | bwd_inner_microstep: 977.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 04:47:48,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.83 | bwd_microstep: 1661.46 | bwd_inner_microstep: 1661.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 04:47:49,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1356.51 | bwd_inner_microstep: 1356.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3615 [2024-06-10 04:47:54,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 04:47:54,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.84 | bwd_microstep: 3777.77 | bwd_inner_microstep: 1538.77 | bwd_allreduce_microstep: 2238.94 | step_microstep: 38.70 [2024-06-10 04:47:54,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16396.90 | bwd: 46145.02 | bwd_inner: 43905.12 | bwd_allreduce: 2239.19 | step: 40.30 {'loss': 1.375, 'learning_rate': 3.8806656683459916e-05, 'epoch': 0.14} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4576 [2024-06-10 04:47:56,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 686.99 | bwd_microstep: 1846.18 | bwd_inner_microstep: 1846.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023 [2024-06-10 04:47:57,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.13 | bwd_microstep: 806.36 | bwd_inner_microstep: 806.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 04:47:59,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1280.51 | bwd_inner_microstep: 1280.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794 [2024-06-10 04:48:02,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1646.40 | bwd_inner_microstep: 1646.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-10 04:48:03,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.73 | bwd_microstep: 1146.30 | bwd_inner_microstep: 1146.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757 [2024-06-10 04:48:05,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.53 | bwd_microstep: 1639.22 | bwd_inner_microstep: 1639.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 04:48:07,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.88 | bwd_microstep: 1298.70 | bwd_inner_microstep: 1298.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3798 [2024-06-10 04:48:09,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.63 | bwd_microstep: 1349.88 | bwd_inner_microstep: 1349.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-10 04:48:11,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1298.53 | bwd_inner_microstep: 1298.50 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 04:48:12,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.04 | bwd_microstep: 797.24 | bwd_inner_microstep: 797.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 04:48:14,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1254.35 | bwd_inner_microstep: 1254.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 04:48:15,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1255.32 | bwd_inner_microstep: 1255.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979 [2024-06-10 04:48:17,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.95 | bwd_microstep: 892.12 | bwd_inner_microstep: 892.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 04:48:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1484.54 | bwd_inner_microstep: 1484.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-10 04:48:21,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1531.81 | bwd_inner_microstep: 1531.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 04:48:23,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.65 | bwd_microstep: 1344.34 
| bwd_inner_microstep: 1344.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 04:48:25,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.70 | bwd_microstep: 1481.69 | bwd_inner_microstep: 1481.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3575 [2024-06-10 04:48:27,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.10 | bwd_microstep: 1561.70 | bwd_inner_microstep: 1561.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 04:48:29,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1294.83 | bwd_inner_microstep: 1294.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 04:48:30,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1285.33 | bwd_inner_microstep: 1285.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 04:48:32,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.59 | bwd_microstep: 1402.55 | bwd_inner_microstep: 1402.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704 [2024-06-10 04:48:34,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.84 | bwd_microstep: 1432.61 | bwd_inner_microstep: 1432.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 04:48:37,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 580.26 | bwd_microstep: 1563.64 | bwd_inner_microstep: 1563.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3463 [2024-06-10 04:48:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1494.01 | bwd_inner_microstep: 1493.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3432 [2024-06-10 04:48:40,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.45 | bwd_microstep: 1311.66 | bwd_inner_microstep: 1311.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-10 04:48:42,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.03 | bwd_microstep: 809.18 | bwd_inner_microstep: 809.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 04:48:43,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1410.26 | bwd_inner_microstep: 1410.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 04:48:46,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.79 | bwd_microstep: 1648.20 | bwd_inner_microstep: 1648.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 04:48:48,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1447.36 | bwd_inner_microstep: 1447.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 04:48:50,329] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1508.51 | bwd_inner_microstep: 1508.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 04:48:52,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3730 [2024-06-10 04:48:54,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 04:48:54,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.23 | bwd_microstep: 2216.07 | bwd_inner_microstep: 1501.34 | bwd_allreduce_microstep: 714.69 | step_microstep: 38.50 [2024-06-10 04:48:54,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16190.84 | bwd: 44022.77 | bwd_inner: 43307.18 | bwd_allreduce: 714.91 | step: 40.09 {'loss': 1.3222, 'learning_rate': 3.879385241571817e-05, 'epoch': 0.14} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 04:48:56,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.78 | bwd_microstep: 1337.65 | bwd_inner_microstep: 1337.54 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 04:48:58,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.52 | bwd_microstep: 1284.68 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 04:49:00,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.49 | bwd_microstep: 1558.50 | bwd_inner_microstep: 1558.47 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 04:49:02,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1557.59 | bwd_inner_microstep: 1557.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-10 04:49:04,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1256.86 | bwd_inner_microstep: 1256.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 04:49:06,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1382.81 | bwd_inner_microstep: 1382.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-10 04:49:08,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1550.53 | bwd_inner_microstep: 1550.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728 [2024-06-10 04:49:10,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.32 | bwd_microstep: 1537.23 | bwd_inner_microstep: 1537.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-10 04:49:12,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.75 | bwd_microstep: 1433.59 | bwd_inner_microstep: 1433.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4030 [2024-06-10 04:49:14,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.08 | bwd_microstep: 
1449.89 | bwd_inner_microstep: 1449.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 04:49:16,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1345.05 | bwd_inner_microstep: 1345.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1918 [2024-06-10 04:49:17,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.27 | bwd_microstep: 879.32 | bwd_inner_microstep: 879.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 04:49:18,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.96 | bwd_microstep: 791.87 | bwd_inner_microstep: 791.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3736 [2024-06-10 04:49:21,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.80 | bwd_microstep: 1655.01 | bwd_inner_microstep: 1654.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2023 [2024-06-10 04:49:22,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.75 | bwd_microstep: 839.11 | bwd_inner_microstep: 839.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3853 [2024-06-10 04:49:24,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.25 | bwd_microstep: 1763.29 | bwd_inner_microstep: 1763.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3396 [2024-06-10 04:49:26,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 512.47 | bwd_microstep: 1374.52 | bwd_inner_microstep: 1374.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681 [2024-06-10 04:49:28,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.01 | bwd_microstep: 1621.67 | bwd_inner_microstep: 1621.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3553 [2024-06-10 04:49:30,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.30 | bwd_microstep: 1451.32 | bwd_inner_microstep: 1451.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 04:49:32,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1490.16 | bwd_inner_microstep: 1490.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678 [2024-06-10 04:49:35,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.50 | bwd_microstep: 1530.32 | bwd_inner_microstep: 1530.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 04:49:36,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.47 | bwd_microstep: 1290.54 | bwd_inner_microstep: 1290.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835 [2024-06-10 04:49:38,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.90 | bwd_microstep: 1463.67 | bwd_inner_microstep: 1463.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294 [2024-06-10 04:49:40,213] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.96 | bwd_microstep: 978.35 | bwd_inner_microstep: 978.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2282 [2024-06-10 04:49:41,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.14 | bwd_microstep: 785.76 | bwd_inner_microstep: 785.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089 [2024-06-10 04:49:42,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.09 | bwd_microstep: 918.46 | bwd_inner_microstep: 918.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 04:49:44,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.85 | bwd_microstep: 1459.41 | bwd_inner_microstep: 1459.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 04:49:46,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.06 | bwd_microstep: 1503.41 | bwd_inner_microstep: 1503.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3614 [2024-06-10 04:49:48,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1539.39 | bwd_inner_microstep: 1539.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770 [2024-06-10 04:49:50,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.01 | bwd_microstep: 1441.70 | bwd_inner_microstep: 1441.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 
[2024-06-10 04:49:52,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.30 | bwd_microstep: 1505.11 | bwd_inner_microstep: 1505.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4007 [2024-06-10 04:49:56,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 04:49:56,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.40 | bwd_microstep: 2768.52 | bwd_inner_microstep: 1617.78 | bwd_allreduce_microstep: 1150.69 | step_microstep: 38.46 [2024-06-10 04:49:56,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16251.29 | bwd: 44745.35 | bwd_inner: 43593.66 | bwd_allreduce: 1150.97 | step: 40.07 {'loss': 1.3045, 'learning_rate': 3.8780981956125914e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 04:49:58,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.97 | bwd_microstep: 1442.31 | bwd_inner_microstep: 1442.24 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 04:50:00,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1279.67 | bwd_inner_microstep: 1279.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3792 [2024-06-10 04:50:02,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1445.89 | bwd_inner_microstep: 1445.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 04:50:03,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1250.01 | 
bwd_inner_microstep: 1249.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402 [2024-06-10 04:50:05,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.10 | bwd_microstep: 1211.67 | bwd_inner_microstep: 1211.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 04:50:06,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 791.59 | bwd_inner_microstep: 791.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-10 04:50:08,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728 [2024-06-10 04:50:10,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1633.57 | bwd_inner_microstep: 1633.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1879 [2024-06-10 04:50:11,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.92 | bwd_microstep: 711.15 | bwd_inner_microstep: 711.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 04:50:13,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 1352.30 | bwd_inner_microstep: 1352.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503 [2024-06-10 04:50:15,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
525.86 | bwd_microstep: 1419.25 | bwd_inner_microstep: 1419.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3400 [2024-06-10 04:50:17,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 1401.99 | bwd_inner_microstep: 1401.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 04:50:19,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.25 | bwd_microstep: 1280.15 | bwd_inner_microstep: 1280.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 04:50:20,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.85 | bwd_microstep: 1251.01 | bwd_inner_microstep: 1250.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482 [2024-06-10 04:50:22,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.14 | bwd_microstep: 1188.68 | bwd_inner_microstep: 1188.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 04:50:24,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.12 | bwd_microstep: 1406.30 | bwd_inner_microstep: 1406.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522 [2024-06-10 04:50:26,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.67 | bwd_microstep: 1195.24 | bwd_inner_microstep: 1195.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-10 04:50:27,842] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1288.56 | bwd_inner_microstep: 1288.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-10 04:50:29,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1406.81 | bwd_inner_microstep: 1406.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 04:50:32,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.79 | bwd_microstep: 1660.51 | bwd_inner_microstep: 1660.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 04:50:34,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.81 | bwd_microstep: 1557.88 | bwd_inner_microstep: 1557.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3663 [2024-06-10 04:50:36,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.68 | bwd_microstep: 1321.95 | bwd_inner_microstep: 1321.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442 [2024-06-10 04:50:37,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.86 | bwd_microstep: 1155.69 | bwd_inner_microstep: 1155.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620 [2024-06-10 04:50:38,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.01 | bwd_microstep: 263.26 | bwd_inner_microstep: 263.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3750 [2024-06-10 04:50:39,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.84 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2521 [2024-06-10 04:50:41,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.99 | bwd_microstep: 873.78 | bwd_inner_microstep: 873.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2079 [2024-06-10 04:50:42,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.14 | bwd_microstep: 756.69 | bwd_inner_microstep: 756.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3702 [2024-06-10 04:50:43,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1266.85 | bwd_inner_microstep: 1266.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071 [2024-06-10 04:50:45,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.92 | bwd_microstep: 917.42 | bwd_inner_microstep: 917.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3800 [2024-06-10 04:50:47,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.32 | bwd_microstep: 1511.84 | bwd_inner_microstep: 1511.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605 [2024-06-10 04:50:49,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1529.39 | bwd_inner_microstep: 1529.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 2602 [2024-06-10 04:50:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 04:50:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 431.23 | bwd_microstep: 6194.81 | bwd_inner_microstep: 1326.02 | bwd_allreduce_microstep: 4868.74 | step_microstep: 38.76 [2024-06-10 04:50:56,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14888.47 | bwd: 44595.27 | bwd_inner: 39725.57 | bwd_allreduce: 4868.99 | step: 40.37 {'loss': 1.3398, 'learning_rate': 3.876804535001285e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 04:50:58,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1469.83 | bwd_inner_microstep: 1469.71 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 04:50:59,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.82 | bwd_microstep: 1242.22 | bwd_inner_microstep: 1242.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 04:51:01,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.78 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 04:51:03,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.03 | bwd_microstep: 1281.34 | bwd_inner_microstep: 1281.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767 [2024-06-10 04:51:05,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.72 
| bwd_microstep: 1445.70 | bwd_inner_microstep: 1445.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2209 [2024-06-10 04:51:06,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.41 | bwd_microstep: 890.87 | bwd_inner_microstep: 890.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958 [2024-06-10 04:51:07,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.81 | bwd_microstep: 703.44 | bwd_inner_microstep: 703.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 04:51:09,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1384.00 | bwd_inner_microstep: 1383.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 04:51:11,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.58 | bwd_microstep: 1632.37 | bwd_inner_microstep: 1632.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-10 04:51:12,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.16 | bwd_microstep: 700.69 | bwd_inner_microstep: 700.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410 [2024-06-10 04:51:14,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1441.58 | bwd_inner_microstep: 1441.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 04:51:16,700] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1344.19 | bwd_inner_microstep: 1344.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642 [2024-06-10 04:51:19,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1708.07 | bwd_inner_microstep: 1708.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662 [2024-06-10 04:51:21,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.12 | bwd_microstep: 1515.70 | bwd_inner_microstep: 1515.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505 [2024-06-10 04:51:22,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.10 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 04:51:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.89 | bwd_microstep: 1502.69 | bwd_inner_microstep: 1502.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-10 04:51:27,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.93 | bwd_microstep: 1426.77 | bwd_inner_microstep: 1426.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3454 [2024-06-10 04:51:28,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.10 | bwd_microstep: 1215.01 | bwd_inner_microstep: 1214.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655 [2024-06-10 04:51:30,809] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.99 | bwd_microstep: 1520.79 | bwd_inner_microstep: 1520.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736 [2024-06-10 04:51:33,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.27 | bwd_microstep: 1647.17 | bwd_inner_microstep: 1647.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728 [2024-06-10 04:51:35,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.63 | bwd_microstep: 1539.91 | bwd_inner_microstep: 1539.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-10 04:51:37,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1399.86 | bwd_inner_microstep: 1399.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3610 [2024-06-10 04:51:39,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1572.28 | bwd_inner_microstep: 1572.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3824 [2024-06-10 04:51:41,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.69 | bwd_microstep: 1513.16 | bwd_inner_microstep: 1513.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 04:51:43,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.34 | bwd_microstep: 1556.35 | bwd_inner_microstep: 1556.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3558 [2024-06-10 04:51:45,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.15 | bwd_microstep: 1498.64 | bwd_inner_microstep: 1498.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 04:51:47,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.56 | bwd_microstep: 1493.94 | bwd_inner_microstep: 1493.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3755 [2024-06-10 04:51:49,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.69 | bwd_microstep: 1573.43 | bwd_inner_microstep: 1573.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051 [2024-06-10 04:51:51,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.44 | bwd_microstep: 912.93 | bwd_inner_microstep: 912.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3802 [2024-06-10 04:51:52,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.25 | bwd_microstep: 1352.42 | bwd_inner_microstep: 1352.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538 [2024-06-10 04:51:54,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.30 | bwd_microstep: 1195.60 | bwd_inner_microstep: 1195.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2047 [2024-06-10 04:51:56,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.20 | optimizer_step: 6.58 [2024-06-10 04:51:56,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 284.07 | bwd_microstep: 1943.71 | bwd_inner_microstep: 856.08 | bwd_allreduce_microstep: 1087.58 | step_microstep: 38.64 [2024-06-10 04:51:56,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16159.28 | bwd: 44349.42 | bwd_inner: 43260.82 | bwd_allreduce: 1087.86 | step: 40.20 ▎ | 235/1726 [4:08:27<25:29:03, 61.53s/it] 14%|█▎ | 236/1726 [4:09:28<25:20:07, 61.21s/it] 14%|█▎ | 237/1726 [4:10:31<25:31:34, 61.72s/it] 14%|█▍ | 238/1726 [4:11:31<25:21:56, 61.37s/it] 14%|█▍ | 239/1726 [4:12:32<25:20:44, 61.36s/it] 14%|█▍ | 240/1726 [4:13:32<25:08:18, 60.90s/it] 14%|█▍ | 241/1726 [4:14:33<25:06:57, 60.89s/it] {'loss': 1.28, 'learning_rate': 3.875504264294161e-05, 'epoch': 0.14} dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473 [2024-06-10 04:51:58,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1326.18 | bwd_inner_microstep: 1326.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907 [2024-06-10 04:52:00,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.65 | bwd_microstep: 1588.86 | bwd_inner_microstep: 1588.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 04:52:02,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1275.73 | bwd_inner_microstep: 1275.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 04:52:04,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
503.12 | bwd_microstep: 1342.37 | bwd_inner_microstep: 1342.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3483 [2024-06-10 04:52:06,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1427.92 | bwd_inner_microstep: 1427.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722 [2024-06-10 04:52:08,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 1530.62 | bwd_inner_microstep: 1530.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488 [2024-06-10 04:52:10,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.05 | bwd_microstep: 1312.32 | bwd_inner_microstep: 1312.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 04:52:12,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1249.00 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 04:52:13,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.10 | bwd_microstep: 1151.36 | bwd_inner_microstep: 1151.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3423 [2024-06-10 04:52:15,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.14 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 04:52:17,633] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.81 | bwd_microstep: 1483.31 | bwd_inner_microstep: 1483.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664 [2024-06-10 04:52:19,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.93 | bwd_microstep: 1611.50 | bwd_inner_microstep: 1611.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 04:52:21,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1486.79 | bwd_inner_microstep: 1486.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3411 [2024-06-10 04:52:23,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.92 | bwd_microstep: 1444.75 | bwd_inner_microstep: 1444.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135 [2024-06-10 04:52:25,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.89 | bwd_microstep: 833.18 | bwd_inner_microstep: 833.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101 [2024-06-10 04:52:26,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.41 | bwd_microstep: 826.71 | bwd_inner_microstep: 826.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-10 04:52:27,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.23 | bwd_microstep: 810.79 | bwd_inner_microstep: 810.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 
[2024-06-10 04:52:29,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.72 | bwd_microstep: 1289.46 | bwd_inner_microstep: 1289.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 04:52:31,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.56 | bwd_microstep: 1383.10 | bwd_inner_microstep: 1383.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 04:52:32,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.68 | bwd_microstep: 1295.93 | bwd_inner_microstep: 1295.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3613 [2024-06-10 04:52:34,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1374.40 | bwd_inner_microstep: 1374.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 04:52:36,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547 [2024-06-10 04:52:38,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.16 | bwd_microstep: 1301.18 | bwd_inner_microstep: 1301.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 04:52:40,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.40 | bwd_microstep: 1602.91 | bwd_inner_microstep: 1602.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per 
sample: 9.0, dynamic token length: 3719 [2024-06-10 04:52:42,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.86 | bwd_microstep: 1568.63 | bwd_inner_microstep: 1568.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3682 [2024-06-10 04:52:44,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1548.63 | bwd_inner_microstep: 1548.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3486 [2024-06-10 04:52:46,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.21 | bwd_microstep: 1333.85 | bwd_inner_microstep: 1333.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 04:52:48,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.82 | bwd_microstep: 1556.50 | bwd_inner_microstep: 1556.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 04:52:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.31 | bwd_microstep: 980.56 | bwd_inner_microstep: 980.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774 [2024-06-10 04:52:52,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.27 | bwd_microstep: 1439.94 | bwd_inner_microstep: 1439.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270 [2024-06-10 04:52:53,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.83 | bwd_microstep: 880.62 | bwd_inner_microstep: 880.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3764 [2024-06-10 04:52:57,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 04:52:57,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.21 | bwd_microstep: 3931.47 | bwd_inner_microstep: 1671.83 | bwd_allreduce_microstep: 2259.58 | step_microstep: 38.63 [2024-06-10 04:52:57,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15900.39 | bwd: 44820.55 | bwd_inner: 42560.02 | bwd_allreduce: 2259.83 | step: 40.49 {'loss': 1.3461, 'learning_rate': 3.874197388070769e-05, 'epoch': 0.14} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 04:52:59,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1360.85 | bwd_inner_microstep: 1360.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 04:53:01,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1243.20 | bwd_inner_microstep: 1243.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 [2024-06-10 04:53:03,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.41 | bwd_microstep: 1454.80 | bwd_inner_microstep: 1454.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512 [2024-06-10 04:53:05,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1321.84 | bwd_inner_microstep: 1321.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 04:53:07,158] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 473.27 | bwd_microstep: 1246.15 | bwd_inner_microstep: 1246.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 04:53:09,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-10 04:53:11,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1404.02 | bwd_inner_microstep: 1404.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 04:53:13,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.18 | bwd_microstep: 1485.85 | bwd_inner_microstep: 1485.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 04:53:14,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.20 | bwd_microstep: 1188.48 | bwd_inner_microstep: 1188.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068 [2024-06-10 04:53:15,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.60 | bwd_microstep: 820.87 | bwd_inner_microstep: 820.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3489 [2024-06-10 04:53:17,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.20 | bwd_microstep: 1329.88 | bwd_inner_microstep: 1329.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2369 [2024-06-10 
04:53:19,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.73 | bwd_microstep: 1092.20 | bwd_inner_microstep: 1092.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3407 [2024-06-10 04:53:21,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.05 | bwd_microstep: 1536.82 | bwd_inner_microstep: 1536.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645 [2024-06-10 04:53:23,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.13 | bwd_microstep: 1604.28 | bwd_inner_microstep: 1604.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 04:53:25,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.08 | bwd_microstep: 1376.48 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928 [2024-06-10 04:53:26,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.00 | bwd_microstep: 698.24 | bwd_inner_microstep: 698.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 04:53:28,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1255.77 | bwd_inner_microstep: 1255.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627 [2024-06-10 04:53:30,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.60 | bwd_microstep: 1613.63 | bwd_inner_microstep: 1613.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 2082 [2024-06-10 04:53:31,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.10 | bwd_microstep: 821.47 | bwd_inner_microstep: 821.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771 [2024-06-10 04:53:33,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.09 | bwd_microstep: 1346.01 | bwd_inner_microstep: 1345.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 04:53:35,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.06 | bwd_microstep: 1611.17 | bwd_inner_microstep: 1611.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 04:53:37,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.59 | bwd_microstep: 1453.21 | bwd_inner_microstep: 1453.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2272 [2024-06-10 04:53:38,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.32 | bwd_microstep: 908.68 | bwd_inner_microstep: 908.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712 [2024-06-10 04:53:40,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1434.92 | bwd_inner_microstep: 1434.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-10 04:53:42,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1413.25 | bwd_inner_microstep: 1413.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 30, images per sample: 7.5, dynamic token length: 3419 [2024-06-10 04:53:44,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.71 | bwd_microstep: 1373.13 | bwd_inner_microstep: 1373.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 872 [2024-06-10 04:53:45,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.19 | bwd_microstep: 367.26 | bwd_inner_microstep: 367.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2271 [2024-06-10 04:53:46,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.16 | bwd_microstep: 1070.69 | bwd_inner_microstep: 1070.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049 [2024-06-10 04:53:47,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.98 | bwd_microstep: 817.73 | bwd_inner_microstep: 817.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612 [2024-06-10 04:53:49,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.78 | bwd_microstep: 1342.70 | bwd_inner_microstep: 1342.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3414 [2024-06-10 04:53:51,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.12 | bwd_microstep: 1393.07 | bwd_inner_microstep: 1393.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3580 [2024-06-10 04:53:59,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.09 | optimizer_gradients: 4.32 | optimizer_step: 6.60 [2024-06-10 04:53:59,766] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.31 | bwd_microstep: 7513.49 | bwd_inner_microstep: 1894.01 | bwd_allreduce_microstep: 5619.41 | step_microstep: 40.98 [2024-06-10 04:53:59,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15158.66 | bwd: 46279.37 | bwd_inner: 40659.03 | bwd_allreduce: 5619.64 | step: 42.66 {'loss': 1.2654, 'learning_rate': 3.8728839109339195e-05, 'epoch': 0.14} dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1879 [2024-06-10 04:54:00,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.04 | bwd_microstep: 738.84 | bwd_inner_microstep: 738.67 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3926 [2024-06-10 04:54:02,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.18 | bwd_microstep: 1586.92 | bwd_inner_microstep: 1586.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 04:54:05,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.67 | bwd_microstep: 1548.24 | bwd_inner_microstep: 1548.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 04:54:06,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.60 | bwd_microstep: 1341.72 | bwd_inner_microstep: 1341.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 04:54:09,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.32 | bwd_microstep: 1482.10 | bwd_inner_microstep: 1482.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 04:54:10,940] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.18 | bwd_microstep: 1380.00 | bwd_inner_microstep: 1379.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 04:54:12,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 04:54:14,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1382.36 | bwd_inner_microstep: 1382.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-10 04:54:16,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.41 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 04:54:18,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.82 | bwd_microstep: 1486.94 | bwd_inner_microstep: 1486.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-10 04:54:20,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.19 | bwd_microstep: 1479.17 | bwd_inner_microstep: 1479.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3739 [2024-06-10 04:54:22,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1550.88 | bwd_inner_microstep: 1550.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 
3494 [2024-06-10 04:54:25,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.99 | bwd_microstep: 1680.13 | bwd_inner_microstep: 1680.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465 [2024-06-10 04:54:27,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.25 | bwd_microstep: 1438.46 | bwd_inner_microstep: 1438.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-10 04:54:29,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.58 | bwd_microstep: 1522.28 | bwd_inner_microstep: 1522.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 04:54:31,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1489.87 | bwd_inner_microstep: 1489.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 04:54:33,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1380.85 | bwd_inner_microstep: 1380.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 04:54:35,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.77 | bwd_microstep: 1470.05 | bwd_inner_microstep: 1470.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657 [2024-06-10 04:54:37,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.85 | bwd_microstep: 1426.59 | bwd_inner_microstep: 1426.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3717 [2024-06-10 04:54:38,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1341.10 | bwd_inner_microstep: 1341.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 04:54:40,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.68 | bwd_microstep: 1377.14 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2037 [2024-06-10 04:54:41,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.00 | bwd_microstep: 717.32 | bwd_inner_microstep: 717.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 04:54:43,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1488.91 | bwd_inner_microstep: 1488.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 04:54:45,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.07 | bwd_microstep: 802.75 | bwd_inner_microstep: 802.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746 [2024-06-10 04:54:47,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.18 | bwd_microstep: 1449.88 | bwd_inner_microstep: 1449.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 04:54:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.18 | bwd_microstep: 1257.39 | bwd_inner_microstep: 1257.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-10 04:54:50,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.39 | bwd_microstep: 976.61 | bwd_inner_microstep: 976.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-10 04:54:52,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.39 | bwd_microstep: 1657.53 | bwd_inner_microstep: 1657.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 04:54:54,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.96 | bwd_microstep: 1399.96 | bwd_inner_microstep: 1399.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068 [2024-06-10 04:54:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.99 | bwd_microstep: 819.71 | bwd_inner_microstep: 819.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 04:54:57,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.03 | bwd_microstep: 1476.72 | bwd_inner_microstep: 1476.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 04:54:59,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.18 | optimizer_step: 6.62 [2024-06-10 04:54:59,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1632.12 | bwd_inner_microstep: 1583.49 | bwd_allreduce_microstep: 48.57 | step_microstep: 38.48 [2024-06-10 04:54:59,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16241.89 | 
bwd: 43449.28 | bwd_inner: 43399.67 | bwd_allreduce: 48.86 | step: 40.14 {'loss': 1.3228, 'learning_rate': 3.871563837509672e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 04:55:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.46 | bwd_microstep: 1497.52 | bwd_inner_microstep: 1497.45 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2010 [2024-06-10 04:55:02,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.91 | bwd_microstep: 777.55 | bwd_inner_microstep: 777.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3956 [2024-06-10 04:55:05,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.95 | bwd_microstep: 1600.60 | bwd_inner_microstep: 1600.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2799 [2024-06-10 04:55:06,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.04 | bwd_microstep: 1110.33 | bwd_inner_microstep: 1110.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2926 [2024-06-10 04:55:08,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.34 | bwd_microstep: 1029.56 | bwd_inner_microstep: 1029.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469 [2024-06-10 04:55:09,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.82 | bwd_microstep: 1215.75 | bwd_inner_microstep: 1215.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-10 04:55:11,958] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.26 | bwd_microstep: 1537.25 | bwd_inner_microstep: 1537.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 04:55:13,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.21 | bwd_microstep: 1290.24 | bwd_inner_microstep: 1290.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 04:55:15,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1355.67 | bwd_inner_microstep: 1355.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1922 [2024-06-10 04:55:16,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.80 | bwd_microstep: 726.15 | bwd_inner_microstep: 726.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452 [2024-06-10 04:55:18,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 04:55:20,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1380.35 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-10 04:55:22,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.96 | bwd_microstep: 1408.80 | bwd_inner_microstep: 1408.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3490 [2024-06-10 04:55:24,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.74 | bwd_microstep: 1583.76 | bwd_inner_microstep: 1583.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775 [2024-06-10 04:55:26,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.42 | bwd_microstep: 1741.97 | bwd_inner_microstep: 1741.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645 [2024-06-10 04:55:29,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.42 | bwd_microstep: 1607.02 | bwd_inner_microstep: 1607.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2995 [2024-06-10 04:55:30,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.66 | bwd_microstep: 1298.84 | bwd_inner_microstep: 1298.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-10 04:55:32,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.20 | bwd_microstep: 1165.09 | bwd_inner_microstep: 1165.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517 [2024-06-10 04:55:34,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.94 | bwd_microstep: 1323.90 | bwd_inner_microstep: 1323.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2721 [2024-06-10 04:55:35,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.15 | bwd_microstep: 947.48 | bwd_inner_microstep: 947.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3460 [2024-06-10 04:55:37,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.76 | bwd_microstep: 1280.87 | bwd_inner_microstep: 1280.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 04:55:39,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.31 | bwd_microstep: 1454.35 | bwd_inner_microstep: 1454.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 04:55:41,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1558.63 | bwd_inner_microstep: 1558.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3661 [2024-06-10 04:55:43,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.30 | bwd_microstep: 1484.45 | bwd_inner_microstep: 1484.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2509 [2024-06-10 04:55:45,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.87 | bwd_microstep: 1061.21 | bwd_inner_microstep: 1061.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 04:55:46,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.09 | bwd_microstep: 1302.37 | bwd_inner_microstep: 1302.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3810 [2024-06-10 04:55:49,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.85 | bwd_microstep: 1687.76 | bwd_inner_microstep: 1687.73 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 04:55:51,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1505.43 | bwd_inner_microstep: 1505.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 04:55:53,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.47 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 04:55:55,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1392.78 | bwd_inner_microstep: 1392.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 04:55:57,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.12 | bwd_microstep: 1597.69 | bwd_inner_microstep: 1597.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809 [2024-06-10 04:56:25,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.39 | optimizer_step: 6.59 [2024-06-10 04:56:25,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.24 | bwd_microstep: 27118.90 | bwd_inner_microstep: 1991.80 | bwd_allreduce_microstep: 25127.03 | step_microstep: 39.82 [2024-06-10 04:56:25,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16288.16 | bwd: 68887.00 | bwd_inner: 43758.99 | bwd_allreduce: 25127.29 | step: 41.43 {'loss': 1.3175, 'learning_rate': 3.870237172447317e-05, 'epoch': 0.14} dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2629 [2024-06-10 04:56:26,619] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.39 | bwd_microstep: 912.37 | bwd_inner_microstep: 912.28 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3952 [2024-06-10 04:56:28,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1497.01 | bwd_inner_microstep: 1496.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 04:56:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1276.26 | bwd_inner_microstep: 1276.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3814 [2024-06-10 04:56:32,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1599.17 | bwd_inner_microstep: 1599.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3776 [2024-06-10 04:56:34,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1503.47 | bwd_inner_microstep: 1503.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 04:56:36,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1280.27 | bwd_inner_microstep: 1280.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2246 [2024-06-10 04:56:37,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.48 | bwd_microstep: 966.68 | bwd_inner_microstep: 966.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
3542 [2024-06-10 04:56:39,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.44 | bwd_microstep: 1230.83 | bwd_inner_microstep: 1230.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-10 04:56:40,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 790.55 | bwd_inner_microstep: 790.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456 [2024-06-10 04:56:42,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1284.79 | bwd_inner_microstep: 1284.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2199 [2024-06-10 04:56:43,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.83 | bwd_microstep: 1016.45 | bwd_inner_microstep: 1016.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 04:56:45,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1251.14 | bwd_inner_microstep: 1251.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 04:56:47,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.78 | bwd_microstep: 1256.07 | bwd_inner_microstep: 1256.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1880 [2024-06-10 04:56:48,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.54 | bwd_microstep: 758.97 | bwd_inner_microstep: 758.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3489 [2024-06-10 04:56:50,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.21 | bwd_microstep: 1479.99 | bwd_inner_microstep: 1479.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970 [2024-06-10 04:56:51,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.41 | bwd_microstep: 889.82 | bwd_inner_microstep: 889.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709 [2024-06-10 04:56:53,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.50 | bwd_microstep: 1529.70 | bwd_inner_microstep: 1529.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527 [2024-06-10 04:56:55,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.03 | bwd_microstep: 1323.87 | bwd_inner_microstep: 1323.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 04:56:57,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1277.16 | bwd_inner_microstep: 1277.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3464 [2024-06-10 04:56:59,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1213.97 | bwd_inner_microstep: 1213.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 04:57:00,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1400.94 | bwd_inner_microstep: 1400.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 04:57:02,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1405.96 | bwd_inner_microstep: 1405.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 04:57:04,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.16 | bwd_microstep: 1462.30 | bwd_inner_microstep: 1462.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091 [2024-06-10 04:57:06,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.23 | bwd_microstep: 921.30 | bwd_inner_microstep: 921.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 04:57:08,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.05 | bwd_microstep: 1385.21 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3771 [2024-06-10 04:57:10,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.75 | bwd_microstep: 1378.21 | bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768 [2024-06-10 04:57:12,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.62 | bwd_microstep: 1445.20 | bwd_inner_microstep: 1445.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571 [2024-06-10 04:57:13,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.90 | bwd_microstep: 1302.54 | bwd_inner_microstep: 1302.51 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3828 [2024-06-10 04:57:16,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.18 | bwd_microstep: 1692.84 | bwd_inner_microstep: 1692.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593 [2024-06-10 04:57:18,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1369.85 | bwd_inner_microstep: 1369.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 04:57:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1398.97 | bwd_inner_microstep: 1398.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 04:57:26,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.25 | optimizer_step: 6.59 [2024-06-10 04:57:26,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 5657.07 | bwd_inner_microstep: 2009.91 | bwd_allreduce_microstep: 3647.11 | step_microstep: 38.72 [2024-06-10 04:57:26,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15454.90 | bwd: 45158.94 | bwd_inner: 41510.85 | bwd_allreduce: 3647.39 | step: 40.32 {'loss': 1.3045, 'learning_rate': 3.868903920419364e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3412 [2024-06-10 04:57:28,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.69 | bwd_microstep: 1440.51 | bwd_inner_microstep: 1440.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677 
[2024-06-10 04:57:30,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.46 | bwd_microstep: 1625.24 | bwd_inner_microstep: 1625.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 04:57:32,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1347.24 | bwd_inner_microstep: 1347.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 04:57:34,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 1278.66 | bwd_inner_microstep: 1278.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3881 [2024-06-10 04:57:36,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.80 | bwd_microstep: 1582.76 | bwd_inner_microstep: 1582.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-10 04:57:37,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.74 | bwd_microstep: 1151.33 | bwd_inner_microstep: 1151.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740 [2024-06-10 04:57:39,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1440.18 | bwd_inner_microstep: 1440.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-10 04:57:41,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.42 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3489 [2024-06-10 04:57:43,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693 [2024-06-10 04:57:45,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.24 | bwd_microstep: 1424.52 | bwd_inner_microstep: 1424.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 04:57:47,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.99 | bwd_microstep: 1255.04 | bwd_inner_microstep: 1255.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431 [2024-06-10 04:57:49,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.48 | bwd_microstep: 1187.26 | bwd_inner_microstep: 1187.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958 [2024-06-10 04:57:50,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.01 | bwd_microstep: 830.20 | bwd_inner_microstep: 830.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-10 04:57:52,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.76 | bwd_microstep: 1613.02 | bwd_inner_microstep: 1612.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1896 [2024-06-10 04:57:53,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.52 | bwd_microstep: 809.91 | bwd_inner_microstep: 809.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 04:57:55,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.46 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 04:57:57,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1486.24 | bwd_inner_microstep: 1486.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517 [2024-06-10 04:57:59,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.22 | bwd_microstep: 1580.93 | bwd_inner_microstep: 1580.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3412 [2024-06-10 04:58:01,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.91 | bwd_microstep: 1471.20 | bwd_inner_microstep: 1471.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2290 [2024-06-10 04:58:03,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.43 | bwd_microstep: 1070.04 | bwd_inner_microstep: 1070.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 04:58:05,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1486.03 | bwd_inner_microstep: 1486.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-10 04:58:06,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.12 | bwd_microstep: 980.29 | bwd_inner_microstep: 980.26 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3607 [2024-06-10 04:58:08,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.38 | bwd_microstep: 1572.00 | bwd_inner_microstep: 1571.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3812 [2024-06-10 04:58:11,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.95 | bwd_microstep: 1823.19 | bwd_inner_microstep: 1823.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3386 [2024-06-10 04:58:13,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.14 | bwd_microstep: 1437.65 | bwd_inner_microstep: 1437.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 04:58:15,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.50 | bwd_microstep: 1405.93 | bwd_inner_microstep: 1405.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 04:58:17,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.74 | bwd_microstep: 1355.64 | bwd_inner_microstep: 1355.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774 [2024-06-10 04:58:19,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.48 | bwd_microstep: 1744.77 | bwd_inner_microstep: 1744.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-10 04:58:21,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 
1496.49 | bwd_inner_microstep: 1496.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724 [2024-06-10 04:58:23,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.41 | bwd_microstep: 1639.09 | bwd_inner_microstep: 1639.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2184 [2024-06-10 04:58:25,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.27 | bwd_microstep: 856.86 | bwd_inner_microstep: 856.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3580 [2024-06-10 04:58:27,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 04:58:27,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.45 | bwd_microstep: 2107.53 | bwd_inner_microstep: 1343.30 | bwd_allreduce_microstep: 764.18 | step_microstep: 38.44 [2024-06-10 04:58:27,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16322.69 | bwd: 44716.92 | bwd_inner: 43951.81 | bwd_allreduce: 764.41 | step: 40.00 14%|█▍ | 241/1726 [4:14:33<25:06:57, 60.89s/it] 14%|█▍ | 242/1726 [4:15:34<25:07:17, 60.94s/it] 14%|█▍ | 243/1726 [4:16:36<25:12:33, 61.20s/it] 14%|█▍ | 244/1726 [4:17:36<25:02:59, 60.85s/it] 14%|█▍ | 245/1726 [4:19:02<28:04:42, 68.25s/it] 14%|█▍ | 246/1726 [4:20:03<27:09:34, 66.06s/it] 14%|█▍ | 247/1726 [4:21:04<26:33:55, {'loss': 1.3099, 'learning_rate': 3.867564086121519e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3506 [2024-06-10 04:58:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1481.69 | bwd_inner_microstep: 1481.63 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 04:58:31,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.87 | bwd_microstep: 1281.06 | bwd_inner_microstep: 1281.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480 [2024-06-10 04:58:33,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.72 | bwd_microstep: 1314.83 | bwd_inner_microstep: 1314.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-10 04:58:35,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1486.17 | bwd_inner_microstep: 1486.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410 [2024-06-10 04:58:36,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.25 | bwd_microstep: 1150.36 | bwd_inner_microstep: 1150.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 04:58:38,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1386.98 | bwd_inner_microstep: 1386.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 04:58:40,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.31 | bwd_microstep: 1393.80 | bwd_inner_microstep: 1393.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, 
images per sample: 4.0, dynamic token length: 1968 [2024-06-10 04:58:41,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 801.09 | bwd_inner_microstep: 801.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 04:58:43,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1350.51 | bwd_inner_microstep: 1350.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 04:58:45,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.58 | bwd_microstep: 1249.08 | bwd_inner_microstep: 1249.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 04:58:47,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1279.04 | bwd_inner_microstep: 1279.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3673 [2024-06-10 04:58:49,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.66 | bwd_microstep: 1771.42 | bwd_inner_microstep: 1771.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501 [2024-06-10 04:58:51,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1577.63 | bwd_inner_microstep: 1577.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 04:58:53,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.99 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.66 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3651 [2024-06-10 04:58:56,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.17 | bwd_microstep: 1714.82 | bwd_inner_microstep: 1714.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3647 [2024-06-10 04:58:58,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.11 | bwd_microstep: 1713.36 | bwd_inner_microstep: 1713.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927 [2024-06-10 04:58:59,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.96 | bwd_microstep: 726.12 | bwd_inner_microstep: 726.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 04:59:01,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1288.96 | bwd_inner_microstep: 1288.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 04:59:03,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.19 | bwd_microstep: 1660.45 | bwd_inner_microstep: 1660.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677 [2024-06-10 04:59:05,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.45 | bwd_microstep: 1427.81 | bwd_inner_microstep: 1427.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2003 [2024-06-10 04:59:06,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.03 | bwd_microstep: 710.90 | bwd_inner_microstep: 
710.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2015 [2024-06-10 04:59:07,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.14 | bwd_microstep: 744.45 | bwd_inner_microstep: 744.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047 [2024-06-10 04:59:08,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.56 | bwd_microstep: 813.75 | bwd_inner_microstep: 813.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 04:59:09,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 880.32 | bwd_inner_microstep: 880.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837 [2024-06-10 04:59:12,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.26 | bwd_microstep: 1560.12 | bwd_inner_microstep: 1560.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 905 [2024-06-10 04:59:12,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 137.56 | bwd_microstep: 346.79 | bwd_inner_microstep: 346.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2220 [2024-06-10 04:59:13,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.86 | bwd_microstep: 944.70 | bwd_inner_microstep: 944.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-10 04:59:16,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.96 | bwd_microstep: 
1594.54 | bwd_inner_microstep: 1594.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 04:59:18,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.33 | bwd_microstep: 1473.27 | bwd_inner_microstep: 1473.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3569 [2024-06-10 04:59:20,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1540.82 | bwd_inner_microstep: 1540.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 04:59:22,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.30 | bwd_microstep: 1459.26 | bwd_inner_microstep: 1459.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3806 [2024-06-10 04:59:29,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.94 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 04:59:29,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.71 | bwd_microstep: 6418.13 | bwd_inner_microstep: 1849.21 | bwd_allreduce_microstep: 4568.86 | step_microstep: 38.98 [2024-06-10 04:59:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15378.98 | bwd: 45887.96 | bwd_inner: 41318.13 | bwd_allreduce: 4569.12 | step: 40.67 {'loss': 1.3131, 'learning_rate': 3.8662176742726706e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 04:59:31,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1468.60 | bwd_inner_microstep: 1468.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 2953 [2024-06-10 04:59:33,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.45 | bwd_microstep: 1266.18 | bwd_inner_microstep: 1266.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841 [2024-06-10 04:59:35,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.06 | bwd_microstep: 1660.09 | bwd_inner_microstep: 1660.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 04:59:37,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 04:59:38,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 04:59:40,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.04 | bwd_microstep: 1480.38 | bwd_inner_microstep: 1480.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 04:59:42,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.84 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-10 04:59:44,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.39 | bwd_microstep: 1298.97 | bwd_inner_microstep: 1298.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434 [2024-06-10 04:59:46,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.25 | bwd_microstep: 1155.37 | bwd_inner_microstep: 1155.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 04:59:47,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.04 | bwd_microstep: 1151.07 | bwd_inner_microstep: 1151.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 04:59:49,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1392.46 | bwd_inner_microstep: 1392.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 04:59:51,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1389.89 | bwd_inner_microstep: 1389.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2895 [2024-06-10 04:59:53,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.76 | bwd_microstep: 999.31 | bwd_inner_microstep: 999.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1984 [2024-06-10 04:59:54,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.17 | bwd_microstep: 833.48 | bwd_inner_microstep: 833.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510 [2024-06-10 04:59:56,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1412.46 | bwd_inner_microstep: 1412.44 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 04:59:58,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1492.71 | bwd_inner_microstep: 1492.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2130 [2024-06-10 04:59:59,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.24 | bwd_microstep: 993.81 | bwd_inner_microstep: 993.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2080 [2024-06-10 05:00:00,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.31 | bwd_microstep: 791.20 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498 [2024-06-10 05:00:02,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1418.03 | bwd_inner_microstep: 1418.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 05:00:04,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.29 | bwd_microstep: 1403.91 | bwd_inner_microstep: 1403.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3761 [2024-06-10 05:00:06,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.16 | bwd_microstep: 1346.00 | bwd_inner_microstep: 1345.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922 [2024-06-10 05:00:07,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.04 | bwd_microstep: 697.27 
| bwd_inner_microstep: 697.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695 [2024-06-10 05:00:09,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1333.55 | bwd_inner_microstep: 1333.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3785 [2024-06-10 05:00:11,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1509.57 | bwd_inner_microstep: 1509.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542 [2024-06-10 05:00:13,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.44 | bwd_microstep: 1331.06 | bwd_inner_microstep: 1331.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 05:00:15,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1377.05 | bwd_inner_microstep: 1377.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-10 05:00:17,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.37 | bwd_microstep: 1422.38 | bwd_inner_microstep: 1422.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 05:00:19,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1394.10 | bwd_inner_microstep: 1394.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3776 [2024-06-10 05:00:21,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 593.99 | bwd_microstep: 1616.38 | bwd_inner_microstep: 1616.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812 [2024-06-10 05:00:23,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.43 | bwd_microstep: 1624.17 | bwd_inner_microstep: 1624.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3802 [2024-06-10 05:00:25,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.15 | bwd_microstep: 1619.67 | bwd_inner_microstep: 1619.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 05:00:29,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 05:00:29,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.31 | bwd_microstep: 3264.22 | bwd_inner_microstep: 1651.50 | bwd_allreduce_microstep: 1612.67 | step_microstep: 41.09 [2024-06-10 05:00:29,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15881.65 | bwd: 44064.72 | bwd_inner: 42451.14 | bwd_allreduce: 1612.90 | step: 42.74 {'loss': 1.302, 'learning_rate': 3.864864689614875e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 05:00:31,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.97 | bwd_microstep: 1475.62 | bwd_inner_microstep: 1475.42 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1481 [2024-06-10 05:00:32,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 193.80 | bwd_microstep: 489.15 | bwd_inner_microstep: 489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 14, images per sample: 3.5, dynamic token length: 1862 [2024-06-10 05:00:33,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.66 | bwd_microstep: 739.08 | bwd_inner_microstep: 739.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 05:00:35,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-10 05:00:37,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.83 | bwd_microstep: 1444.49 | bwd_inner_microstep: 1444.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 05:00:39,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.56 | bwd_microstep: 1389.76 | bwd_inner_microstep: 1389.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-10 05:00:41,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.50 | bwd_microstep: 1431.12 | bwd_inner_microstep: 1431.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 05:00:43,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.10 | bwd_microstep: 1633.48 | bwd_inner_microstep: 1633.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 05:00:45,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1385.13 | bwd_inner_microstep: 1385.11 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1947 [2024-06-10 05:00:46,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.57 | bwd_microstep: 762.26 | bwd_inner_microstep: 762.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2652 [2024-06-10 05:00:48,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.97 | bwd_microstep: 1216.38 | bwd_inner_microstep: 1216.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2160 [2024-06-10 05:00:49,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.19 | bwd_microstep: 853.10 | bwd_inner_microstep: 853.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677 [2024-06-10 05:00:51,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.38 | bwd_microstep: 1325.58 | bwd_inner_microstep: 1325.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2633 [2024-06-10 05:00:52,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.96 | bwd_microstep: 1016.40 | bwd_inner_microstep: 1016.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1992 [2024-06-10 05:00:53,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.06 | bwd_microstep: 830.11 | bwd_inner_microstep: 830.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3702 [2024-06-10 05:00:56,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.54 | bwd_microstep: 1728.78 | bwd_inner_microstep: 
1728.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657 [2024-06-10 05:00:58,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.42 | bwd_microstep: 1613.75 | bwd_inner_microstep: 1613.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3646 [2024-06-10 05:01:00,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1664.37 | bwd_inner_microstep: 1664.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697 [2024-06-10 05:01:02,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.36 | bwd_microstep: 1528.94 | bwd_inner_microstep: 1528.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3834 [2024-06-10 05:01:04,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1391.34 | bwd_inner_microstep: 1391.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 05:01:06,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.30 | bwd_microstep: 1514.80 | bwd_inner_microstep: 1514.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2002 [2024-06-10 05:01:07,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.22 | bwd_microstep: 740.12 | bwd_inner_microstep: 740.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 05:01:09,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | 
bwd_microstep: 1379.12 | bwd_inner_microstep: 1379.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822 [2024-06-10 05:01:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.46 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444 [2024-06-10 05:01:13,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.28 | bwd_microstep: 1172.62 | bwd_inner_microstep: 1172.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665 [2024-06-10 05:01:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1327.21 | bwd_inner_microstep: 1327.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3683 [2024-06-10 05:01:16,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1330.94 | bwd_inner_microstep: 1330.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2705 [2024-06-10 05:01:18,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.23 | bwd_microstep: 1167.48 | bwd_inner_microstep: 1167.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2281 [2024-06-10 05:01:19,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.91 | bwd_microstep: 961.49 | bwd_inner_microstep: 961.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2263 [2024-06-10 05:01:21,256] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 365.16 | bwd_microstep: 977.84 | bwd_inner_microstep: 977.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3607 [2024-06-10 05:01:23,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.99 | bwd_microstep: 1432.90 | bwd_inner_microstep: 1432.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578 [2024-06-10 05:01:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.56 [2024-06-10 05:01:32,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.11 | bwd_microstep: 8527.46 | bwd_inner_microstep: 1811.97 | bwd_allreduce_microstep: 6715.44 | step_microstep: 38.75 [2024-06-10 05:01:32,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15135.01 | bwd: 47322.05 | bwd_inner: 40605.53 | bwd_allreduce: 6715.75 | step: 40.45 {'loss': 1.3133, 'learning_rate': 3.863505136913337e-05, 'epoch': 0.14} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 05:01:34,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1492.57 | bwd_inner_microstep: 1492.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863 [2024-06-10 05:01:35,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.53 | bwd_microstep: 676.13 | bwd_inner_microstep: 676.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3861 [2024-06-10 05:01:37,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1423.78 | bwd_inner_microstep: 1423.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2351 [2024-06-10 05:01:38,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.16 | bwd_microstep: 984.52 | bwd_inner_microstep: 984.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791 [2024-06-10 05:01:40,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1543.61 | bwd_inner_microstep: 1543.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2205 [2024-06-10 05:01:42,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.68 | bwd_microstep: 888.73 | bwd_inner_microstep: 888.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 05:01:43,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 05:01:45,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1343.11 | bwd_inner_microstep: 1343.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955 [2024-06-10 05:01:46,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.78 | bwd_microstep: 702.62 | bwd_inner_microstep: 702.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 05:01:47,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.61 | bwd_microstep: 1387.77 | bwd_inner_microstep: 1387.74 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-10 05:01:49,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.45 | bwd_microstep: 1302.41 | bwd_inner_microstep: 1302.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3666 [2024-06-10 05:01:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.95 | bwd_microstep: 1588.03 | bwd_inner_microstep: 1588.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3670 [2024-06-10 05:01:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.02 | bwd_microstep: 1789.28 | bwd_inner_microstep: 1789.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 05:01:56,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.58 | bwd_microstep: 1251.41 | bwd_inner_microstep: 1251.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3615 [2024-06-10 05:01:58,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.64 | bwd_microstep: 1472.68 | bwd_inner_microstep: 1472.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 05:02:00,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.69 | bwd_microstep: 1493.37 | bwd_inner_microstep: 1493.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 05:02:02,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 
1386.84 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661 [2024-06-10 05:02:04,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.84 | bwd_microstep: 1428.44 | bwd_inner_microstep: 1428.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3705 [2024-06-10 05:02:06,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.60 | bwd_microstep: 1728.92 | bwd_inner_microstep: 1728.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967 [2024-06-10 05:02:07,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.00 | bwd_microstep: 734.77 | bwd_inner_microstep: 734.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2000 [2024-06-10 05:02:08,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.92 | bwd_microstep: 772.19 | bwd_inner_microstep: 772.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 05:02:10,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 1351.78 | bwd_inner_microstep: 1351.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942 [2024-06-10 05:02:11,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.27 | bwd_microstep: 698.88 | bwd_inner_microstep: 698.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 05:02:13,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 486.54 | bwd_microstep: 1292.78 | bwd_inner_microstep: 1292.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 05:02:15,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1400.96 | bwd_inner_microstep: 1400.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 05:02:17,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1466.14 | bwd_inner_microstep: 1466.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 05:02:19,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.45 | bwd_microstep: 1563.39 | bwd_inner_microstep: 1563.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 05:02:21,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1406.09 | bwd_inner_microstep: 1406.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3541 [2024-06-10 05:02:23,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1451.31 | bwd_inner_microstep: 1451.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-10 05:02:25,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.97 | bwd_microstep: 1644.26 | bwd_inner_microstep: 1644.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 05:02:27,470] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.08 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270 [2024-06-10 05:02:32,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 05:02:32,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.41 | bwd_microstep: 4960.17 | bwd_inner_microstep: 992.43 | bwd_allreduce_microstep: 3967.69 | step_microstep: 38.73 [2024-06-10 05:02:32,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15265.91 | bwd: 44797.88 | bwd_inner: 40829.28 | bwd_allreduce: 3967.93 | step: 40.34 {'loss': 1.4019, 'learning_rate': 3.862139020956395e-05, 'epoch': 0.15} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 05:02:34,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1442.23 | bwd_inner_microstep: 1442.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450 [2024-06-10 05:02:36,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.59 | bwd_microstep: 1415.20 | bwd_inner_microstep: 1415.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 05:02:38,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.46 | bwd_microstep: 1552.06 | bwd_inner_microstep: 1552.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 05:02:40,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1493.89 | bwd_inner_microstep: 1493.86 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-10 05:02:42,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1344.62 | bwd_inner_microstep: 1344.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2222 [2024-06-10 05:02:44,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.23 | bwd_microstep: 863.11 | bwd_inner_microstep: 863.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 05:02:45,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1382.88 | bwd_inner_microstep: 1382.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 05:02:47,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.62 | bwd_microstep: 1359.65 | bwd_inner_microstep: 1359.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 05:02:49,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481 [2024-06-10 05:02:51,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1312.55 | bwd_inner_microstep: 1312.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3611 [2024-06-10 05:02:53,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 
1539.14 | bwd_inner_microstep: 1539.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 05:02:55,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1487.23 | bwd_inner_microstep: 1487.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2718 [2024-06-10 05:02:57,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.38 | bwd_microstep: 1194.63 | bwd_inner_microstep: 1194.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3627 [2024-06-10 05:02:59,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1709.24 | bwd_inner_microstep: 1709.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428 [2024-06-10 05:03:01,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1155.18 | bwd_inner_microstep: 1155.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 950 [2024-06-10 05:03:01,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 147.86 | bwd_microstep: 382.21 | bwd_inner_microstep: 382.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 05:03:03,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1392.93 | bwd_inner_microstep: 1392.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1970 [2024-06-10 05:03:04,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 272.33 | bwd_microstep: 703.94 | bwd_inner_microstep: 703.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 05:03:06,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1282.12 | bwd_inner_microstep: 1282.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3002 [2024-06-10 05:03:07,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.98 | bwd_microstep: 1111.46 | bwd_inner_microstep: 1111.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 05:03:09,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.43 | bwd_microstep: 793.66 | bwd_inner_microstep: 793.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 05:03:11,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1556.17 | bwd_inner_microstep: 1556.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 05:03:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.61 | bwd_microstep: 800.30 | bwd_inner_microstep: 800.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 05:03:14,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1375.89 | bwd_inner_microstep: 1375.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 05:03:15,953] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.76 | bwd_microstep: 1260.59 | bwd_inner_microstep: 1260.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3606 [2024-06-10 05:03:18,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.38 | bwd_microstep: 1543.28 | bwd_inner_microstep: 1543.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3615 [2024-06-10 05:03:20,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 1549.74 | bwd_inner_microstep: 1549.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 05:03:22,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.12 | bwd_microstep: 1298.96 | bwd_inner_microstep: 1298.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829 [2024-06-10 05:03:24,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.98 | bwd_microstep: 1758.83 | bwd_inner_microstep: 1758.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 05:03:26,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1280.64 | bwd_inner_microstep: 1280.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-10 05:03:28,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1494.76 | bwd_inner_microstep: 1494.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3379 [2024-06-10 05:03:32,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 05:03:32,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.56 | bwd_microstep: 3527.64 | bwd_inner_microstep: 1443.92 | bwd_allreduce_microstep: 2083.67 | step_microstep: 38.65 [2024-06-10 05:03:32,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15517.69 | bwd: 43648.39 | bwd_inner: 41563.81 | bwd_allreduce: 2083.90 | step: 40.27 {'loss': 1.3044, 'learning_rate': 3.860766346555501e-05, 'epoch': 0.15} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-10 05:03:34,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1245.07 | bwd_inner_microstep: 1245.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459 [2024-06-10 05:03:35,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.39 | bwd_microstep: 1309.37 | bwd_inner_microstep: 1309.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844 [2024-06-10 05:03:38,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.29 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 05:03:39,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.28 | bwd_microstep: 793.28 | bwd_inner_microstep: 793.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479 [2024-06-10 05:03:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.20 | bwd_microstep: 1331.81 | 
bwd_inner_microstep: 1331.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-10 05:03:42,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.41 | bwd_microstep: 1185.06 | bwd_inner_microstep: 1185.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3754 [2024-06-10 05:03:44,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1403.21 | bwd_inner_microstep: 1403.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974 [2024-06-10 05:03:45,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.15 | bwd_microstep: 766.68 | bwd_inner_microstep: 766.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697 [2024-06-10 05:03:47,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.36 | bwd_microstep: 1433.96 | bwd_inner_microstep: 1433.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 05:03:49,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.18 | bwd_microstep: 1248.32 | bwd_inner_microstep: 1248.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 05:03:51,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.11 | bwd_microstep: 1343.48 | bwd_inner_microstep: 1343.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3831 [2024-06-10 05:03:53,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 556.49 | bwd_microstep: 1490.14 | bwd_inner_microstep: 1490.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3496 [2024-06-10 05:03:55,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.99 | bwd_microstep: 1435.60 | bwd_inner_microstep: 1435.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 05:03:57,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.73 | bwd_microstep: 1486.31 | bwd_inner_microstep: 1486.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427 [2024-06-10 05:03:59,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.92 | bwd_microstep: 1452.74 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2650 [2024-06-10 05:04:00,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.04 | bwd_microstep: 1115.16 | bwd_inner_microstep: 1115.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 05:04:02,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.94 | bwd_microstep: 1520.17 | bwd_inner_microstep: 1520.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-10 05:04:05,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.72 | bwd_microstep: 1624.74 | bwd_inner_microstep: 1624.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638 [2024-06-10 05:04:07,366] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.67 | bwd_microstep: 1605.08 | bwd_inner_microstep: 1605.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2066 [2024-06-10 05:04:08,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.18 | bwd_microstep: 948.31 | bwd_inner_microstep: 948.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3488 [2024-06-10 05:04:10,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.98 | bwd_microstep: 1595.29 | bwd_inner_microstep: 1595.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807 [2024-06-10 05:04:13,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.86 | bwd_microstep: 1756.25 | bwd_inner_microstep: 1756.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2142 [2024-06-10 05:04:14,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.12 | bwd_microstep: 931.08 | bwd_inner_microstep: 931.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-10 05:04:15,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.20 | bwd_microstep: 803.89 | bwd_inner_microstep: 803.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712 [2024-06-10 05:04:17,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1432.20 | bwd_inner_microstep: 1432.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3555 [2024-06-10 05:04:19,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.60 | bwd_microstep: 1302.02 | bwd_inner_microstep: 1301.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715 [2024-06-10 05:04:21,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.91 | bwd_microstep: 1626.64 | bwd_inner_microstep: 1626.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2223 [2024-06-10 05:04:23,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.73 | bwd_microstep: 961.91 | bwd_inner_microstep: 961.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767 [2024-06-10 05:04:25,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.20 | bwd_microstep: 1573.61 | bwd_inner_microstep: 1573.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 05:04:27,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1398.42 | bwd_inner_microstep: 1398.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-10 05:04:29,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.67 | bwd_microstep: 1536.32 | bwd_inner_microstep: 1536.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 05:04:33,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.63 [2024-06-10 05:04:33,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 478.75 | bwd_microstep: 3771.34 | bwd_inner_microstep: 1618.22 | bwd_allreduce_microstep: 2153.07 | step_microstep: 38.76 [2024-06-10 05:04:33,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15902.35 | bwd: 44986.46 | bwd_inner: 42832.48 | bwd_allreduce: 2153.29 | step: 40.40 14%|█▍ | 247/1726 [4:21:04<26:33:55, 64.66s/it] 14%|█▍ | 248/1726 [4:22:06<26:10:21, 63.75s/it] 14%|█▍ | 249/1726 [4:23:06<25:43:47, 62.71s/it] 14%|█▍ | 250/1726 [4:24:09<25:43:25, 62.74s/it] 15%|█▍ | 251/1726 [4:25:09<25:25:09, 62.04s/it] 15%|█▍ | 252/1726 [4:26:09<25:05:28, 61.28s/it] {'loss': 1.3452, 'learning_rate': 3.8593871185452074e-05, 'epoch': 0.15} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2088 [2024-06-10 05:04:34,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.59 | bwd_microstep: 920.13 | bwd_inner_microstep: 920.02 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 05:04:36,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1492.81 | bwd_inner_microstep: 1492.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679 [2024-06-10 05:04:38,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.72 | bwd_microstep: 1422.96 | bwd_inner_microstep: 1422.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 05:04:40,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 
1384.11 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-10 05:04:41,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.20 | bwd_microstep: 679.52 | bwd_inner_microstep: 679.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492 [2024-06-10 05:04:43,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.14 | bwd_microstep: 1219.54 | bwd_inner_microstep: 1219.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3715 [2024-06-10 05:04:45,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.67 | bwd_microstep: 1463.24 | bwd_inner_microstep: 1463.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884 [2024-06-10 05:04:46,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.05 | bwd_microstep: 712.81 | bwd_inner_microstep: 712.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 05:04:48,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.07 | bwd_microstep: 1389.35 | bwd_inner_microstep: 1389.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497 [2024-06-10 05:04:50,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3488 [2024-06-10 05:04:52,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 583.57 | bwd_microstep: 1576.22 | bwd_inner_microstep: 1576.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 05:04:54,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 05:04:56,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1345.29 | bwd_inner_microstep: 1345.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514 [2024-06-10 05:04:58,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.87 | bwd_microstep: 1587.92 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844 [2024-06-10 05:05:00,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.70 | bwd_microstep: 1580.24 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843 [2024-06-10 05:05:02,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.45 | bwd_microstep: 1457.29 | bwd_inner_microstep: 1457.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526 [2024-06-10 05:05:04,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.67 | bwd_microstep: 1199.69 | bwd_inner_microstep: 1199.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 05:05:06,338] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.85 | bwd_microstep: 1490.14 | bwd_inner_microstep: 1490.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-10 05:05:08,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.32 | bwd_microstep: 1454.05 | bwd_inner_microstep: 1454.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 05:05:10,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.18 | bwd_microstep: 1560.50 | bwd_inner_microstep: 1560.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 05:05:12,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.56 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1918 [2024-06-10 05:05:13,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.82 | bwd_microstep: 720.08 | bwd_inner_microstep: 720.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680 [2024-06-10 05:05:15,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.17 | bwd_microstep: 1423.88 | bwd_inner_microstep: 1423.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1148 [2024-06-10 05:05:15,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 167.26 | bwd_microstep: 430.40 | bwd_inner_microstep: 430.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 
[2024-06-10 05:05:17,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1314.08 | bwd_inner_microstep: 1314.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 05:05:19,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.62 | bwd_microstep: 1283.92 | bwd_inner_microstep: 1283.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1906 [2024-06-10 05:05:20,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.16 | bwd_microstep: 757.50 | bwd_inner_microstep: 757.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3422 [2024-06-10 05:05:22,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.18 | bwd_microstep: 1411.01 | bwd_inner_microstep: 1410.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3575 [2024-06-10 05:05:24,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1460.61 | bwd_inner_microstep: 1460.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493 [2024-06-10 05:05:26,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.27 | bwd_microstep: 1578.19 | bwd_inner_microstep: 1578.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 05:05:28,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3429 [2024-06-10 05:05:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 05:05:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 8121.91 | bwd_inner_microstep: 1566.94 | bwd_allreduce_microstep: 6554.93 | step_microstep: 38.75 [2024-06-10 05:05:37,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15391.41 | bwd: 47767.68 | bwd_inner: 41211.74 | bwd_allreduce: 6555.20 | step: 40.41 {'loss': 1.3368, 'learning_rate': 3.858001341783149e-05, 'epoch': 0.15} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637 [2024-06-10 05:05:39,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1502.50 | bwd_inner_microstep: 1502.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 05:05:40,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1277.25 | bwd_inner_microstep: 1277.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3470 [2024-06-10 05:05:42,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.75 | bwd_microstep: 1439.39 | bwd_inner_microstep: 1439.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 05:05:44,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.00 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 05:05:46,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.01 
| bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3837 [2024-06-10 05:05:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.99 | bwd_microstep: 1629.86 | bwd_inner_microstep: 1629.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091 [2024-06-10 05:05:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.58 | bwd_microstep: 822.87 | bwd_inner_microstep: 822.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716 [2024-06-10 05:05:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.59 | bwd_microstep: 1529.09 | bwd_inner_microstep: 1529.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 05:05:54,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.44 | bwd_microstep: 1484.78 | bwd_inner_microstep: 1484.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 05:05:56,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1346.17 | bwd_inner_microstep: 1346.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682 [2024-06-10 05:05:58,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.92 | bwd_microstep: 1617.15 | bwd_inner_microstep: 1617.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 05:06:00,130] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864 [2024-06-10 05:06:01,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.15 | bwd_microstep: 707.94 | bwd_inner_microstep: 707.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3707 [2024-06-10 05:06:03,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.29 | bwd_microstep: 1724.61 | bwd_inner_microstep: 1724.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793 [2024-06-10 05:06:05,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.74 | bwd_microstep: 1643.11 | bwd_inner_microstep: 1643.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 05:06:07,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1389.56 | bwd_inner_microstep: 1389.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3670 [2024-06-10 05:06:09,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.21 | bwd_microstep: 1659.90 | bwd_inner_microstep: 1659.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515 [2024-06-10 05:06:11,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1420.11 | bwd_inner_microstep: 1420.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3633 [2024-06-10 05:06:13,934] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1473.34 | bwd_inner_microstep: 1473.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3103 [2024-06-10 05:06:15,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1249.90 | bwd_inner_microstep: 1249.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-10 05:06:17,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1496.17 | bwd_inner_microstep: 1496.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-10 05:06:19,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.96 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826 [2024-06-10 05:06:21,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.33 | bwd_microstep: 1360.12 | bwd_inner_microstep: 1360.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865 [2024-06-10 05:06:23,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.59 | bwd_microstep: 1479.55 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046 [2024-06-10 05:06:24,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.30 | bwd_microstep: 910.90 | bwd_inner_microstep: 910.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3472 [2024-06-10 05:06:26,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.57 | bwd_microstep: 1186.91 | bwd_inner_microstep: 1186.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 05:06:28,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.36 | bwd_microstep: 1658.35 | bwd_inner_microstep: 1658.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 05:06:30,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.29 | bwd_microstep: 1284.43 | bwd_inner_microstep: 1284.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3719 [2024-06-10 05:06:32,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.11 | bwd_microstep: 1300.44 | bwd_inner_microstep: 1300.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767 [2024-06-10 05:06:34,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.39 | bwd_microstep: 1533.65 | bwd_inner_microstep: 1533.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 05:06:36,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1394.76 | bwd_inner_microstep: 1394.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 05:06:40,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 05:06:40,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 518.12 | bwd_microstep: 3478.61 | bwd_inner_microstep: 1554.67 | bwd_allreduce_microstep: 1923.88 | step_microstep: 38.86 [2024-06-10 05:06:40,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16630.14 | bwd: 46553.38 | bwd_inner: 44628.58 | bwd_allreduce: 1924.11 | step: 40.63 {'loss': 1.3071, 'learning_rate': 3.856609021150022e-05, 'epoch': 0.15} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 05:06:42,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1340.39 | bwd_inner_microstep: 1340.27 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 05:06:44,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.65 | bwd_microstep: 1349.37 | bwd_inner_microstep: 1349.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3914 [2024-06-10 05:06:46,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.78 | bwd_microstep: 1691.53 | bwd_inner_microstep: 1691.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 05:06:48,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.18 | bwd_microstep: 1295.37 | bwd_inner_microstep: 1295.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3742 [2024-06-10 05:06:50,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.33 | bwd_microstep: 1534.59 | bwd_inner_microstep: 1534.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 05:06:52,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 524.98 | bwd_microstep: 1382.50 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473 [2024-06-10 05:06:54,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.85 | bwd_microstep: 1342.57 | bwd_inner_microstep: 1342.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 05:06:56,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 05:06:58,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1250.06 | bwd_inner_microstep: 1250.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3197 [2024-06-10 05:06:59,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.40 | bwd_microstep: 1170.94 | bwd_inner_microstep: 1170.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3483 [2024-06-10 05:07:01,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.58 | bwd_microstep: 1444.37 | bwd_inner_microstep: 1444.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3705 [2024-06-10 05:07:03,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.18 | bwd_microstep: 1725.22 | bwd_inner_microstep: 1725.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3668 [2024-06-10 05:07:06,350] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.17 | bwd_microstep: 1719.46 | bwd_inner_microstep: 1719.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-10 05:07:08,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1486.61 | bwd_inner_microstep: 1486.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2122 [2024-06-10 05:07:09,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.44 | bwd_microstep: 827.33 | bwd_inner_microstep: 827.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427 [2024-06-10 05:07:11,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676 [2024-06-10 05:07:13,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.51 | bwd_microstep: 1430.64 | bwd_inner_microstep: 1430.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 05:07:15,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.04 | bwd_microstep: 1387.65 | bwd_inner_microstep: 1387.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3825 [2024-06-10 05:07:17,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1482.24 | bwd_inner_microstep: 1482.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3611 [2024-06-10 05:07:19,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1511.34 | bwd_inner_microstep: 1511.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 05:07:21,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1460.70 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612 [2024-06-10 05:07:23,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.52 | bwd_microstep: 1312.96 | bwd_inner_microstep: 1312.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 05:07:24,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1250.07 | bwd_inner_microstep: 1250.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 05:07:26,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.70 | bwd_microstep: 879.66 | bwd_inner_microstep: 879.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 05:07:28,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1475.65 | bwd_inner_microstep: 1475.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639 [2024-06-10 05:07:30,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1414.56 | bwd_inner_microstep: 1414.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3463 [2024-06-10 05:07:31,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.98 | bwd_microstep: 1316.88 | bwd_inner_microstep: 1316.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2186 [2024-06-10 05:07:33,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.95 | bwd_microstep: 956.69 | bwd_inner_microstep: 956.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005 [2024-06-10 05:07:34,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.04 | bwd_microstep: 831.70 | bwd_inner_microstep: 831.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 05:07:36,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1380.52 | bwd_inner_microstep: 1380.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 05:07:38,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1396.67 | bwd_inner_microstep: 1396.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3786 [2024-06-10 05:07:42,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 05:07:42,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 3852.58 | bwd_inner_microstep: 1591.96 | bwd_allreduce_microstep: 2260.56 | step_microstep: 38.70 [2024-06-10 05:07:42,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.72 | bwd: 45571.87 | bwd_inner: 43310.30 | 
bwd_allreduce: 2260.86 | step: 40.31 {'loss': 1.3265, 'learning_rate': 3.8552101615495755e-05, 'epoch': 0.15} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 05:07:44,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.81 | bwd_microstep: 1466.02 | bwd_inner_microstep: 1465.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 05:07:46,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1343.96 | bwd_inner_microstep: 1343.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3886 [2024-06-10 05:07:48,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1385.65 | bwd_inner_microstep: 1385.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3793 [2024-06-10 05:07:50,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1478.43 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 05:07:52,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.98 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 05:07:54,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1386.58 | bwd_inner_microstep: 1386.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 05:07:55,461] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 263.49 | bwd_microstep: 680.29 | bwd_inner_microstep: 680.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046 [2024-06-10 05:07:56,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.99 | bwd_microstep: 810.53 | bwd_inner_microstep: 810.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759 [2024-06-10 05:07:58,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1639.59 | bwd_inner_microstep: 1639.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441 [2024-06-10 05:08:00,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.79 | bwd_microstep: 1156.78 | bwd_inner_microstep: 1156.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3479 [2024-06-10 05:08:02,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.50 | bwd_microstep: 1424.26 | bwd_inner_microstep: 1424.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-10 05:08:03,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.35 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522 [2024-06-10 05:08:05,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1420.75 | bwd_inner_microstep: 1420.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662 [2024-06-10 05:08:07,453] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.89 | bwd_microstep: 1420.06 | bwd_inner_microstep: 1420.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 05:08:09,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.05 | bwd_microstep: 1514.42 | bwd_inner_microstep: 1514.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1997 [2024-06-10 05:08:10,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.58 | bwd_microstep: 835.21 | bwd_inner_microstep: 835.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518 [2024-06-10 05:08:12,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1420.43 | bwd_inner_microstep: 1420.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596 [2024-06-10 05:08:14,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.95 | bwd_microstep: 1306.69 | bwd_inner_microstep: 1306.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3631 [2024-06-10 05:08:16,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.62 | bwd_microstep: 1251.38 | bwd_inner_microstep: 1251.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540 [2024-06-10 05:08:18,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.66 | bwd_microstep: 1329.75 | bwd_inner_microstep: 1329.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3820 [2024-06-10 05:08:20,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1556.27 | bwd_inner_microstep: 1556.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 05:08:22,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1509.77 | bwd_inner_microstep: 1509.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 05:08:24,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1401.84 | bwd_inner_microstep: 1401.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 05:08:26,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 1496.23 | bwd_inner_microstep: 1496.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 05:08:28,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.98 | bwd_microstep: 1663.01 | bwd_inner_microstep: 1662.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3730 [2024-06-10 05:08:30,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.20 | bwd_microstep: 1339.56 | bwd_inner_microstep: 1339.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 05:08:32,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.55 | bwd_microstep: 1255.58 | bwd_inner_microstep: 1255.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3790 [2024-06-10 05:08:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.52 | bwd_microstep: 1556.78 | bwd_inner_microstep: 1556.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473 [2024-06-10 05:08:36,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.93 | bwd_microstep: 1341.75 | bwd_inner_microstep: 1341.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3554 [2024-06-10 05:08:38,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.19 | bwd_microstep: 1454.32 | bwd_inner_microstep: 1454.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 05:08:40,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.43 | bwd_microstep: 1555.39 | bwd_inner_microstep: 1555.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 05:08:45,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 05:08:45,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 4410.51 | bwd_inner_microstep: 1682.62 | bwd_allreduce_microstep: 2727.84 | step_microstep: 38.71 [2024-06-10 05:08:45,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16217.49 | bwd: 46093.78 | bwd_inner: 43364.98 | bwd_allreduce: 2728.09 | step: 40.29 {'loss': 1.2914, 'learning_rate': 3.853804767908584e-05, 'epoch': 0.15} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 05:08:47,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 551.40 | bwd_microstep: 1469.34 | bwd_inner_microstep: 1469.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 05:08:49,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1477.30 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412 [2024-06-10 05:08:51,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1281.28 | bwd_inner_microstep: 1281.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2320 [2024-06-10 05:08:52,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.91 | bwd_microstep: 981.35 | bwd_inner_microstep: 981.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 05:08:54,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1350.68 | bwd_inner_microstep: 1350.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420 [2024-06-10 05:08:56,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.03 | bwd_microstep: 1282.13 | bwd_inner_microstep: 1282.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3723 [2024-06-10 05:08:58,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.51 | bwd_microstep: 1840.42 | bwd_inner_microstep: 1840.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967 [2024-06-10 05:08:59,819] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.81 | bwd_microstep: 797.96 | bwd_inner_microstep: 797.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 05:09:01,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.68 | bwd_microstep: 1302.81 | bwd_inner_microstep: 1302.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 05:09:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1391.69 | bwd_inner_microstep: 1391.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694 [2024-06-10 05:09:05,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.78 | bwd_microstep: 1424.46 | bwd_inner_microstep: 1424.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-10 05:09:06,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.42 | bwd_microstep: 799.45 | bwd_inner_microstep: 799.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434 [2024-06-10 05:09:08,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.69 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2993 [2024-06-10 05:09:10,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.71 | bwd_microstep: 1203.07 | bwd_inner_microstep: 1203.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 
[2024-06-10 05:09:11,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1393.91 | bwd_inner_microstep: 1393.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2942 [2024-06-10 05:09:13,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.33 | bwd_microstep: 1098.19 | bwd_inner_microstep: 1098.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902 [2024-06-10 05:09:15,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.00 | bwd_microstep: 1685.25 | bwd_inner_microstep: 1685.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826 [2024-06-10 05:09:18,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.10 | bwd_microstep: 1577.63 | bwd_inner_microstep: 1577.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3604 [2024-06-10 05:09:19,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.67 | bwd_microstep: 1321.51 | bwd_inner_microstep: 1321.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 05:09:21,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.98 | bwd_microstep: 1283.29 | bwd_inner_microstep: 1283.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3532 [2024-06-10 05:09:23,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.44 | bwd_microstep: 1329.00 | bwd_inner_microstep: 1328.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3457 [2024-06-10 05:09:25,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-10 05:09:26,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.36 | bwd_microstep: 805.92 | bwd_inner_microstep: 805.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2274 [2024-06-10 05:09:27,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.41 | bwd_microstep: 910.08 | bwd_inner_microstep: 910.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623 [2024-06-10 05:09:29,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.52 | bwd_microstep: 1315.72 | bwd_inner_microstep: 1315.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-10 05:09:30,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.42 | bwd_microstep: 881.87 | bwd_inner_microstep: 881.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3641 [2024-06-10 05:09:32,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.64 | bwd_microstep: 1667.00 | bwd_inner_microstep: 1666.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3522 [2024-06-10 05:09:34,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.57 | bwd_microstep: 1340.93 | bwd_inner_microstep: 1340.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2272 [2024-06-10 05:09:36,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.58 | bwd_microstep: 934.35 | bwd_inner_microstep: 934.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3471 [2024-06-10 05:09:38,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.56 | bwd_microstep: 1533.16 | bwd_inner_microstep: 1533.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 05:09:40,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.81 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 05:09:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-10 05:09:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.28 | bwd_microstep: 5744.62 | bwd_inner_microstep: 1753.18 | bwd_allreduce_microstep: 3991.38 | step_microstep: 38.66 [2024-06-10 05:09:46,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15469.14 | bwd: 45445.26 | bwd_inner: 41452.93 | bwd_allreduce: 3991.62 | step: 40.30 {'loss': 1.3321, 'learning_rate': 3.852392845176837e-05, 'epoch': 0.15} 15%|█▍ | 253/1726 [4:27:10<25:04:06, 61.27s/it] 15%|█▍ | 254/1726 [4:28:13<25:19:30, 61.94s/it] 15%|█▍ | 255/1726 [4:29:17<25:30:20, 62.42s/it] 15%|█▍ | 256/1726 [4:30:19<25:26:51, 62.32s/it] 15%|█▍ | 257/1726 [4:31:22<25:28:17, 62.42s/it] 15%|█▍ | 258/1726 [4:32:23<25:18:46, 62.08s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459 [2024-06-10 05:09:48,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.36 | bwd_microstep: 1568.12 | bwd_inner_microstep: 1568.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 05:09:50,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.36 | bwd_microstep: 1255.19 | bwd_inner_microstep: 1255.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3861 [2024-06-10 05:09:52,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1492.65 | bwd_inner_microstep: 1492.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 05:09:54,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.73 | bwd_microstep: 1251.32 | bwd_inner_microstep: 1251.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998 [2024-06-10 05:09:55,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.50 | bwd_microstep: 738.12 | bwd_inner_microstep: 738.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 05:09:57,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 05:09:59,055] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 522.09 | bwd_microstep: 1387.20 | bwd_inner_microstep: 1387.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3719 [2024-06-10 05:10:00,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.39 | bwd_microstep: 1273.70 | bwd_inner_microstep: 1273.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755 [2024-06-10 05:10:02,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.71 | bwd_microstep: 1472.66 | bwd_inner_microstep: 1472.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411 [2024-06-10 05:10:04,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.99 | bwd_microstep: 1153.73 | bwd_inner_microstep: 1153.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684 [2024-06-10 05:10:06,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.80 | bwd_microstep: 1629.07 | bwd_inner_microstep: 1629.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 05:10:08,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1384.49 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 05:10:10,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.28 | bwd_microstep: 1483.73 | bwd_inner_microstep: 1483.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551 [2024-06-10 05:10:12,371] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.61 | bwd_microstep: 1234.41 | bwd_inner_microstep: 1234.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508 [2024-06-10 05:10:14,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.38 | bwd_microstep: 1221.51 | bwd_inner_microstep: 1221.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 05:10:15,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1289.11 | bwd_inner_microstep: 1289.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2411 [2024-06-10 05:10:17,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.94 | bwd_microstep: 909.79 | bwd_inner_microstep: 909.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000 [2024-06-10 05:10:18,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.70 | bwd_microstep: 804.83 | bwd_inner_microstep: 804.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 05:10:19,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.20 | bwd_microstep: 1158.75 | bwd_inner_microstep: 1158.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2160 [2024-06-10 05:10:20,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.34 | bwd_microstep: 759.77 | bwd_inner_microstep: 759.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3479 [2024-06-10 05:10:22,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.97 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2126 [2024-06-10 05:10:23,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.92 | bwd_microstep: 832.75 | bwd_inner_microstep: 832.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2470 [2024-06-10 05:10:25,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.77 | bwd_microstep: 955.16 | bwd_inner_microstep: 955.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936 [2024-06-10 05:10:26,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.54 | bwd_microstep: 730.50 | bwd_inner_microstep: 730.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655 [2024-06-10 05:10:28,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.18 | bwd_microstep: 1513.17 | bwd_inner_microstep: 1513.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 05:10:30,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.08 | bwd_microstep: 1280.91 | bwd_inner_microstep: 1280.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1908 [2024-06-10 05:10:31,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.43 | bwd_microstep: 780.39 | bwd_inner_microstep: 780.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 2006 [2024-06-10 05:10:32,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 804.87 | bwd_inner_microstep: 804.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-10 05:10:34,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.65 | bwd_microstep: 1603.33 | bwd_inner_microstep: 1603.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 05:10:36,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.76 | bwd_microstep: 1358.70 | bwd_inner_microstep: 1358.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650 [2024-06-10 05:10:38,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.74 | bwd_microstep: 1612.67 | bwd_inner_microstep: 1612.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602 [2024-06-10 05:10:46,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 05:10:46,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.69 | bwd_microstep: 7624.04 | bwd_inner_microstep: 1811.54 | bwd_allreduce_microstep: 5812.45 | step_microstep: 38.69 [2024-06-10 05:10:46,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14741.75 | bwd: 45133.58 | bwd_inner: 39320.23 | bwd_allreduce: 5812.68 | step: 40.43 {'loss': 1.2924, 'learning_rate': 3.8509743983271196e-05, 'epoch': 0.15} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-10 05:10:48,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
552.00 | bwd_microstep: 1470.59 | bwd_inner_microstep: 1470.51 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2380 [2024-06-10 05:10:50,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.18 | bwd_microstep: 1059.14 | bwd_inner_microstep: 1059.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 05:10:52,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 1340.94 | bwd_inner_microstep: 1340.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 05:10:54,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.71 | bwd_microstep: 1483.69 | bwd_inner_microstep: 1483.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 05:10:56,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.72 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-10 05:10:57,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.03 | bwd_microstep: 1151.07 | bwd_inner_microstep: 1151.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 05:10:59,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1385.55 | bwd_inner_microstep: 1385.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 05:11:01,620] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.20 | bwd_microstep: 1499.04 | bwd_inner_microstep: 1499.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 05:11:03,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958 [2024-06-10 05:11:04,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.26 | bwd_microstep: 826.72 | bwd_inner_microstep: 826.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2295 [2024-06-10 05:11:05,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.55 | bwd_microstep: 1007.98 | bwd_inner_microstep: 1007.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-10 05:11:07,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977 [2024-06-10 05:11:09,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.89 | bwd_microstep: 830.62 | bwd_inner_microstep: 830.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-10 05:11:11,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.26 | bwd_microstep: 1477.22 | bwd_inner_microstep: 1477.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 
[2024-06-10 05:11:13,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1408.50 | bwd_inner_microstep: 1408.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1965 [2024-06-10 05:11:14,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.24 | bwd_microstep: 747.80 | bwd_inner_microstep: 747.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3552 [2024-06-10 05:11:15,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1279.84 | bwd_inner_microstep: 1279.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-10 05:11:18,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1608.09 | bwd_inner_microstep: 1608.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631 [2024-06-10 05:11:20,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.99 | bwd_microstep: 1418.17 | bwd_inner_microstep: 1418.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3552 [2024-06-10 05:11:21,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1204.23 | bwd_inner_microstep: 1204.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024 [2024-06-10 05:11:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.82 | bwd_microstep: 811.42 | bwd_inner_microstep: 811.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3605 [2024-06-10 05:11:24,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1412.58 | bwd_inner_microstep: 1412.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784 [2024-06-10 05:11:26,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822 [2024-06-10 05:11:29,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.77 | bwd_microstep: 1519.15 | bwd_inner_microstep: 1519.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 05:11:31,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.15 | bwd_microstep: 1556.60 | bwd_inner_microstep: 1556.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3822 [2024-06-10 05:11:33,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1418.66 | bwd_inner_microstep: 1418.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3541 [2024-06-10 05:11:34,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.73 | bwd_microstep: 1260.89 | bwd_inner_microstep: 1260.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2242 [2024-06-10 05:11:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.88 | bwd_microstep: 1062.04 | bwd_inner_microstep: 1062.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279 [2024-06-10 05:11:37,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.72 | bwd_microstep: 974.59 | bwd_inner_microstep: 974.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2085 [2024-06-10 05:11:38,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.77 | bwd_microstep: 765.84 | bwd_inner_microstep: 765.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-10 05:11:40,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.38 | bwd_microstep: 977.26 | bwd_inner_microstep: 977.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3598 [2024-06-10 05:11:49,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 05:11:49,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.09 | bwd_microstep: 8629.26 | bwd_inner_microstep: 1641.11 | bwd_allreduce_microstep: 6988.09 | step_microstep: 38.81 [2024-06-10 05:11:49,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15015.66 | bwd: 47155.51 | bwd_inner: 40166.45 | bwd_allreduce: 6988.36 | step: 40.55 {'loss': 1.2814, 'learning_rate': 3.849549432355192e-05, 'epoch': 0.15} dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3446 [2024-06-10 05:11:51,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.95 | bwd_microstep: 1506.85 | bwd_inner_microstep: 1506.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3401 [2024-06-10 05:11:53,228] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1287.86 | bwd_inner_microstep: 1287.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2637 [2024-06-10 05:11:54,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.96 | bwd_microstep: 1113.62 | bwd_inner_microstep: 1113.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 05:11:56,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.28 | bwd_microstep: 1375.20 | bwd_inner_microstep: 1375.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493 [2024-06-10 05:11:58,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.80 | bwd_microstep: 1185.39 | bwd_inner_microstep: 1185.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788 [2024-06-10 05:12:00,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.45 | bwd_microstep: 1544.00 | bwd_inner_microstep: 1543.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 05:12:02,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1478.10 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 05:12:04,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1389.70 | bwd_inner_microstep: 1389.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719 [2024-06-10 05:12:06,381] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1431.00 | bwd_inner_microstep: 1430.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-10 05:12:08,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.48 | bwd_microstep: 1438.42 | bwd_inner_microstep: 1438.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 05:12:10,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1346.92 | bwd_inner_microstep: 1346.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3701 [2024-06-10 05:12:12,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.22 | bwd_microstep: 1451.31 | bwd_inner_microstep: 1451.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-10 05:12:14,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.09 | bwd_microstep: 1627.49 | bwd_inner_microstep: 1627.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-10 05:12:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1449.84 | bwd_inner_microstep: 1449.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965 [2024-06-10 05:12:17,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 894.37 | bwd_inner_microstep: 894.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3501 [2024-06-10 05:12:19,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.93 | bwd_microstep: 1484.16 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3865 [2024-06-10 05:12:21,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1629.92 | bwd_inner_microstep: 1629.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3873 [2024-06-10 05:12:24,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.36 | bwd_microstep: 1484.07 | bwd_inner_microstep: 1484.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3528 [2024-06-10 05:12:26,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1543.94 | bwd_inner_microstep: 1543.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 05:12:28,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.09 | bwd_microstep: 1488.54 | bwd_inner_microstep: 1488.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 05:12:29,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.11 | bwd_microstep: 1158.92 | bwd_inner_microstep: 1158.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2131 [2024-06-10 05:12:30,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.96 | bwd_microstep: 836.21 | bwd_inner_microstep: 836.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3827 [2024-06-10 05:12:33,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 1561.78 | bwd_inner_microstep: 1561.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 05:12:35,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1494.96 | bwd_inner_microstep: 1494.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029 [2024-06-10 05:12:36,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.81 | bwd_microstep: 809.02 | bwd_inner_microstep: 808.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 05:12:38,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.37 | bwd_microstep: 1291.25 | bwd_inner_microstep: 1291.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3539 [2024-06-10 05:12:39,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.63 | bwd_microstep: 1344.15 | bwd_inner_microstep: 1344.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3834 [2024-06-10 05:12:42,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1587.17 | bwd_inner_microstep: 1587.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 05:12:44,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.56 | bwd_microstep: 1464.70 | bwd_inner_microstep: 1464.67 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 05:12:46,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1498.09 | bwd_inner_microstep: 1498.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064 [2024-06-10 05:12:47,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.87 | bwd_microstep: 915.03 | bwd_inner_microstep: 915.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 05:12:49,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.90 | optimizer_gradients: 4.18 | optimizer_step: 6.63 [2024-06-10 05:12:49,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.09 | bwd_microstep: 1515.17 | bwd_inner_microstep: 1507.46 | bwd_allreduce_microstep: 7.66 | step_microstep: 39.55 [2024-06-10 05:12:49,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16300.76 | bwd: 43627.17 | bwd_inner: 43618.59 | bwd_allreduce: 7.89 | step: 41.21 {'loss': 1.2995, 'learning_rate': 3.84811795227978e-05, 'epoch': 0.15} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 05:12:51,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.19 | bwd_microstep: 1392.09 | bwd_inner_microstep: 1392.01 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 05:12:53,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.05 | bwd_microstep: 1250.15 | bwd_inner_microstep: 1250.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3875 [2024-06-10 05:12:55,382] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.55 | bwd_microstep: 1488.26 | bwd_inner_microstep: 1488.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437 [2024-06-10 05:12:57,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1281.95 | bwd_inner_microstep: 1281.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 05:12:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1407.65 | bwd_inner_microstep: 1407.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468 [2024-06-10 05:13:00,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.04 | bwd_microstep: 1346.12 | bwd_inner_microstep: 1346.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2205 [2024-06-10 05:13:02,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.09 | bwd_microstep: 1058.57 | bwd_inner_microstep: 1058.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 05:13:04,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 05:13:06,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.41 | bwd_microstep: 1396.99 | bwd_inner_microstep: 1396.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3688 [2024-06-10 05:13:08,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.46 | bwd_microstep: 1626.52 | bwd_inner_microstep: 1626.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-10 05:13:10,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.04 | bwd_microstep: 1536.09 | bwd_inner_microstep: 1536.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 05:13:12,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.33 | bwd_microstep: 1254.28 | bwd_inner_microstep: 1254.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3416 [2024-06-10 05:13:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.30 | bwd_microstep: 1408.45 | bwd_inner_microstep: 1408.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413 [2024-06-10 05:13:16,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.25 | bwd_microstep: 1446.20 | bwd_inner_microstep: 1446.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3646 [2024-06-10 05:13:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1712.85 | bwd_inner_microstep: 1712.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3532 [2024-06-10 05:13:20,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.76 | bwd_microstep: 1198.96 | bwd_inner_microstep: 1198.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3523 [2024-06-10 05:13:22,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.44 | bwd_microstep: 1455.51 | bwd_inner_microstep: 1455.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 05:13:24,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.02 | bwd_microstep: 1661.03 | bwd_inner_microstep: 1661.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479 [2024-06-10 05:13:26,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.98 | bwd_microstep: 1187.40 | bwd_inner_microstep: 1187.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088 [2024-06-10 05:13:27,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.49 | bwd_microstep: 730.30 | bwd_inner_microstep: 730.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996 [2024-06-10 05:13:28,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.09 | bwd_microstep: 804.42 | bwd_inner_microstep: 804.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273 [2024-06-10 05:13:29,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.64 | bwd_microstep: 878.58 | bwd_inner_microstep: 878.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 05:13:31,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.71 | bwd_microstep: 1488.13 | bwd_inner_microstep: 1488.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 05:13:33,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.12 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1284.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3637 [2024-06-10 05:13:35,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.84 | bwd_microstep: 1220.06 | bwd_inner_microstep: 1220.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-10 05:13:37,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1496.48 | bwd_inner_microstep: 1496.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 05:13:39,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.47 | bwd_microstep: 1499.80 | bwd_inner_microstep: 1499.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905 [2024-06-10 05:13:40,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.90 | bwd_microstep: 718.71 | bwd_inner_microstep: 718.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 05:13:42,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.73 | bwd_microstep: 1550.01 | bwd_inner_microstep: 1549.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3554 [2024-06-10 05:13:44,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.87 | bwd_microstep: 1474.40 | bwd_inner_microstep: 1474.37 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3574 [2024-06-10 05:13:46,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1527.44 | bwd_inner_microstep: 1527.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 [2024-06-10 05:13:51,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 19.60 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 05:13:51,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 4591.73 | bwd_inner_microstep: 1684.91 | bwd_allreduce_microstep: 2906.77 | step_microstep: 41.58 [2024-06-10 05:13:51,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16013.14 | bwd: 45762.72 | bwd_inner: 42854.98 | bwd_allreduce: 2907.03 | step: 43.22 {'loss': 1.3466, 'learning_rate': 3.8466799631425474e-05, 'epoch': 0.15} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449 [2024-06-10 05:13:53,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.95 | bwd_microstep: 1306.95 | bwd_inner_microstep: 1306.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 05:13:55,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 1374.48 | bwd_inner_microstep: 1374.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 05:13:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.18 | bwd_microstep: 1489.66 | bwd_inner_microstep: 1489.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 
[2024-06-10 05:13:59,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1243.13 | bwd_inner_microstep: 1243.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3769 [2024-06-10 05:14:01,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 05:14:03,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1379.56 | bwd_inner_microstep: 1379.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 05:14:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.42 | bwd_microstep: 1407.12 | bwd_inner_microstep: 1407.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 05:14:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.87 | bwd_microstep: 1296.67 | bwd_inner_microstep: 1296.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 05:14:09,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.95 | bwd_microstep: 1478.68 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3437 [2024-06-10 05:14:10,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1402.24 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3435 [2024-06-10 05:14:12,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1444.20 | bwd_inner_microstep: 1444.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2153 [2024-06-10 05:14:14,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.26 | bwd_microstep: 946.85 | bwd_inner_microstep: 946.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3917 [2024-06-10 05:14:16,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.69 | bwd_microstep: 1597.10 | bwd_inner_microstep: 1597.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3417 [2024-06-10 05:14:18,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.55 | bwd_microstep: 1310.79 | bwd_inner_microstep: 1310.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418 [2024-06-10 05:14:20,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.55 | bwd_microstep: 1446.92 | bwd_inner_microstep: 1446.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3627 [2024-06-10 05:14:22,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.36 | bwd_microstep: 1566.68 | bwd_inner_microstep: 1566.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2996 [2024-06-10 05:14:24,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.75 | bwd_microstep: 1203.64 | bwd_inner_microstep: 1203.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 05:14:26,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.84 | bwd_microstep: 1395.43 | bwd_inner_microstep: 1395.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3649 [2024-06-10 05:14:28,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.03 | bwd_microstep: 1447.04 | bwd_inner_microstep: 1447.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 05:14:29,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1380.44 | bwd_inner_microstep: 1380.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3674 [2024-06-10 05:14:31,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.67 | bwd_microstep: 1327.57 | bwd_inner_microstep: 1327.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 05:14:33,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.18 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702 [2024-06-10 05:14:35,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.26 | bwd_microstep: 1633.77 | bwd_inner_microstep: 1633.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 05:14:37,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.14 | bwd_microstep: 1458.33 | bwd_inner_microstep: 1458.31 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 05:14:39,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1393.60 | bwd_inner_microstep: 1393.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2080 [2024-06-10 05:14:40,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.02 | bwd_microstep: 758.53 | bwd_inner_microstep: 758.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2534 [2024-06-10 05:14:42,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.74 | bwd_microstep: 1062.08 | bwd_inner_microstep: 1062.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 05:14:44,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.35 | bwd_microstep: 1754.53 | bwd_inner_microstep: 1754.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875 [2024-06-10 05:14:47,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1588.30 | bwd_inner_microstep: 1588.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2271 [2024-06-10 05:14:48,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.23 | bwd_microstep: 1010.94 | bwd_inner_microstep: 1010.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567 [2024-06-10 05:14:50,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.73 | bwd_microstep: 
1599.57 | bwd_inner_microstep: 1599.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-10 05:14:52,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.16 | optimizer_step: 6.56 [2024-06-10 05:14:52,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1656.68 | bwd_inner_microstep: 1512.08 | bwd_allreduce_microstep: 144.56 | step_microstep: 38.30 [2024-06-10 05:14:52,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16483.18 | bwd: 44249.43 | bwd_inner: 44103.92 | bwd_allreduce: 144.81 | step: 39.91 {'loss': 1.239, 'learning_rate': 3.845235470008084e-05, 'epoch': 0.15} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 05:14:54,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1467.33 | bwd_inner_microstep: 1467.26 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3920 [2024-06-10 05:14:57,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.92 | bwd_microstep: 1696.66 | bwd_inner_microstep: 1696.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-10 05:14:58,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 790.22 | bwd_inner_microstep: 790.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 05:15:00,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.11 | bwd_microstep: 1652.55 | bwd_inner_microstep: 1652.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 
9.25, dynamic token length: 3802 [2024-06-10 05:15:02,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.37 | bwd_microstep: 1599.41 | bwd_inner_microstep: 1599.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 05:15:04,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.37 | bwd_microstep: 1285.47 | bwd_inner_microstep: 1285.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 956 [2024-06-10 05:15:05,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 148.23 | bwd_microstep: 384.62 | bwd_inner_microstep: 384.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3717 [2024-06-10 05:15:07,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.06 | bwd_microstep: 1734.00 | bwd_inner_microstep: 1733.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063 [2024-06-10 05:15:08,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.14 | bwd_microstep: 821.25 | bwd_inner_microstep: 821.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 05:15:10,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.76 | bwd_microstep: 1260.00 | bwd_inner_microstep: 1259.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 05:15:12,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.54 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 46, images per sample: 11.5, dynamic token length: 3672 [2024-06-10 05:15:14,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.20 | bwd_microstep: 1719.85 | bwd_inner_microstep: 1719.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 05:15:16,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.15 | bwd_microstep: 1481.82 | bwd_inner_microstep: 1481.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3709 [2024-06-10 05:15:19,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.96 | bwd_microstep: 1728.35 | bwd_inner_microstep: 1728.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625 [2024-06-10 05:15:21,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.50 | bwd_microstep: 1319.50 | bwd_inner_microstep: 1319.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478 [2024-06-10 05:15:22,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.21 | bwd_microstep: 1190.07 | bwd_inner_microstep: 1190.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642 [2024-06-10 05:15:24,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.08 | bwd_microstep: 1514.97 | bwd_inner_microstep: 1514.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 05:15:26,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 05:15:28,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.64 | bwd_microstep: 1296.79 | bwd_inner_microstep: 1296.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2299 [2024-06-10 05:15:29,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.68 | bwd_microstep: 882.67 | bwd_inner_microstep: 882.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620 [2024-06-10 05:15:31,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.67 | bwd_microstep: 1315.65 | bwd_inner_microstep: 1315.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 05:15:33,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1405.64 | bwd_inner_microstep: 1405.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3883 [2024-06-10 05:15:35,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.97 | bwd_microstep: 1615.29 | bwd_inner_microstep: 1615.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3872 [2024-06-10 05:15:38,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 650.64 | bwd_microstep: 1772.26 | bwd_inner_microstep: 1772.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142 [2024-06-10 05:15:39,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.42 | bwd_microstep: 835.16 | bwd_inner_microstep: 
835.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3616 [2024-06-10 05:15:41,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.01 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3717 [2024-06-10 05:15:43,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.08 | bwd_microstep: 1495.37 | bwd_inner_microstep: 1495.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 05:15:45,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.02 | bwd_microstep: 1648.53 | bwd_inner_microstep: 1648.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 05:15:47,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.44 | bwd_microstep: 1499.85 | bwd_inner_microstep: 1499.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-10 05:15:49,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.53 | bwd_microstep: 1603.23 | bwd_inner_microstep: 1603.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3848 [2024-06-10 05:15:52,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.94 | bwd_microstep: 1761.19 | bwd_inner_microstep: 1761.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3746 [2024-06-10 05:15:54,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 
| optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-10 05:15:54,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.93 | bwd_microstep: 1727.26 | bwd_inner_microstep: 1719.54 | bwd_allreduce_microstep: 7.67 | step_microstep: 38.42 [2024-06-10 05:15:54,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16670.29 | bwd: 44756.82 | bwd_inner: 44748.19 | bwd_allreduce: 7.93 | step: 40.01 15%|█▌ | 259/1726 [4:33:23<25:04:12, 61.52s/it] 15%|█▌ | 260/1726 [4:34:26<25:10:29, 61.82s/it] 15%|█▌ | 261/1726 [4:35:26<24:58:12, 61.36s/it] 15%|█▌ | 262/1726 [4:36:28<25:02:50, 61.59s/it] 15%|█▌ | 263/1726 [4:37:29<24:58:06, 61.44s/it] 15%|█▌ | 264/1726 [4:38:31<24:59:34, 61.54s/it] {'loss': 1.3251, 'learning_rate': 3.843784477963888e-05, 'epoch': 0.15} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 05:15:56,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.66 | bwd_microstep: 1292.96 | bwd_inner_microstep: 1292.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3916 [2024-06-10 05:15:58,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1490.86 | bwd_inner_microstep: 1490.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4219 [2024-06-10 05:16:00,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.02 | bwd_microstep: 1658.60 | bwd_inner_microstep: 1658.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token
length: 3399 [2024-06-10 05:16:02,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1343.86 | bwd_inner_microstep: 1343.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 05:16:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.37 | bwd_microstep: 1403.73 | bwd_inner_microstep: 1403.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 05:16:06,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1248.15 | bwd_inner_microstep: 1248.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2042 [2024-06-10 05:16:07,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.52 | bwd_microstep: 842.39 | bwd_inner_microstep: 842.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 05:16:09,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.46 | bwd_microstep: 1291.38 | bwd_inner_microstep: 1291.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4006 [2024-06-10 05:16:11,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.85 | bwd_microstep: 1716.49 | bwd_inner_microstep: 1716.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2502 [2024-06-10 05:16:12,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.04 | bwd_microstep: 964.82 | bwd_inner_microstep: 964.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, 
images per sample: 6.5, dynamic token length: 2140 [2024-06-10 05:16:14,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.99 | bwd_microstep: 990.01 | bwd_inner_microstep: 989.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3415 [2024-06-10 05:16:16,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1473.13 | bwd_inner_microstep: 1473.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422 [2024-06-10 05:16:18,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.86 | bwd_microstep: 1442.38 | bwd_inner_microstep: 1442.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 05:16:20,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1392.11 | bwd_inner_microstep: 1392.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 05:16:22,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1485.68 | bwd_inner_microstep: 1485.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-10 05:16:24,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.05 | bwd_microstep: 1340.69 | bwd_inner_microstep: 1340.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 05:16:26,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1412.71 | bwd_inner_microstep: 1412.68 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826 [2024-06-10 05:16:28,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.75 | bwd_microstep: 1589.13 | bwd_inner_microstep: 1589.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 05:16:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.28 | bwd_microstep: 1418.46 | bwd_inner_microstep: 1418.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673 [2024-06-10 05:16:32,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.99 | bwd_microstep: 1427.15 | bwd_inner_microstep: 1427.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 05:16:34,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.92 | bwd_microstep: 1297.21 | bwd_inner_microstep: 1297.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3611 [2024-06-10 05:16:35,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.30 | bwd_microstep: 1217.09 | bwd_inner_microstep: 1217.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001 [2024-06-10 05:16:36,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.00 | bwd_microstep: 802.45 | bwd_inner_microstep: 802.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 05:16:38,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.79 | bwd_microstep: 1404.40 | bwd_inner_microstep: 
1404.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2015 [2024-06-10 05:16:40,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.65 | bwd_microstep: 931.66 | bwd_inner_microstep: 931.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3778 [2024-06-10 05:16:42,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 05:16:44,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-10 05:16:45,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.75 | bwd_microstep: 975.61 | bwd_inner_microstep: 975.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 05:16:47,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1512.60 | bwd_inner_microstep: 1512.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 05:16:49,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1389.60 | bwd_inner_microstep: 1389.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 05:16:51,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | 
bwd_microstep: 1379.33 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576 [2024-06-10 05:16:56,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 05:16:56,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.74 | bwd_microstep: 4272.69 | bwd_inner_microstep: 1804.63 | bwd_allreduce_microstep: 2468.00 | step_microstep: 38.63 [2024-06-10 05:16:56,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16004.22 | bwd: 45346.37 | bwd_inner: 42877.42 | bwd_allreduce: 2468.24 | step: 40.31 {'loss': 1.3241, 'learning_rate': 3.842326992120345e-05, 'epoch': 0.15} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5008 [2024-06-10 05:16:59,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 727.45 | bwd_microstep: 1953.72 | bwd_inner_microstep: 1953.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-10 05:17:00,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 795.86 | bwd_inner_microstep: 795.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 05:17:01,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1311.78 | bwd_inner_microstep: 1311.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-10 05:17:04,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1550.68 | bwd_inner_microstep: 1550.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, 
images per sample: 4.75, dynamic token length: 3487 [2024-06-10 05:17:05,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.53 | bwd_microstep: 1235.84 | bwd_inner_microstep: 1235.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 05:17:06,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 796.54 | bwd_inner_microstep: 796.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 05:17:08,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1246.37 | bwd_inner_microstep: 1246.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729 [2024-06-10 05:17:10,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.64 | bwd_microstep: 1634.89 | bwd_inner_microstep: 1634.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 05:17:12,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.72 | bwd_microstep: 1488.83 | bwd_inner_microstep: 1488.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-10 05:17:15,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.38 | bwd_microstep: 1646.55 | bwd_inner_microstep: 1646.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-10 05:17:16,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.96 | bwd_microstep: 793.24 | bwd_inner_microstep: 793.21 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 05:17:18,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.83 | bwd_microstep: 1485.75 | bwd_inner_microstep: 1485.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3493 [2024-06-10 05:17:20,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.40 | bwd_microstep: 1270.57 | bwd_inner_microstep: 1270.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418 [2024-06-10 05:17:21,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.26 | bwd_microstep: 1282.22 | bwd_inner_microstep: 1282.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 05:17:23,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1396.51 | bwd_inner_microstep: 1396.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1966 [2024-06-10 05:17:24,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.82 | bwd_microstep: 826.23 | bwd_inner_microstep: 826.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3498 [2024-06-10 05:17:27,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.83 | bwd_microstep: 1631.77 | bwd_inner_microstep: 1631.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821 [2024-06-10 05:17:29,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.41 | bwd_microstep: 1750.41 | bwd_inner_microstep: 
1750.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939 [2024-06-10 05:17:30,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.66 | bwd_microstep: 825.04 | bwd_inner_microstep: 825.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 05:17:32,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1382.58 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2089 [2024-06-10 05:17:33,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.62 | bwd_microstep: 803.09 | bwd_inner_microstep: 803.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 05:17:35,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.29 | bwd_microstep: 1459.74 | bwd_inner_microstep: 1459.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 05:17:37,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.86 | bwd_microstep: 876.13 | bwd_inner_microstep: 876.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472 [2024-06-10 05:17:38,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.18 | bwd_microstep: 1188.63 | bwd_inner_microstep: 1188.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3456 [2024-06-10 05:17:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | 
bwd_microstep: 1414.94 | bwd_inner_microstep: 1414.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2273 [2024-06-10 05:17:42,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.68 | bwd_microstep: 1073.37 | bwd_inner_microstep: 1073.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 05:17:43,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1286.81 | bwd_inner_microstep: 1286.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564 [2024-06-10 05:17:45,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1503.89 | bwd_inner_microstep: 1503.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 05:17:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.57 | bwd_microstep: 1262.10 | bwd_inner_microstep: 1262.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 05:17:49,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1351.83 | bwd_inner_microstep: 1351.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809 [2024-06-10 05:17:52,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.95 | bwd_microstep: 1753.65 | bwd_inner_microstep: 1753.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 05:17:56,003] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 05:17:56,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.62 | bwd_microstep: 3499.72 | bwd_inner_microstep: 1303.46 | bwd_allreduce_microstep: 2196.21 | step_microstep: 38.76 [2024-06-10 05:17:56,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15525.18 | bwd: 43779.27 | bwd_inner: 41582.16 | bwd_allreduce: 2196.44 | step: 40.37 {'loss': 1.3004, 'learning_rate': 3.840863017610714e-05, 'epoch': 0.15} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-10 05:17:58,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1486.25 | bwd_inner_microstep: 1486.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1980 [2024-06-10 05:17:59,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.15 | bwd_microstep: 768.72 | bwd_inner_microstep: 768.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3891 [2024-06-10 05:18:01,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.11 | bwd_microstep: 1683.89 | bwd_inner_microstep: 1683.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3847 [2024-06-10 05:18:03,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1361.88 | bwd_inner_microstep: 1361.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851 [2024-06-10 05:18:05,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.71 | bwd_microstep: 1559.49 | bwd_inner_microstep: 1559.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 05:18:07,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 05:18:09,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 05:18:11,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1350.35 | bwd_inner_microstep: 1350.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4087 [2024-06-10 05:18:13,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.22 | bwd_microstep: 1630.11 | bwd_inner_microstep: 1630.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629 [2024-06-10 05:18:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1315.53 | bwd_inner_microstep: 1315.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510 [2024-06-10 05:18:17,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1348.05 | bwd_inner_microstep: 1348.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 05:18:18,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 791.81 | bwd_inner_microstep: 791.78 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 05:18:20,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1484.12 | bwd_inner_microstep: 1484.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525 [2024-06-10 05:18:22,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.48 | bwd_microstep: 1585.27 | bwd_inner_microstep: 1585.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684 [2024-06-10 05:18:24,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.47 | bwd_microstep: 1619.69 | bwd_inner_microstep: 1619.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2075 [2024-06-10 05:18:25,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.10 | bwd_microstep: 787.30 | bwd_inner_microstep: 787.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3640 [2024-06-10 05:18:27,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.70 | bwd_microstep: 1541.47 | bwd_inner_microstep: 1541.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657 [2024-06-10 05:18:29,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1428.53 | bwd_inner_microstep: 1428.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 05:18:31,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 
1496.29 | bwd_inner_microstep: 1496.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 05:18:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1398.26 | bwd_inner_microstep: 1398.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3548 [2024-06-10 05:18:35,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.61 | bwd_microstep: 1560.10 | bwd_inner_microstep: 1560.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624 [2024-06-10 05:18:38,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.55 | bwd_microstep: 1611.42 | bwd_inner_microstep: 1611.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 05:18:40,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1557.91 | bwd_inner_microstep: 1557.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 05:18:42,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.50 | bwd_microstep: 1460.89 | bwd_inner_microstep: 1460.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475 [2024-06-10 05:18:44,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.98 | bwd_microstep: 1316.37 | bwd_inner_microstep: 1316.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473 [2024-06-10 05:18:45,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 468.69 | bwd_microstep: 1215.55 | bwd_inner_microstep: 1215.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 05:18:47,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1398.45 | bwd_inner_microstep: 1398.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2213 [2024-06-10 05:18:49,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.05 | bwd_microstep: 863.76 | bwd_inner_microstep: 863.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 05:18:51,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.73 | bwd_microstep: 1497.09 | bwd_inner_microstep: 1497.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451 [2024-06-10 05:18:53,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1418.58 | bwd_inner_microstep: 1418.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3591 [2024-06-10 05:18:54,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 1426.22 | bwd_inner_microstep: 1426.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447 [2024-06-10 05:18:57,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 05:18:57,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1927.14 | bwd_inner_microstep: 1540.08 | 
bwd_allreduce_microstep: 387.01 | step_microstep: 38.41 [2024-06-10 05:18:57,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16522.11 | bwd: 44619.93 | bwd_inner: 44231.98 | bwd_allreduce: 387.25 | step: 40.14 {'loss': 1.3095, 'learning_rate': 3.839392559591104e-05, 'epoch': 0.15} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 05:18:59,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 1377.21 | bwd_inner_microstep: 1377.15 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3396 [2024-06-10 05:19:01,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.42 | bwd_microstep: 1148.36 | bwd_inner_microstep: 1148.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904 [2024-06-10 05:19:03,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1494.05 | bwd_inner_microstep: 1494.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 05:19:04,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1281.83 | bwd_inner_microstep: 1281.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-10 05:19:06,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1451.88 | bwd_inner_microstep: 1451.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867 [2024-06-10 05:19:07,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.29 | bwd_microstep: 710.00 | bwd_inner_microstep: 709.97 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-10 05:19:09,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.67 | bwd_microstep: 1186.13 | bwd_inner_microstep: 1186.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 740 [2024-06-10 05:19:09,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.61 | bwd_microstep: 299.85 | bwd_inner_microstep: 299.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3792 [2024-06-10 05:19:11,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.20 | bwd_microstep: 1455.83 | bwd_inner_microstep: 1455.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 05:19:13,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.71 | bwd_microstep: 1282.95 | bwd_inner_microstep: 1282.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881 [2024-06-10 05:19:14,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.36 | bwd_microstep: 711.16 | bwd_inner_microstep: 711.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 05:19:16,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1389.53 | bwd_inner_microstep: 1389.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 05:19:18,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.01 | bwd_microstep: 1386.95 | 
bwd_inner_microstep: 1386.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 05:19:20,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1388.62 | bwd_inner_microstep: 1388.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 05:19:22,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1346.70 | bwd_inner_microstep: 1346.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524 [2024-06-10 05:19:24,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1425.36 | bwd_inner_microstep: 1425.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 05:19:26,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1376.31 | bwd_inner_microstep: 1376.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 05:19:27,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.94 | bwd_microstep: 800.47 | bwd_inner_microstep: 800.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 05:19:29,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.46 | bwd_microstep: 1283.89 | bwd_inner_microstep: 1283.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2149 [2024-06-10 05:19:31,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 322.68 | bwd_microstep: 1806.87 | bwd_inner_microstep: 1806.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275 [2024-06-10 05:19:32,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.46 | bwd_microstep: 910.17 | bwd_inner_microstep: 910.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 05:19:34,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1397.06 | bwd_inner_microstep: 1397.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 05:19:36,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.50 | bwd_microstep: 1390.10 | bwd_inner_microstep: 1390.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 05:19:38,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1414.23 | bwd_inner_microstep: 1414.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-10 05:19:40,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.31 | bwd_microstep: 1632.25 | bwd_inner_microstep: 1632.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3542 [2024-06-10 05:19:42,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.82 | bwd_microstep: 1591.51 | bwd_inner_microstep: 1591.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601 [2024-06-10 05:19:45,077] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.09 | bwd_microstep: 1704.58 | bwd_inner_microstep: 1704.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3817 [2024-06-10 05:19:47,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.40 | bwd_microstep: 1820.46 | bwd_inner_microstep: 1820.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3589 [2024-06-10 05:19:49,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.22 | bwd_microstep: 1354.81 | bwd_inner_microstep: 1354.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-10 05:19:51,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 1542.97 | bwd_inner_microstep: 1542.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033 [2024-06-10 05:19:52,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.80 | bwd_microstep: 806.31 | bwd_inner_microstep: 806.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2240 [2024-06-10 05:20:00,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.33 | optimizer_step: 6.63 [2024-06-10 05:20:00,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.98 | bwd_microstep: 7748.80 | bwd_inner_microstep: 983.81 | bwd_allreduce_microstep: 6764.93 | step_microstep: 38.88 [2024-06-10 05:20:00,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15058.37 | bwd: 47917.23 | bwd_inner: 41151.33 | bwd_allreduce: 6765.18 | step: 40.56 {'loss': 1.3688, 'learning_rate': 
3.837915623240462e-05, 'epoch': 0.16} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 05:20:02,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.40 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2361 [2024-06-10 05:20:04,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.57 | bwd_microstep: 984.69 | bwd_inner_microstep: 984.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352 [2024-06-10 05:20:05,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.97 | bwd_microstep: 985.55 | bwd_inner_microstep: 985.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3886 [2024-06-10 05:20:07,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1581.38 | bwd_inner_microstep: 1581.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 05:20:09,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1248.50 | bwd_inner_microstep: 1248.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800 [2024-06-10 05:20:11,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1547.78 | bwd_inner_microstep: 1547.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 05:20:13,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1386.03 | 
bwd_inner_microstep: 1386.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877 [2024-06-10 05:20:14,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.06 | bwd_microstep: 681.95 | bwd_inner_microstep: 681.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230 [2024-06-10 05:20:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.10 | bwd_microstep: 958.18 | bwd_inner_microstep: 958.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720 [2024-06-10 05:20:17,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.36 | bwd_microstep: 1533.57 | bwd_inner_microstep: 1533.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 05:20:19,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709 [2024-06-10 05:20:21,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1634.93 | bwd_inner_microstep: 1634.67 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718 [2024-06-10 05:20:23,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.78 | bwd_microstep: 1539.40 | bwd_inner_microstep: 1539.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 05:20:25,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
476.34 | bwd_microstep: 1249.01 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 05:20:27,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.29 | bwd_microstep: 1503.24 | bwd_inner_microstep: 1503.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.32 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642 [2024-06-10 05:20:30,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.40 | bwd_microstep: 1710.48 | bwd_inner_microstep: 1710.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-10 05:20:32,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.00 | bwd_microstep: 1492.06 | bwd_inner_microstep: 1492.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 05:20:34,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1476.90 | bwd_inner_microstep: 1476.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 05:20:36,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.53 | bwd_microstep: 1514.43 | bwd_inner_microstep: 1514.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2664 [2024-06-10 05:20:37,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.44 | bwd_microstep: 1119.41 | bwd_inner_microstep: 1119.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3608 [2024-06-10 05:20:40,286] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.24 | bwd_microstep: 1809.14 | bwd_inner_microstep: 1809.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440 [2024-06-10 05:20:42,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1416.39 | bwd_inner_microstep: 1416.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 05:20:44,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1280.57 | bwd_inner_microstep: 1280.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-10 05:20:45,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.10 | bwd_microstep: 810.07 | bwd_inner_microstep: 810.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 05:20:46,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.31 | bwd_microstep: 1255.67 | bwd_inner_microstep: 1255.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 05:20:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.28 | bwd_microstep: 1637.64 | bwd_inner_microstep: 1637.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3524 [2024-06-10 05:20:50,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.77 | bwd_microstep: 1203.43 | bwd_inner_microstep: 1203.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3797 [2024-06-10 05:20:52,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1554.33 | bwd_inner_microstep: 1554.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 05:20:55,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.41 | bwd_microstep: 1754.01 | bwd_inner_microstep: 1753.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2625 [2024-06-10 05:20:56,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.46 | bwd_microstep: 1109.70 | bwd_inner_microstep: 1109.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803 [2024-06-10 05:20:59,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.38 | bwd_microstep: 1647.62 | bwd_inner_microstep: 1647.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 05:21:02,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.62 [2024-06-10 05:21:02,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 2536.51 | bwd_inner_microstep: 1686.95 | bwd_allreduce_microstep: 849.49 | step_microstep: 38.94 [2024-06-10 05:21:02,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16332.06 | bwd: 44793.07 | bwd_inner: 43942.25 | bwd_allreduce: 849.92 | step: 41.10 {'loss': 1.3367, 'learning_rate': 3.8364322137605484e-05, 'epoch': 0.16} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866 [2024-06-10 05:21:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.66 | bwd_microstep: 671.47 | 
bwd_inner_microstep: 671.33 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898 [2024-06-10 05:21:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1590.92 | bwd_inner_microstep: 1590.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2310 [2024-06-10 05:21:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.52 | bwd_microstep: 979.17 | bwd_inner_microstep: 979.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 05:21:08,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1378.07 | bwd_inner_microstep: 1378.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4255 [2024-06-10 05:21:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.28 | bwd_microstep: 1565.62 | bwd_inner_microstep: 1565.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488 [2024-06-10 05:21:12,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.58 | bwd_microstep: 1186.72 | bwd_inner_microstep: 1186.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3667 [2024-06-10 05:21:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.94 | bwd_microstep: 1453.18 | bwd_inner_microstep: 1453.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700 [2024-06-10 05:21:16,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 599.02 | bwd_microstep: 1626.70 | bwd_inner_microstep: 1626.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3409 [2024-06-10 05:21:18,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.05 | bwd_microstep: 1177.61 | bwd_inner_microstep: 1177.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427 [2024-06-10 05:21:20,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.85 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 05:21:22,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 05:21:24,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1344.12 | bwd_inner_microstep: 1344.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 1954 [2024-06-10 05:21:25,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.95 | bwd_microstep: 947.96 | bwd_inner_microstep: 947.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496 [2024-06-10 05:21:27,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.88 | bwd_microstep: 1574.44 | bwd_inner_microstep: 1574.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 05:21:29,412] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.13 | bwd_microstep: 1339.11 | bwd_inner_microstep: 1339.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663 [2024-06-10 05:21:31,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1419.68 | bwd_inner_microstep: 1419.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2125 [2024-06-10 05:21:32,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.50 | bwd_microstep: 836.69 | bwd_inner_microstep: 836.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599 [2024-06-10 05:21:34,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1509.36 | bwd_inner_microstep: 1509.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 05:21:36,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3553 [2024-06-10 05:21:38,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.79 | bwd_microstep: 1201.84 | bwd_inner_microstep: 1201.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-10 05:21:40,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.89 | bwd_microstep: 1449.75 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3439 [2024-06-10 05:21:41,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.36 | bwd_microstep: 1255.89 | bwd_inner_microstep: 1255.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578 [2024-06-10 05:21:43,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.55 | bwd_microstep: 1304.00 | bwd_inner_microstep: 1303.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546 [2024-06-10 05:21:45,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.49 | bwd_microstep: 1326.11 | bwd_inner_microstep: 1326.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3175 [2024-06-10 05:21:47,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.78 | bwd_microstep: 1237.03 | bwd_inner_microstep: 1237.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451 [2024-06-10 05:21:49,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1447.93 | bwd_inner_microstep: 1447.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588 [2024-06-10 05:21:51,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1564.98 | bwd_inner_microstep: 1564.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639 [2024-06-10 05:21:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.05 | bwd_microstep: 1577.55 | bwd_inner_microstep: 1577.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3458 [2024-06-10 05:21:55,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1474.77 | bwd_inner_microstep: 1474.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 05:21:57,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1393.77 | bwd_inner_microstep: 1393.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 05:21:59,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1343.01 | bwd_inner_microstep: 1342.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 05:22:06,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.39 | optimizer_step: 6.60 [2024-06-10 05:22:06,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.58 | bwd_microstep: 6185.56 | bwd_inner_microstep: 1440.98 | bwd_allreduce_microstep: 4744.51 | step_microstep: 39.82 [2024-06-10 05:22:06,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16015.27 | bwd: 47519.05 | bwd_inner: 42773.51 | bwd_allreduce: 4744.81 | step: 41.42 15%|█▌ | 264/1726 [4:38:31<24:59:34, 61.54s/it] 15%|█▌ | 265/1726 [4:39:33<24:59:42, 61.59s/it] 15%|█▌ | 266/1726 [4:40:32<24:44:33, 61.01s/it] 15%|█▌ | 267/1726 [4:41:34<24:47:05, 61.16s/it] 16%|█▌ | 268/1726 [4:42:37<25:01:48, 61.80s/it] 16%|█▌ | 269/1726 [4:43:39<24:58:33, 61.71s/it] 16%|█▌ | 
270/1726 [4:44:42<25:13:22, 62.36s/it] {'loss': 1.308, 'learning_rate': 3.834942336375925e-05, 'epoch': 0.16} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466 [2024-06-10 05:22:08,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.32 | bwd_microstep: 1564.66 | bwd_inner_microstep: 1564.55 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 05:22:10,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1283.69 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3912 [2024-06-10 05:22:12,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.67 | bwd_microstep: 1686.31 | bwd_inner_microstep: 1686.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-10 05:22:14,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.77 | bwd_microstep: 1277.26 | bwd_inner_microstep: 1277.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 05:22:16,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1458.73 | bwd_inner_microstep: 1458.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 05:22:18,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.79 | bwd_microstep: 1553.88 | bwd_inner_microstep: 1553.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3742 [2024-06-10 05:22:20,285] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1368.20 | bwd_inner_microstep: 1368.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 05:22:22,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1399.90 | bwd_inner_microstep: 1399.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 05:22:24,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1390.37 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728 [2024-06-10 05:22:26,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.48 | bwd_microstep: 1465.09 | bwd_inner_microstep: 1465.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533 [2024-06-10 05:22:27,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.04 | bwd_microstep: 1200.10 | bwd_inner_microstep: 1200.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-10 05:22:30,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.78 | bwd_microstep: 1645.15 | bwd_inner_microstep: 1645.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3416 [2024-06-10 05:22:32,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1409.08 | bwd_inner_microstep: 1409.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 
05:22:33,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1386.74 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3386 [2024-06-10 05:22:35,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1367.72 | bwd_inner_microstep: 1367.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1917 [2024-06-10 05:22:36,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.45 | bwd_microstep: 722.69 | bwd_inner_microstep: 722.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927 [2024-06-10 05:22:37,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.50 | bwd_microstep: 792.93 | bwd_inner_microstep: 792.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 05:22:39,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.53 | bwd_microstep: 1248.49 | bwd_inner_microstep: 1248.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661 [2024-06-10 05:22:41,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1521.94 | bwd_inner_microstep: 1521.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 05:22:43,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.99 | bwd_microstep: 1297.52 | bwd_inner_microstep: 1297.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3536 [2024-06-10 05:22:45,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.18 | bwd_microstep: 1296.72 | bwd_inner_microstep: 1296.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835 [2024-06-10 05:22:47,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1463.14 | bwd_inner_microstep: 1463.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2119 [2024-06-10 05:22:48,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.64 | bwd_microstep: 766.11 | bwd_inner_microstep: 766.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 05:22:50,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.59 | bwd_microstep: 1356.16 | bwd_inner_microstep: 1356.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 05:22:52,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1397.08 | bwd_inner_microstep: 1397.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 05:22:54,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1511.74 | bwd_inner_microstep: 1511.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 05:22:56,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.19 | bwd_microstep: 1601.17 | bwd_inner_microstep: 1601.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-10 05:22:58,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1444.70 | bwd_inner_microstep: 1444.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2253 [2024-06-10 05:22:59,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.29 | bwd_microstep: 869.95 | bwd_inner_microstep: 869.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291 [2024-06-10 05:23:00,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.33 | bwd_microstep: 879.13 | bwd_inner_microstep: 879.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596 [2024-06-10 05:23:02,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1438.29 | bwd_inner_microstep: 1438.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3575 [2024-06-10 05:23:07,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 05:23:07,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.68 | bwd_microstep: 4051.13 | bwd_inner_microstep: 1921.42 | bwd_allreduce_microstep: 2129.66 | step_microstep: 39.10 [2024-06-10 05:23:07,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16030.46 | bwd: 45115.81 | bwd_inner: 42985.14 | bwd_allreduce: 2129.96 | step: 40.78 {'loss': 1.2707, 'learning_rate': 3.833445996333932e-05, 'epoch': 0.16} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2875 [2024-06-10 05:23:09,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 440.80 | bwd_microstep: 1172.17 | bwd_inner_microstep: 1172.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4016 [2024-06-10 05:23:11,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1507.69 | bwd_inner_microstep: 1507.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882 [2024-06-10 05:23:13,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.85 | bwd_microstep: 1488.73 | bwd_inner_microstep: 1488.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3840 [2024-06-10 05:23:15,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.69 | bwd_microstep: 1487.20 | bwd_inner_microstep: 1487.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2237 [2024-06-10 05:23:16,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.67 | bwd_microstep: 864.39 | bwd_inner_microstep: 864.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702 [2024-06-10 05:23:18,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1434.46 | bwd_inner_microstep: 1434.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 05:23:20,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.75 | bwd_microstep: 1540.11 | bwd_inner_microstep: 1540.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 05:23:22,755] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.40 | bwd_microstep: 1396.93 | bwd_inner_microstep: 1396.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 05:23:23,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.50 | bwd_microstep: 798.98 | bwd_inner_microstep: 798.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-10 05:23:25,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1283.42 | bwd_inner_microstep: 1283.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420 [2024-06-10 05:23:27,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.01 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 05:23:29,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 1384.75 | bwd_inner_microstep: 1384.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 05:23:31,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1386.60 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2147 [2024-06-10 05:23:32,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.62 | bwd_microstep: 1043.21 | bwd_inner_microstep: 1043.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3493 [2024-06-10 05:23:34,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.13 | bwd_microstep: 1483.19 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3447 [2024-06-10 05:23:36,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.39 | bwd_microstep: 1486.06 | bwd_inner_microstep: 1486.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3695 [2024-06-10 05:23:39,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.64 | bwd_microstep: 1665.78 | bwd_inner_microstep: 1665.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 05:23:41,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.24 | bwd_microstep: 1663.11 | bwd_inner_microstep: 1663.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 05:23:43,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1410.28 | bwd_inner_microstep: 1410.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677 [2024-06-10 05:23:45,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.17 | bwd_microstep: 1527.13 | bwd_inner_microstep: 1527.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 05:23:47,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.89 | bwd_microstep: 1464.18 | bwd_inner_microstep: 1464.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3528 [2024-06-10 05:23:49,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.20 | bwd_microstep: 1397.71 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 05:23:51,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.50 | bwd_microstep: 1290.93 | bwd_inner_microstep: 1290.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3543 [2024-06-10 05:23:53,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1426.64 | bwd_inner_microstep: 1426.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2182 [2024-06-10 05:23:54,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.22 | bwd_microstep: 890.53 | bwd_inner_microstep: 890.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-10 05:23:56,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.86 | bwd_microstep: 1533.22 | bwd_inner_microstep: 1533.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3778 [2024-06-10 05:23:58,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.66 | bwd_microstep: 1413.01 | bwd_inner_microstep: 1412.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 05:24:00,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.68 | bwd_microstep: 1353.84 | bwd_inner_microstep: 1353.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3581 [2024-06-10 05:24:02,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.21 | bwd_microstep: 1668.60 | bwd_inner_microstep: 1668.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 05:24:04,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.58 | bwd_microstep: 1380.01 | bwd_inner_microstep: 1379.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 05:24:06,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.99 | bwd_microstep: 1402.22 | bwd_inner_microstep: 1402.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 05:24:11,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.62 [2024-06-10 05:24:11,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 4323.32 | bwd_inner_microstep: 1568.00 | bwd_allreduce_microstep: 2755.24 | step_microstep: 41.54 [2024-06-10 05:24:11,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16418.20 | bwd: 46852.62 | bwd_inner: 44096.41 | bwd_allreduce: 2755.49 | step: 43.15 {'loss': 1.3312, 'learning_rate': 3.8319431989046704e-05, 'epoch': 0.16} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 05:24:13,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.88 | bwd_microstep: 1364.53 | bwd_inner_microstep: 1364.44 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3971 [2024-06-10 05:24:15,524] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 618.65 | bwd_microstep: 1668.83 | bwd_inner_microstep: 1668.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3887 [2024-06-10 05:24:17,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1686.82 | bwd_inner_microstep: 1686.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 05:24:19,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 1550.22 | bwd_inner_microstep: 1550.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769 [2024-06-10 05:24:21,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.60 | bwd_microstep: 1346.41 | bwd_inner_microstep: 1346.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 05:24:23,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.23 | bwd_microstep: 1557.33 | bwd_inner_microstep: 1557.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722 [2024-06-10 05:24:26,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1531.50 | bwd_inner_microstep: 1531.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 05:24:27,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1251.61 | bwd_inner_microstep: 1251.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-10 
05:24:29,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.31 | bwd_microstep: 1426.44 | bwd_inner_microstep: 1426.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 05:24:31,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.56 | bwd_microstep: 1262.60 | bwd_inner_microstep: 1262.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2085 [2024-06-10 05:24:32,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 922.82 | bwd_inner_microstep: 922.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673 [2024-06-10 05:24:35,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.73 | bwd_microstep: 1617.44 | bwd_inner_microstep: 1617.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3956 [2024-06-10 05:24:37,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.20 | bwd_microstep: 1696.35 | bwd_inner_microstep: 1696.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1983 [2024-06-10 05:24:38,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.48 | bwd_microstep: 831.23 | bwd_inner_microstep: 831.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975 [2024-06-10 05:24:39,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.79 | bwd_microstep: 890.46 | bwd_inner_microstep: 890.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3438 [2024-06-10 05:24:41,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1351.13 | bwd_inner_microstep: 1351.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2021 [2024-06-10 05:24:42,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.45 | bwd_microstep: 746.21 | bwd_inner_microstep: 746.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288 [2024-06-10 05:24:43,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.82 | bwd_microstep: 881.36 | bwd_inner_microstep: 881.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-10 05:24:45,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.67 | bwd_microstep: 1201.06 | bwd_inner_microstep: 1201.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 05:24:47,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.61 | bwd_microstep: 1282.85 | bwd_inner_microstep: 1282.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-10 05:24:49,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1513.81 | bwd_inner_microstep: 1513.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3752 [2024-06-10 05:24:51,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1344.83 | bwd_inner_microstep: 1344.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-10 05:24:53,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.40 | bwd_microstep: 1408.83 | bwd_inner_microstep: 1408.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2121 [2024-06-10 05:24:54,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.87 | bwd_microstep: 831.20 | bwd_inner_microstep: 831.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2063 [2024-06-10 05:24:55,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.91 | bwd_microstep: 945.40 | bwd_inner_microstep: 945.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 05:24:57,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.22 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3616 [2024-06-10 05:24:59,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.85 | bwd_microstep: 1559.38 | bwd_inner_microstep: 1559.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436 [2024-06-10 05:25:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.13 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 05:25:03,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.45 | bwd_microstep: 1645.80 | bwd_inner_microstep: 1645.77 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 05:25:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1477.44 | bwd_inner_microstep: 1477.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491 [2024-06-10 05:25:07,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.68 | bwd_microstep: 1321.48 | bwd_inner_microstep: 1321.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 05:25:12,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 05:25:12,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.78 | bwd_microstep: 3766.97 | bwd_inner_microstep: 1836.60 | bwd_allreduce_microstep: 1930.32 | step_microstep: 38.76 [2024-06-10 05:25:12,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15801.98 | bwd: 44544.68 | bwd_inner: 42613.38 | bwd_allreduce: 1930.60 | step: 40.42 {'loss': 1.3399, 'learning_rate': 3.8304339493809866e-05, 'epoch': 0.16} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 05:25:13,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.23 | bwd_microstep: 1271.74 | bwd_inner_microstep: 1271.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 05:25:15,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.83 | bwd_microstep: 1346.41 | bwd_inner_microstep: 1346.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3887 [2024-06-10 05:25:17,835] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.82 | bwd_microstep: 1586.98 | bwd_inner_microstep: 1586.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3837 [2024-06-10 05:25:19,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 05:25:21,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.21 | bwd_microstep: 1274.69 | bwd_inner_microstep: 1274.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3594 [2024-06-10 05:25:23,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.80 | bwd_microstep: 1212.39 | bwd_inner_microstep: 1212.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-10 05:25:25,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.31 | bwd_microstep: 1440.51 | bwd_inner_microstep: 1440.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 05:25:26,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.96 | bwd_microstep: 1283.35 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 05:25:28,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.61 | bwd_microstep: 1388.53 | bwd_inner_microstep: 1388.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 
3773 [2024-06-10 05:25:30,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1506.78 | bwd_inner_microstep: 1506.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-10 05:25:32,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.93 | bwd_microstep: 1289.04 | bwd_inner_microstep: 1289.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 05:25:34,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.66 | bwd_microstep: 1488.59 | bwd_inner_microstep: 1488.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3995 [2024-06-10 05:25:37,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.93 | bwd_microstep: 1909.66 | bwd_inner_microstep: 1909.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 05:25:39,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2069 [2024-06-10 05:25:40,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.08 | bwd_microstep: 882.55 | bwd_inner_microstep: 882.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3636 [2024-06-10 05:25:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.76 | bwd_microstep: 1374.81 | bwd_inner_microstep: 1374.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3647 [2024-06-10 05:25:44,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.64 | bwd_microstep: 1505.39 | bwd_inner_microstep: 1505.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3519 [2024-06-10 05:25:46,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.06 | bwd_microstep: 1448.30 | bwd_inner_microstep: 1448.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 05:25:48,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.79 | bwd_microstep: 1297.04 | bwd_inner_microstep: 1297.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 05:25:50,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.57 | bwd_microstep: 1288.45 | bwd_inner_microstep: 1288.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040 [2024-06-10 05:25:51,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.54 | bwd_microstep: 812.31 | bwd_inner_microstep: 812.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3616 [2024-06-10 05:25:53,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1443.39 | bwd_inner_microstep: 1443.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3891 [2024-06-10 05:25:55,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1618.16 | bwd_inner_microstep: 1618.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 05:25:57,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.49 | bwd_microstep: 1662.05 | bwd_inner_microstep: 1662.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3820 [2024-06-10 05:25:59,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1507.71 | bwd_inner_microstep: 1507.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-10 05:26:01,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.14 | bwd_microstep: 1524.34 | bwd_inner_microstep: 1524.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2267 [2024-06-10 05:26:03,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.79 | bwd_microstep: 974.65 | bwd_inner_microstep: 974.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 05:26:04,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1255.55 | bwd_inner_microstep: 1255.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-10 05:26:07,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.09 | bwd_microstep: 1502.94 | bwd_inner_microstep: 1502.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2270 [2024-06-10 05:26:08,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.10 | bwd_microstep: 1066.66 | bwd_inner_microstep: 1066.63 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 05:26:10,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.20 | bwd_microstep: 1454.11 | bwd_inner_microstep: 1454.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 05:26:14,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 05:26:14,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.00 | bwd_microstep: 3339.47 | bwd_inner_microstep: 1533.02 | bwd_allreduce_microstep: 1806.40 | step_microstep: 38.75 [2024-06-10 05:26:14,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16339.14 | bwd: 45688.31 | bwd_inner: 43880.96 | bwd_allreduce: 1806.64 | step: 40.40 {'loss': 1.3183, 'learning_rate': 3.828918253078448e-05, 'epoch': 0.16} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-10 05:26:15,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.08 | bwd_microstep: 676.28 | bwd_inner_microstep: 676.15 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 05:26:17,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1276.42 | bwd_inner_microstep: 1276.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3772 [2024-06-10 05:26:19,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1503.15 | bwd_inner_microstep: 1503.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-10 
05:26:21,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1486.86 | bwd_inner_microstep: 1486.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-10 05:26:23,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.37 | bwd_microstep: 1527.25 | bwd_inner_microstep: 1527.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775 [2024-06-10 05:26:25,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 1448.46 | bwd_inner_microstep: 1448.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734 [2024-06-10 05:26:27,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.56 | bwd_microstep: 1431.16 | bwd_inner_microstep: 1431.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 05:26:29,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1287.68 | bwd_inner_microstep: 1287.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438 [2024-06-10 05:26:31,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.38 | bwd_microstep: 1448.23 | bwd_inner_microstep: 1448.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 05:26:33,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1481.94 | bwd_inner_microstep: 1481.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3499 [2024-06-10 05:26:35,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1383.75 | bwd_inner_microstep: 1383.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1972 [2024-06-10 05:26:36,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.86 | bwd_microstep: 891.24 | bwd_inner_microstep: 891.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 05:26:38,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.17 | bwd_microstep: 1390.79 | bwd_inner_microstep: 1390.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 05:26:40,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1382.40 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3635 [2024-06-10 05:26:42,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.01 | bwd_microstep: 1531.93 | bwd_inner_microstep: 1531.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 05:26:44,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1500.22 | bwd_inner_microstep: 1500.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 670 [2024-06-10 05:26:44,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 111.60 | bwd_microstep: 279.89 | bwd_inner_microstep: 279.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3683 [2024-06-10 05:26:46,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1425.20 | bwd_inner_microstep: 1425.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 05:26:48,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.74 | bwd_microstep: 1659.57 | bwd_inner_microstep: 1659.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3710 [2024-06-10 05:26:51,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.23 | bwd_microstep: 1600.83 | bwd_inner_microstep: 1600.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 05:26:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.96 | bwd_microstep: 802.74 | bwd_inner_microstep: 802.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 05:26:54,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.22 | bwd_microstep: 1513.53 | bwd_inner_microstep: 1513.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 05:26:56,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.26 | bwd_microstep: 1531.65 | bwd_inner_microstep: 1531.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3523 [2024-06-10 05:26:58,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.49 | bwd_microstep: 1344.09 | bwd_inner_microstep: 1344.06 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828 [2024-06-10 05:27:00,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1490.68 | bwd_inner_microstep: 1490.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508 [2024-06-10 05:27:02,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1324.56 | bwd_inner_microstep: 1324.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-10 05:27:04,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.59 | bwd_microstep: 1405.10 | bwd_inner_microstep: 1405.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2027 [2024-06-10 05:27:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.05 | bwd_microstep: 746.37 | bwd_inner_microstep: 746.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3417 [2024-06-10 05:27:06,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.63 | bwd_microstep: 1213.46 | bwd_inner_microstep: 1213.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-10 05:27:09,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.87 | bwd_microstep: 1544.55 | bwd_inner_microstep: 1544.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3573 [2024-06-10 05:27:11,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.06 | bwd_microstep: 1662.86 | bwd_inner_microstep: 
1662.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751 [2024-06-10 05:27:15,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-10 05:27:15,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.65 | bwd_microstep: 3297.59 | bwd_inner_microstep: 1739.58 | bwd_allreduce_microstep: 1557.95 | step_microstep: 38.92 [2024-06-10 05:27:15,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15983.97 | bwd: 44490.47 | bwd_inner: 42931.51 | bwd_allreduce: 1558.23 | step: 40.50 {'loss': 1.2838, 'learning_rate': 3.8273961153353296e-05, 'epoch': 0.16} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400 [2024-06-10 05:27:16,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.84 | bwd_microstep: 1141.60 | bwd_inner_microstep: 1141.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2450 [2024-06-10 05:27:18,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.63 | bwd_microstep: 1016.78 | bwd_inner_microstep: 1016.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 05:27:20,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1655.22 | bwd_inner_microstep: 1655.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793 [2024-06-10 05:27:22,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.11 | bwd_microstep: 1446.85 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3763 [2024-06-10 05:27:24,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.24 | bwd_microstep: 1438.80 | bwd_inner_microstep: 1438.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 05:27:26,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.31 | bwd_microstep: 1248.97 | bwd_inner_microstep: 1248.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730 [2024-06-10 05:27:28,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.34 | bwd_microstep: 1535.21 | bwd_inner_microstep: 1535.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927 [2024-06-10 05:27:29,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.65 | bwd_microstep: 727.43 | bwd_inner_microstep: 727.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 05:27:31,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1256.44 | bwd_inner_microstep: 1256.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 05:27:32,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.15 | bwd_microstep: 1256.86 | bwd_inner_microstep: 1256.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 05:27:35,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.47 | bwd_microstep: 1655.97 | bwd_inner_microstep: 1655.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 14, images 
per sample: 3.5, dynamic token length: 2012 [2024-06-10 05:27:36,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.43 | bwd_microstep: 773.47 | bwd_inner_microstep: 773.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2684 [2024-06-10 05:27:37,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.73 | bwd_microstep: 1220.30 | bwd_inner_microstep: 1220.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465 [2024-06-10 05:27:39,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.10 | bwd_microstep: 1346.43 | bwd_inner_microstep: 1346.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 05:27:41,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1474.04 | bwd_inner_microstep: 1474.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3542 [2024-06-10 05:27:43,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1523.88 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001 [2024-06-10 05:27:45,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.65 | bwd_microstep: 897.95 | bwd_inner_microstep: 897.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 05:27:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.52 | bwd_microstep: 1282.17 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 05:27:49,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.84 | bwd_microstep: 1609.52 | bwd_inner_microstep: 1609.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 05:27:51,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.70 | bwd_microstep: 1400.75 | bwd_inner_microstep: 1400.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 05:27:52,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1379.33 | bwd_inner_microstep: 1379.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 05:27:54,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1379.74 | bwd_inner_microstep: 1379.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-10 05:27:57,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.82 | bwd_microstep: 1593.61 | bwd_inner_microstep: 1593.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3772 [2024-06-10 05:27:58,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.43 | bwd_microstep: 1251.74 | bwd_inner_microstep: 1251.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729 [2024-06-10 05:28:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.17 | bwd_microstep: 1438.63 | bwd_inner_microstep: 1438.60 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-10 05:28:02,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.17 | bwd_microstep: 1500.37 | bwd_inner_microstep: 1500.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772 [2024-06-10 05:28:04,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.11 | bwd_microstep: 1536.81 | bwd_inner_microstep: 1536.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 05:28:06,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1248.91 | bwd_inner_microstep: 1248.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 05:28:08,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1557.55 | bwd_inner_microstep: 1557.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379 [2024-06-10 05:28:10,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.02 | bwd_microstep: 1242.65 | bwd_inner_microstep: 1242.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 05:28:12,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.13 | bwd_microstep: 1407.28 | bwd_inner_microstep: 1407.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3771 [2024-06-10 05:28:40,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.12 | 
optimizer_gradients: 4.40 | optimizer_step: 6.60 [2024-06-10 05:28:40,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.92 | bwd_microstep: 27606.96 | bwd_inner_microstep: 1671.11 | bwd_allreduce_microstep: 25935.78 | step_microstep: 41.22 [2024-06-10 05:28:40,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16112.73 | bwd: 69052.25 | bwd_inner: 43115.52 | bwd_allreduce: 25936.02 | step: 42.92 16%|█▌ | 270/1726 [4:44:42<25:13:22, 62.36s/it] 16%|█▌ | 271/1726 [4:45:44<25:06:02, 62.10s/it] 16%|█▌ | 271/1726 [4:45:44<25:06:02, 62.10s/it] 16%|█▌ | 272/1726 [4:46:48<25:16:03, 62.56s/it] 16%|█▌ | 272/1726 [4:46:48<25:16:03, 62.56s/it] 16%|█▌ | 273/1726 [4:47:48<25:01:26, 62.00s/it] 16%|█▌ | 273/1726 [4:47:48<25:01:26, 62.00s/it] 16%|█▌ | 274/1726 [4:48:51<25:03:11, 62.12s/it] 16%|█▌ | 274/1726 [4:48:51<25:03:11, 62.12s/it] 16%|█▌ | 275/1726 [4:49:51<24:52:45, 61.73s/it] 16%|█▌ | 275/1726 [4:49:51<24:52:45, 61.73s/it] 16%|█▌ | 276/1726 [4:51:{'loss': 1.267, 'learning_rate': 3.825867541512593e-05, 'epoch': 0.16} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3450 [2024-06-10 05:28:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.04 | bwd_microstep: 1362.93 | bwd_inner_microstep: 1362.83 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3896 [2024-06-10 05:28:44,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.89 | bwd_microstep: 1675.89 | bwd_inner_microstep: 1675.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 05:28:46,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.67 | bwd_microstep: 1372.08 | bwd_inner_microstep: 1372.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic 
token length: 3479 [2024-06-10 05:28:48,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.50 | bwd_microstep: 1471.34 | bwd_inner_microstep: 1471.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928 [2024-06-10 05:28:49,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.29 | bwd_microstep: 786.69 | bwd_inner_microstep: 786.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 05:28:51,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1375.04 | bwd_inner_microstep: 1375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 05:28:53,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 05:28:55,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1378.46 | bwd_inner_microstep: 1378.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-10 05:28:57,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.26 | bwd_microstep: 1528.54 | bwd_inner_microstep: 1528.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 05:28:59,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
32, images per sample: 8.0, dynamic token length: 3489 [2024-06-10 05:29:01,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.79 | bwd_microstep: 1442.40 | bwd_inner_microstep: 1442.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 05:29:03,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.46 | bwd_microstep: 1476.08 | bwd_inner_microstep: 1476.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963 [2024-06-10 05:29:04,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.93 | bwd_microstep: 890.88 | bwd_inner_microstep: 890.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417 [2024-06-10 05:29:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.11 | bwd_microstep: 1438.96 | bwd_inner_microstep: 1438.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451 [2024-06-10 05:29:08,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1413.41 | bwd_inner_microstep: 1413.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 05:29:10,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.60 | bwd_microstep: 795.89 | bwd_inner_microstep: 795.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-10 05:29:10,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.70 | bwd_microstep: 699.12 | bwd_inner_microstep: 699.09 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 05:29:12,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1293.46 | bwd_inner_microstep: 1293.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3532 [2024-06-10 05:29:14,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1229.54 | bwd_inner_microstep: 1229.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521 [2024-06-10 05:29:16,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1422.39 | bwd_inner_microstep: 1422.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 05:29:18,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.43 | bwd_microstep: 1346.62 | bwd_inner_microstep: 1346.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 05:29:20,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.63 | bwd_microstep: 1293.99 | bwd_inner_microstep: 1293.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 05:29:22,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1451.84 | bwd_inner_microstep: 1451.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3560 [2024-06-10 05:29:24,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.47 | bwd_microstep: 1546.76 | bwd_inner_microstep: 
1546.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2012 [2024-06-10 05:29:25,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.37 | bwd_microstep: 837.02 | bwd_inner_microstep: 836.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 05:29:27,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.79 | bwd_microstep: 1351.19 | bwd_inner_microstep: 1351.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3760 [2024-06-10 05:29:29,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.48 | bwd_microstep: 1603.56 | bwd_inner_microstep: 1603.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581 [2024-06-10 05:29:31,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.94 | bwd_microstep: 1697.44 | bwd_inner_microstep: 1697.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 05:29:33,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1493.54 | bwd_inner_microstep: 1493.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547 [2024-06-10 05:29:35,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1519.07 | bwd_inner_microstep: 1519.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 05:29:38,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | 
bwd_microstep: 1492.03 | bwd_inner_microstep: 1492.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 05:29:44,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-10 05:29:44,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 5480.40 | bwd_inner_microstep: 1438.82 | bwd_allreduce_microstep: 4041.52 | step_microstep: 38.86 [2024-06-10 05:29:44,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16039.98 | bwd: 46934.22 | bwd_inner: 42891.71 | bwd_allreduce: 4041.81 | step: 40.45 {'loss': 1.2909, 'learning_rate': 3.8243325369938674e-05, 'epoch': 0.16} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-10 05:29:45,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.26 | bwd_microstep: 798.00 | bwd_inner_microstep: 797.85 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3908 [2024-06-10 05:29:47,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.65 | bwd_microstep: 1420.09 | bwd_inner_microstep: 1420.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3459 [2024-06-10 05:29:49,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.05 | bwd_microstep: 1341.30 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-10 05:29:51,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.53 | bwd_microstep: 1648.91 | bwd_inner_microstep: 1648.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3779 [2024-06-10 05:29:53,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 1546.00 | bwd_inner_microstep: 1545.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 05:29:55,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1283.39 | bwd_inner_microstep: 1283.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233 [2024-06-10 05:29:56,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.74 | bwd_microstep: 959.95 | bwd_inner_microstep: 959.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489 [2024-06-10 05:29:58,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.07 | bwd_microstep: 1187.57 | bwd_inner_microstep: 1187.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 05:30:00,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2091 [2024-06-10 05:30:01,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.96 | bwd_microstep: 731.29 | bwd_inner_microstep: 731.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 05:30:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1380.41 | bwd_inner_microstep: 1380.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444 [2024-06-10 05:30:05,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1449.54 | bwd_inner_microstep: 1449.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677 [2024-06-10 05:30:07,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.87 | bwd_microstep: 1719.88 | bwd_inner_microstep: 1719.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3517 [2024-06-10 05:30:09,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1418.99 | bwd_inner_microstep: 1418.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618 [2024-06-10 05:30:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.69 | bwd_microstep: 1312.74 | bwd_inner_microstep: 1312.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 05:30:13,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1390.24 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 05:30:15,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.20 | bwd_microstep: 1399.00 | bwd_inner_microstep: 1398.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 05:30:17,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1515.75 | bwd_inner_microstep: 1515.72 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 05:30:18,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.53 | bwd_microstep: 1257.33 | bwd_inner_microstep: 1257.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3642 [2024-06-10 05:30:20,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1348.23 | bwd_inner_microstep: 1348.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2268 [2024-06-10 05:30:22,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.97 | bwd_microstep: 1005.25 | bwd_inner_microstep: 1005.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615 [2024-06-10 05:30:24,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1408.00 | bwd_inner_microstep: 1407.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 05:30:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.70 | bwd_microstep: 1559.15 | bwd_inner_microstep: 1559.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673 [2024-06-10 05:30:28,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1328.93 | bwd_inner_microstep: 1328.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 05:30:30,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 
1496.27 | bwd_inner_microstep: 1496.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 05:30:32,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1376.77 | bwd_inner_microstep: 1376.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3469 [2024-06-10 05:30:34,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1538.38 | bwd_inner_microstep: 1538.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466 [2024-06-10 05:30:36,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819 [2024-06-10 05:30:38,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.16 | bwd_microstep: 1517.88 | bwd_inner_microstep: 1517.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817 [2024-06-10 05:30:40,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.14 | bwd_microstep: 1752.75 | bwd_inner_microstep: 1752.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2184 [2024-06-10 05:30:42,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.02 | bwd_microstep: 953.05 | bwd_inner_microstep: 953.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 05:30:44,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.75 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 05:30:44,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 2179.18 | bwd_inner_microstep: 1616.14 | bwd_allreduce_microstep: 562.99 | step_microstep: 38.35 [2024-06-10 05:30:44,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16263.85 | bwd: 44108.12 | bwd_inner: 43544.10 | bwd_allreduce: 563.28 | step: 39.92 {'loss': 1.3306, 'learning_rate': 3.82279110718543e-05, 'epoch': 0.16} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3449 [2024-06-10 05:30:46,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.96 | bwd_microstep: 1475.14 | bwd_inner_microstep: 1475.08 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3871 [2024-06-10 05:30:48,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.62 | bwd_microstep: 1463.19 | bwd_inner_microstep: 1463.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 05:30:50,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.06 | bwd_microstep: 1489.16 | bwd_inner_microstep: 1489.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 05:30:52,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1390.07 | bwd_inner_microstep: 1390.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 05:30:54,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.86 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 05:30:56,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 05:30:58,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.15 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-10 05:30:59,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 792.99 | bwd_inner_microstep: 792.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3613 [2024-06-10 05:31:01,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.15 | bwd_microstep: 1216.69 | bwd_inner_microstep: 1216.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 05:31:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.05 | bwd_inner_microstep: 1385.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495 [2024-06-10 05:31:04,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.22 | bwd_microstep: 1410.39 | bwd_inner_microstep: 1410.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2681 [2024-06-10 05:31:06,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.36 | bwd_microstep: 1075.11 | bwd_inner_microstep: 1075.08 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3419 [2024-06-10 05:31:08,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.54 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3626 [2024-06-10 05:31:10,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.76 | bwd_microstep: 1636.23 | bwd_inner_microstep: 1636.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-10 05:31:12,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.64 | bwd_microstep: 1618.43 | bwd_inner_microstep: 1618.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497 [2024-06-10 05:31:14,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1415.99 | bwd_inner_microstep: 1415.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 05:31:16,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.07 | bwd_microstep: 1483.83 | bwd_inner_microstep: 1483.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 05:31:18,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.91 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989 [2024-06-10 05:31:20,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.27 | bwd_microstep: 896.96 | bwd_inner_microstep: 
896.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 05:31:22,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.10 | bwd_microstep: 1552.76 | bwd_inner_microstep: 1552.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 05:31:24,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1500.30 | bwd_inner_microstep: 1500.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817 [2024-06-10 05:31:26,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.92 | bwd_microstep: 1579.81 | bwd_inner_microstep: 1579.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 05:31:28,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.76 | bwd_microstep: 1284.25 | bwd_inner_microstep: 1284.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3552 [2024-06-10 05:31:30,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.08 | bwd_microstep: 1341.66 | bwd_inner_microstep: 1341.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3628 [2024-06-10 05:31:32,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1346.84 | bwd_inner_microstep: 1346.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-10 05:31:34,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.51 | 
bwd_microstep: 1547.57 | bwd_inner_microstep: 1547.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3718 [2024-06-10 05:31:36,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.61 | bwd_microstep: 1336.66 | bwd_inner_microstep: 1336.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-10 05:31:37,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.43 | bwd_microstep: 974.32 | bwd_inner_microstep: 974.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 05:31:39,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.75 | bwd_microstep: 1501.62 | bwd_inner_microstep: 1501.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 05:31:41,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1505.17 | bwd_inner_microstep: 1505.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 05:31:43,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.41 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 05:31:49,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.62 [2024-06-10 05:31:49,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.85 | bwd_microstep: 5829.98 | bwd_inner_microstep: 1450.02 | bwd_allreduce_microstep: 4379.84 | 
step_microstep: 39.54 [2024-06-10 05:31:49,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16435.14 | bwd: 48334.90 | bwd_inner: 43954.03 | bwd_allreduce: 4380.15 | step: 41.13 {'loss': 1.3171, 'learning_rate': 3.821243257516188e-05, 'epoch': 0.16} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1865 [2024-06-10 05:31:50,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.13 | bwd_microstep: 763.35 | bwd_inner_microstep: 763.21 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 05:31:52,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.05 | bwd_microstep: 1272.35 | bwd_inner_microstep: 1272.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878 [2024-06-10 05:31:54,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.20 | bwd_microstep: 1579.28 | bwd_inner_microstep: 1579.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 05:31:57,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.61 | bwd_microstep: 1548.23 | bwd_inner_microstep: 1548.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502 [2024-06-10 05:31:59,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1416.95 | bwd_inner_microstep: 1416.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 05:32:00,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.92 | bwd_microstep: 696.77 | bwd_inner_microstep: 696.74 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 05:32:02,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.23 | bwd_microstep: 1478.84 | bwd_inner_microstep: 1478.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 05:32:03,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1378.13 | bwd_inner_microstep: 1378.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1897 [2024-06-10 05:32:04,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.21 | bwd_microstep: 713.48 | bwd_inner_microstep: 713.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 05:32:06,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1410.91 | bwd_inner_microstep: 1410.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3442 [2024-06-10 05:32:08,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.97 | bwd_microstep: 1190.38 | bwd_inner_microstep: 1190.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2397 [2024-06-10 05:32:09,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.97 | bwd_microstep: 931.93 | bwd_inner_microstep: 931.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1339 [2024-06-10 05:32:10,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 211.11 | bwd_microstep: 546.28 | bwd_inner_microstep: 546.25 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1981 [2024-06-10 05:32:11,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.79 | bwd_microstep: 895.26 | bwd_inner_microstep: 895.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379 [2024-06-10 05:32:13,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.31 | bwd_microstep: 1429.49 | bwd_inner_microstep: 1429.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 05:32:15,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1489.04 | bwd_inner_microstep: 1489.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 05:32:17,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1276.09 | bwd_inner_microstep: 1276.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 05:32:19,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1356.17 | bwd_inner_microstep: 1356.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 05:32:21,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.28 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 05:32:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.13 | bwd_microstep: 
1642.62 | bwd_inner_microstep: 1642.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-10 05:32:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1488.39 | bwd_inner_microstep: 1488.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3859 [2024-06-10 05:32:28,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.94 | bwd_microstep: 1696.10 | bwd_inner_microstep: 1696.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-10 05:32:30,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.22 | bwd_microstep: 1644.91 | bwd_inner_microstep: 1644.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-10 05:32:32,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.94 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-10 05:32:34,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1312.08 | bwd_inner_microstep: 1312.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 05:32:36,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1393.82 | bwd_inner_microstep: 1393.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-10 05:32:38,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 588.08 | bwd_microstep: 1601.90 | bwd_inner_microstep: 1601.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451 [2024-06-10 05:32:40,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.03 | bwd_microstep: 1289.43 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 05:32:42,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.23 | bwd_microstep: 1542.90 | bwd_inner_microstep: 1542.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 05:32:44,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.56 | bwd_microstep: 1302.58 | bwd_inner_microstep: 1302.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 05:32:45,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1294.13 | bwd_inner_microstep: 1294.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3772 [2024-06-10 05:32:50,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.38 | optimizer_step: 6.61 [2024-06-10 05:32:50,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.04 | bwd_microstep: 3859.09 | bwd_inner_microstep: 1630.91 | bwd_allreduce_microstep: 2228.10 | step_microstep: 39.50 [2024-06-10 05:32:50,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15736.75 | bwd: 44341.39 | bwd_inner: 42112.25 | bwd_allreduce: 2228.40 | step: 41.14 {'loss': 1.2995, 'learning_rate': 3.8196889934376617e-05, 'epoch': 0.16} 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4264 [2024-06-10 05:32:52,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.93 | bwd_microstep: 1490.21 | bwd_inner_microstep: 1490.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 05:32:54,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.20 | bwd_microstep: 1478.07 | bwd_inner_microstep: 1478.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 05:32:56,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1648.52 | bwd_inner_microstep: 1648.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403 [2024-06-10 05:32:58,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.22 | bwd_microstep: 1181.66 | bwd_inner_microstep: 1181.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280 [2024-06-10 05:32:59,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.77 | bwd_microstep: 875.38 | bwd_inner_microstep: 875.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403 [2024-06-10 05:33:01,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.83 | bwd_microstep: 1180.84 | bwd_inner_microstep: 1180.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 05:33:03,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1395.92 | bwd_inner_microstep: 1395.90 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 05:33:05,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1386.71 | bwd_inner_microstep: 1386.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 05:33:06,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1248.60 | bwd_inner_microstep: 1248.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669 [2024-06-10 05:33:08,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.86 | bwd_microstep: 1523.84 | bwd_inner_microstep: 1523.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715 [2024-06-10 05:33:11,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1633.03 | bwd_inner_microstep: 1633.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422 [2024-06-10 05:33:13,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.16 | bwd_microstep: 1374.29 | bwd_inner_microstep: 1374.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785 [2024-06-10 05:33:15,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.67 | bwd_microstep: 1639.49 | bwd_inner_microstep: 1639.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 05:33:17,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 
1247.48 | bwd_inner_microstep: 1247.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3505 [2024-06-10 05:33:18,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.84 | bwd_microstep: 1337.96 | bwd_inner_microstep: 1337.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050 [2024-06-10 05:33:20,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.96 | bwd_microstep: 912.86 | bwd_inner_microstep: 912.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3711 [2024-06-10 05:33:22,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.05 | bwd_microstep: 1550.66 | bwd_inner_microstep: 1550.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2138 [2024-06-10 05:33:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.94 | bwd_microstep: 928.31 | bwd_inner_microstep: 928.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 05:33:25,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.74 | bwd_microstep: 1509.01 | bwd_inner_microstep: 1508.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832 [2024-06-10 05:33:27,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1454.72 | bwd_inner_microstep: 1454.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958 [2024-06-10 05:33:28,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 270.96 | bwd_microstep: 703.44 | bwd_inner_microstep: 703.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 05:33:30,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.74 | bwd_microstep: 1490.70 | bwd_inner_microstep: 1490.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 05:33:32,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.04 | bwd_microstep: 1557.56 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3613 [2024-06-10 05:33:34,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.78 | bwd_microstep: 1312.99 | bwd_inner_microstep: 1312.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3549 [2024-06-10 05:33:36,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1429.42 | bwd_inner_microstep: 1429.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 05:33:38,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1510.45 | bwd_inner_microstep: 1510.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3606 [2024-06-10 05:33:40,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1576.12 | bwd_inner_microstep: 1576.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2233 [2024-06-10 05:33:42,131] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.29 | bwd_microstep: 897.67 | bwd_inner_microstep: 897.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 05:33:44,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1555.72 | bwd_inner_microstep: 1555.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 05:33:46,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1408.47 | bwd_inner_microstep: 1408.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 05:33:48,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.28 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 05:33:50,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.17 | optimizer_step: 6.63 [2024-06-10 05:33:50,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1475.02 | bwd_inner_microstep: 1466.56 | bwd_allreduce_microstep: 8.41 | step_microstep: 38.32 [2024-06-10 05:33:50,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16155.37 | bwd: 43215.96 | bwd_inner: 43206.64 | bwd_allreduce: 8.64 | step: 40.03 {'loss': 1.3246, 'learning_rate': 3.81812832042396e-05, 'epoch': 0.16} 16%|█▌ | 276/1726 [4:51:17<27:44:16, 68.87s/it] 16%|█▌ | 277/1726 [4:52:20<27:03:01, 67.21s/it] 16%|█▌ | 278/1726 [4:53:21<26:14:57, 65.26s/it]
16%|█▌ | 279/1726 [4:54:26<26:12:54, 65.22s/it] 16%|█▌ | 280/1726 [4:55:27<25:37:10, 63.78s/it] 16%|█▋ | 281/1726 [4:56:26<25:06:46, 62.56s/it] dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448 [2024-06-10 05:33:52,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1408.81 | bwd_inner_microstep: 1408.73 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3475 [2024-06-10 05:33:54,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.54 | bwd_microstep: 1444.29 | bwd_inner_microstep: 1444.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 05:33:55,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1379.28 | bwd_inner_microstep: 1379.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 05:33:57,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.54 | bwd_microstep: 1486.54 | bwd_inner_microstep: 1486.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 05:33:59,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.52 | bwd_microstep: 1351.64 | bwd_inner_microstep: 1351.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759 [2024-06-10 05:34:02,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.36 | bwd_microstep: 1636.31 | bwd_inner_microstep: 1636.28 |
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420 [2024-06-10 05:34:03,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.79 | bwd_microstep: 1216.04 | bwd_inner_microstep: 1216.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2476 [2024-06-10 05:34:05,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.95 | bwd_microstep: 982.34 | bwd_inner_microstep: 982.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 05:34:07,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.78 | bwd_microstep: 1531.50 | bwd_inner_microstep: 1531.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 05:34:09,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1392.67 | bwd_inner_microstep: 1392.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3687 [2024-06-10 05:34:11,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.11 | bwd_microstep: 1674.38 | bwd_inner_microstep: 1674.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2016 [2024-06-10 05:34:12,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.84 | bwd_microstep: 777.00 | bwd_inner_microstep: 776.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487 [2024-06-10 05:34:14,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.36 | bwd_microstep: 
1580.44 | bwd_inner_microstep: 1580.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3489 [2024-06-10 05:34:16,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1348.54 | bwd_inner_microstep: 1348.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3596 [2024-06-10 05:34:18,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.99 | bwd_microstep: 1342.76 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 05:34:20,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.32 | bwd_microstep: 1392.31 | bwd_inner_microstep: 1392.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3471 [2024-06-10 05:34:22,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.57 | bwd_microstep: 1532.66 | bwd_inner_microstep: 1532.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 05:34:24,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.10 | bwd_microstep: 1492.84 | bwd_inner_microstep: 1492.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 05:34:26,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1408.42 | bwd_inner_microstep: 1408.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 05:34:28,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 528.56 | bwd_microstep: 1394.78 | bwd_inner_microstep: 1394.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3857 [2024-06-10 05:34:30,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.56 | bwd_microstep: 1568.01 | bwd_inner_microstep: 1567.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 05:34:32,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1294.72 | bwd_inner_microstep: 1294.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 05:34:34,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2079 [2024-06-10 05:34:35,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.47 | bwd_microstep: 1012.70 | bwd_inner_microstep: 1012.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3536 [2024-06-10 05:34:37,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1591.24 | bwd_inner_microstep: 1591.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810 [2024-06-10 05:34:39,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.67 | bwd_microstep: 1360.89 | bwd_inner_microstep: 1360.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434 [2024-06-10 05:34:41,705] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.17 | bwd_microstep: 1374.43 | bwd_inner_microstep: 1374.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559 [2024-06-10 05:34:43,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.26 | bwd_microstep: 1428.13 | bwd_inner_microstep: 1428.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3119 [2024-06-10 05:34:45,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.65 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 05:34:47,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1452.39 | bwd_inner_microstep: 1452.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 05:34:49,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.31 | bwd_microstep: 1397.71 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3428 [2024-06-10 05:34:52,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 05:34:52,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.94 | bwd_microstep: 3017.28 | bwd_inner_microstep: 1609.75 | bwd_allreduce_microstep: 1407.48 | step_microstep: 38.70 [2024-06-10 05:34:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16610.54 | bwd: 45947.77 | bwd_inner: 44539.31 | bwd_allreduce: 1407.75 | step: 40.35 {'loss': 1.3765, 'learning_rate': 
3.816561243971765e-05, 'epoch': 0.16} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1993 [2024-06-10 05:34:54,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.33 | bwd_microstep: 857.89 | bwd_inner_microstep: 857.74 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938 [2024-06-10 05:34:55,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 696.29 | bwd_inner_microstep: 696.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3420 [2024-06-10 05:34:57,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1411.63 | bwd_inner_microstep: 1411.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 05:34:59,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.75 | bwd_microstep: 1378.05 | bwd_inner_microstep: 1378.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 05:35:00,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.81 | bwd_microstep: 1379.67 | bwd_inner_microstep: 1379.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485 [2024-06-10 05:35:02,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.71 | bwd_microstep: 1345.39 | bwd_inner_microstep: 1345.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 05:35:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1384.93 | 
bwd_inner_microstep: 1384.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 05:35:06,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.30 | bwd_microstep: 1280.97 | bwd_inner_microstep: 1280.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 05:35:08,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.64 | bwd_microstep: 1385.59 | bwd_inner_microstep: 1385.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-10 05:35:10,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.39 | bwd_microstep: 1292.71 | bwd_inner_microstep: 1292.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1940 [2024-06-10 05:35:11,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.94 | bwd_microstep: 889.38 | bwd_inner_microstep: 889.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729 [2024-06-10 05:35:13,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.84 | bwd_microstep: 1626.82 | bwd_inner_microstep: 1626.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 05:35:15,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1383.10 | bwd_inner_microstep: 1383.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2612 [2024-06-10 05:35:17,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 413.67 | bwd_microstep: 1110.28 | bwd_inner_microstep: 1110.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 05:35:19,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1483.08 | bwd_inner_microstep: 1483.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633 [2024-06-10 05:35:21,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.22 | bwd_microstep: 1707.44 | bwd_inner_microstep: 1707.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491 [2024-06-10 05:35:23,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.22 | bwd_microstep: 1316.30 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877 [2024-06-10 05:35:24,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.81 | bwd_microstep: 680.88 | bwd_inner_microstep: 680.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653 [2024-06-10 05:35:26,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.45 | bwd_microstep: 1326.43 | bwd_inner_microstep: 1326.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3826 [2024-06-10 05:35:28,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1389.35 | bwd_inner_microstep: 1389.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 05:35:29,832] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.07 | bwd_microstep: 1291.33 | bwd_inner_microstep: 1291.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-10 05:35:31,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1405.86 | bwd_inner_microstep: 1405.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704 [2024-06-10 05:35:33,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.55 | bwd_microstep: 1425.99 | bwd_inner_microstep: 1425.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 05:35:36,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.51 | bwd_microstep: 1663.44 | bwd_inner_microstep: 1663.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-10 05:35:37,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1411.13 | bwd_inner_microstep: 1411.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291 [2024-06-10 05:35:39,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.90 | bwd_microstep: 882.20 | bwd_inner_microstep: 882.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820 [2024-06-10 05:35:41,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.92 | bwd_microstep: 1359.45 | bwd_inner_microstep: 1359.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3577 [2024-06-10 05:35:43,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1423.88 | bwd_inner_microstep: 1423.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2384 [2024-06-10 05:35:44,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.14 | bwd_microstep: 1126.37 | bwd_inner_microstep: 1126.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 05:35:46,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.17 | bwd_microstep: 1345.07 | bwd_inner_microstep: 1345.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1924 [2024-06-10 05:35:47,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.38 | bwd_microstep: 776.30 | bwd_inner_microstep: 776.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3433 [2024-06-10 05:35:55,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.39 | optimizer_step: 6.59 [2024-06-10 05:35:55,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.48 | bwd_microstep: 7231.55 | bwd_inner_microstep: 1812.24 | bwd_allreduce_microstep: 5419.24 | step_microstep: 39.68 [2024-06-10 05:35:55,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15330.94 | bwd: 46668.79 | bwd_inner: 41248.49 | bwd_allreduce: 5419.55 | step: 41.25 {'loss': 1.332, 'learning_rate': 3.814987769600312e-05, 'epoch': 0.16} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3417 [2024-06-10 05:35:57,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1396.83 | 
bwd_inner_microstep: 1396.73 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3393 [2024-06-10 05:35:59,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1240.40 | bwd_inner_microstep: 1240.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851 [2024-06-10 05:36:01,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.65 | bwd_microstep: 1556.43 | bwd_inner_microstep: 1556.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 05:36:03,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.53 | bwd_microstep: 1456.50 | bwd_inner_microstep: 1456.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 05:36:05,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.80 | bwd_microstep: 1479.27 | bwd_inner_microstep: 1479.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 05:36:06,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.74 | bwd_microstep: 1247.12 | bwd_inner_microstep: 1247.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 05:36:08,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.93 | bwd_microstep: 793.63 | bwd_inner_microstep: 793.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 05:36:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 470.65 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422 [2024-06-10 05:36:11,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1375.15 | bwd_inner_microstep: 1375.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728 [2024-06-10 05:36:13,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.37 | bwd_microstep: 1636.07 | bwd_inner_microstep: 1636.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3569 [2024-06-10 05:36:15,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.57 | bwd_microstep: 1448.01 | bwd_inner_microstep: 1447.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083 [2024-06-10 05:36:17,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.56 | bwd_microstep: 916.86 | bwd_inner_microstep: 916.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-10 05:36:19,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1431.43 | bwd_inner_microstep: 1431.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3436 [2024-06-10 05:36:21,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 05:36:23,118] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1391.84 | bwd_inner_microstep: 1391.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-10 05:36:25,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1508.43 | bwd_inner_microstep: 1508.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-10 05:36:27,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1408.65 | bwd_inner_microstep: 1408.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 05:36:28,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.34 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 05:36:30,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.91 | bwd_microstep: 1394.99 | bwd_inner_microstep: 1394.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3506 [2024-06-10 05:36:32,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.44 | bwd_microstep: 1194.40 | bwd_inner_microstep: 1194.14 | bwd_allreduce_microstep: 0.17 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 05:36:33,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.04 | bwd_microstep: 797.57 | bwd_inner_microstep: 797.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
2185 [2024-06-10 05:36:34,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.26 | bwd_microstep: 889.99 | bwd_inner_microstep: 889.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 05:36:37,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.56 | bwd_microstep: 1560.31 | bwd_inner_microstep: 1560.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570 [2024-06-10 05:36:39,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 1503.50 | bwd_inner_microstep: 1503.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3558 [2024-06-10 05:36:41,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.41 | bwd_microstep: 1463.39 | bwd_inner_microstep: 1463.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3771 [2024-06-10 05:36:43,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.51 | bwd_microstep: 1740.96 | bwd_inner_microstep: 1740.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472 [2024-06-10 05:36:45,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1404.71 | bwd_inner_microstep: 1404.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569 [2024-06-10 05:36:47,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1363.33 | bwd_inner_microstep: 1363.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images 
per sample: 4.5, dynamic token length: 3580 [2024-06-10 05:36:49,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.73 | bwd_microstep: 1242.17 | bwd_inner_microstep: 1242.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402 [2024-06-10 05:36:50,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.38 | bwd_microstep: 1374.03 | bwd_inner_microstep: 1374.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2982 [2024-06-10 05:36:52,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.77 | bwd_microstep: 1336.73 | bwd_inner_microstep: 1336.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3606 [2024-06-10 05:36:55,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-10 05:36:55,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.43 | bwd_microstep: 2287.03 | bwd_inner_microstep: 1823.12 | bwd_allreduce_microstep: 463.86 | step_microstep: 38.48 [2024-06-10 05:36:55,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16191.97 | bwd: 43842.25 | bwd_inner: 43377.18 | bwd_allreduce: 464.30 | step: 40.15 {'loss': 1.3204, 'learning_rate': 3.8134079028513705e-05, 'epoch': 0.16} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 05:36:57,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.50 | bwd_microstep: 1335.51 | bwd_inner_microstep: 1335.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-10 05:36:59,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
483.15 | bwd_microstep: 1279.16 | bwd_inner_microstep: 1279.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-10 05:37:01,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1281.16 | bwd_inner_microstep: 1281.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470 [2024-06-10 05:37:02,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.22 | bwd_microstep: 1216.58 | bwd_inner_microstep: 1216.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 05:37:04,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1455.52 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 05:37:05,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.89 | bwd_microstep: 797.82 | bwd_inner_microstep: 797.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732 [2024-06-10 05:37:07,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1432.40 | bwd_inner_microstep: 1432.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929 [2024-06-10 05:37:08,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.07 | bwd_microstep: 729.68 | bwd_inner_microstep: 729.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 05:37:10,069] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 299.24 | bwd_microstep: 793.02 | bwd_inner_microstep: 792.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753 [2024-06-10 05:37:12,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1445.50 | bwd_inner_microstep: 1445.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3704 [2024-06-10 05:37:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1485.03 | bwd_inner_microstep: 1485.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417 [2024-06-10 05:37:16,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1372.94 | bwd_inner_microstep: 1372.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3795 [2024-06-10 05:37:18,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.41 | bwd_microstep: 1717.35 | bwd_inner_microstep: 1717.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 05:37:20,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.39 | bwd_microstep: 1381.90 | bwd_inner_microstep: 1381.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514 [2024-06-10 05:37:22,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1448.67 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425 [2024-06-10 
05:37:23,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.54 | bwd_microstep: 1217.99 | bwd_inner_microstep: 1217.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952 [2024-06-10 05:37:25,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.00 | bwd_microstep: 891.55 | bwd_inner_microstep: 891.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 05:37:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.84 | bwd_microstep: 1300.18 | bwd_inner_microstep: 1300.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 05:37:28,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.84 | bwd_microstep: 1353.09 | bwd_inner_microstep: 1353.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3558 [2024-06-10 05:37:30,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1524.80 | bwd_inner_microstep: 1524.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 05:37:32,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.70 | bwd_microstep: 1280.76 | bwd_inner_microstep: 1280.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-10 05:37:35,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.44 | bwd_microstep: 1635.40 | bwd_inner_microstep: 1635.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3813 [2024-06-10 05:37:37,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1557.34 | bwd_inner_microstep: 1557.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972 [2024-06-10 05:37:38,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.62 | bwd_microstep: 797.67 | bwd_inner_microstep: 797.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415 [2024-06-10 05:37:39,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.60 | bwd_microstep: 1152.57 | bwd_inner_microstep: 1152.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767 [2024-06-10 05:37:41,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.43 | bwd_microstep: 1449.47 | bwd_inner_microstep: 1449.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189 [2024-06-10 05:37:43,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.35 | bwd_microstep: 859.29 | bwd_inner_microstep: 859.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2285 [2024-06-10 05:37:44,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.36 | bwd_microstep: 786.39 | bwd_inner_microstep: 786.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 05:37:46,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1404.77 | bwd_inner_microstep: 1404.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch 
size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-10 05:37:48,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.53 | bwd_microstep: 1506.76 | bwd_inner_microstep: 1506.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777 [2024-06-10 05:37:50,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1544.22 | bwd_inner_microstep: 1544.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609 [2024-06-10 05:37:55,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.26 | optimizer_step: 6.61 [2024-06-10 05:37:55,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.97 | bwd_microstep: 4800.80 | bwd_inner_microstep: 1811.01 | bwd_allreduce_microstep: 2989.74 | step_microstep: 38.79 [2024-06-10 05:37:55,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15400.87 | bwd: 44235.29 | bwd_inner: 41244.64 | bwd_allreduce: 2989.97 | step: 40.63 {'loss': 1.2906, 'learning_rate': 3.811821649289221e-05, 'epoch': 0.17} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 05:37:57,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1362.47 | bwd_inner_microstep: 1362.38 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 05:37:59,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.03 | bwd_microstep: 1273.10 | bwd_inner_microstep: 1273.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 05:38:01,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 556.68 | bwd_microstep: 1474.95 | bwd_inner_microstep: 1474.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-10 05:38:03,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.97 | bwd_microstep: 1455.89 | bwd_inner_microstep: 1455.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778 [2024-06-10 05:38:05,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.00 | bwd_microstep: 1347.41 | bwd_inner_microstep: 1347.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 05:38:07,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.62 | bwd_microstep: 1530.22 | bwd_inner_microstep: 1530.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 05:38:09,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412 [2024-06-10 05:38:10,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.58 | bwd_microstep: 1154.12 | bwd_inner_microstep: 1154.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3402 [2024-06-10 05:38:12,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1392.17 | bwd_inner_microstep: 1392.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 05:38:14,626] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.49 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137 [2024-06-10 05:38:15,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.48 | bwd_microstep: 927.82 | bwd_inner_microstep: 927.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-10 05:38:17,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1415.96 | bwd_inner_microstep: 1415.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627 [2024-06-10 05:38:20,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.60 | bwd_microstep: 1606.35 | bwd_inner_microstep: 1606.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3383 [2024-06-10 05:38:21,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.18 | bwd_microstep: 1243.95 | bwd_inner_microstep: 1243.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-10 05:38:23,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.77 | bwd_microstep: 1349.67 | bwd_inner_microstep: 1349.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440 [2024-06-10 05:38:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.77 | bwd_microstep: 1409.82 | bwd_inner_microstep: 1409.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3468 [2024-06-10 05:38:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.17 | bwd_inner_microstep: 1283.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 05:38:29,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 1509.83 | bwd_inner_microstep: 1509.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-10 05:38:31,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1318.82 | bwd_inner_microstep: 1318.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 05:38:33,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1297.35 | bwd_inner_microstep: 1297.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167 [2024-06-10 05:38:34,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.98 | bwd_microstep: 856.96 | bwd_inner_microstep: 856.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 05:38:36,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1285.64 | bwd_inner_microstep: 1285.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3670 [2024-06-10 05:38:37,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.42 | bwd_microstep: 1389.11 | bwd_inner_microstep: 1389.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3516 [2024-06-10 05:38:39,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1289.62 | bwd_inner_microstep: 1289.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 05:38:41,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1390.28 | bwd_inner_microstep: 1390.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2038 [2024-06-10 05:38:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.13 | bwd_microstep: 809.82 | bwd_inner_microstep: 809.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3598 [2024-06-10 05:38:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.43 | bwd_microstep: 1674.59 | bwd_inner_microstep: 1674.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593 [2024-06-10 05:38:46,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1337.65 | bwd_inner_microstep: 1337.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2070 [2024-06-10 05:38:48,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.85 | bwd_microstep: 946.81 | bwd_inner_microstep: 946.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3609 [2024-06-10 05:38:50,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.71 | bwd_microstep: 1771.59 | bwd_inner_microstep: 1771.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 05:38:52,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.60 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2888 [2024-06-10 05:38:57,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.35 | optimizer_step: 6.60 [2024-06-10 05:38:57,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.45 | bwd_microstep: 4655.40 | bwd_inner_microstep: 1303.66 | bwd_allreduce_microstep: 3351.66 | step_microstep: 39.46 [2024-06-10 05:38:57,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15892.33 | bwd: 45790.82 | bwd_inner: 42438.15 | bwd_allreduce: 3351.95 | step: 41.21 {'loss': 1.3258, 'learning_rate': 3.810229014500643e-05, 'epoch': 0.17} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 05:38:59,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.65 | bwd_microstep: 1463.92 | bwd_inner_microstep: 1463.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3887 [2024-06-10 05:39:02,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.64 | bwd_microstep: 1681.83 | bwd_inner_microstep: 1681.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 05:39:04,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.15 | bwd_microstep: 1656.80 | bwd_inner_microstep: 1656.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 05:39:06,268] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 504.32 | bwd_microstep: 1345.81 | bwd_inner_microstep: 1345.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4136 [2024-06-10 05:39:08,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.10 | bwd_microstep: 1643.12 | bwd_inner_microstep: 1643.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 05:39:10,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.21 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3754 [2024-06-10 05:39:12,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.28 | bwd_microstep: 1443.83 | bwd_inner_microstep: 1443.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3922 [2024-06-10 05:39:14,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.51 | bwd_microstep: 1594.03 | bwd_inner_microstep: 1594.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 05:39:16,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.53 | bwd_microstep: 1256.63 | bwd_inner_microstep: 1256.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950 [2024-06-10 05:39:17,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.64 | bwd_microstep: 701.65 | bwd_inner_microstep: 701.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 
05:39:18,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.12 | bwd_microstep: 1153.02 | bwd_inner_microstep: 1152.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498 [2024-06-10 05:39:20,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.08 | bwd_microstep: 1192.22 | bwd_inner_microstep: 1192.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 05:39:22,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.23 | bwd_microstep: 1407.62 | bwd_inner_microstep: 1407.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583 [2024-06-10 05:39:24,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.17 | bwd_microstep: 1465.25 | bwd_inner_microstep: 1465.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3399 [2024-06-10 05:39:26,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.06 | bwd_microstep: 1392.07 | bwd_inner_microstep: 1392.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3203 [2024-06-10 05:39:27,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.65 | bwd_microstep: 1143.41 | bwd_inner_microstep: 1143.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3614 [2024-06-10 05:39:29,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1276.44 | bwd_inner_microstep: 1276.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2098 [2024-06-10 05:39:31,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.48 | bwd_microstep: 923.05 | bwd_inner_microstep: 923.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 05:39:32,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.76 | bwd_microstep: 1256.55 | bwd_inner_microstep: 1256.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820 [2024-06-10 05:39:34,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.72 | bwd_microstep: 1590.30 | bwd_inner_microstep: 1590.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891 [2024-06-10 05:39:35,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.29 | bwd_microstep: 720.60 | bwd_inner_microstep: 720.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 05:39:37,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.19 | bwd_microstep: 1402.22 | bwd_inner_microstep: 1402.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 05:39:39,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1406.06 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 05:39:41,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1560.81 | bwd_inner_microstep: 1560.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 42, images per sample: 10.5, dynamic token length: 3604 [2024-06-10 05:39:44,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.22 | bwd_microstep: 1642.42 | bwd_inner_microstep: 1642.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551 [2024-06-10 05:39:46,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.95 | bwd_microstep: 1422.85 | bwd_inner_microstep: 1422.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-10 05:39:48,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.10 | bwd_microstep: 1431.60 | bwd_inner_microstep: 1431.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432 [2024-06-10 05:39:50,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.75 | bwd_microstep: 1445.69 | bwd_inner_microstep: 1445.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2101 [2024-06-10 05:39:51,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.62 | bwd_microstep: 922.98 | bwd_inner_microstep: 922.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3588 [2024-06-10 05:39:53,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.30 | bwd_microstep: 1637.37 | bwd_inner_microstep: 1637.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805 [2024-06-10 05:39:56,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.18 | bwd_microstep: 1752.00 | bwd_inner_microstep: 1751.98 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 05:39:58,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.16 | optimizer_step: 6.58 [2024-06-10 05:39:58,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1948.86 | bwd_inner_microstep: 1398.87 | bwd_allreduce_microstep: 549.95 | step_microstep: 38.51 [2024-06-10 05:39:58,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16279.02 | bwd: 44164.41 | bwd_inner: 43613.52 | bwd_allreduce: 550.18 | step: 40.20 {'loss': 1.3549, 'learning_rate': 3.8086300040948854e-05, 'epoch': 0.17} 16%|█▋ | 282/1726 [4:57:29<25:08:18, 62.67s/it] 16%|█▋ | 283/1726 [4:58:32<25:04:54, 62.57s/it] 16%|█▋ | 284/1726 [4:59:32<24:48:09, 61.92s/it] 17%|█▋ | 285/1726 [5:00:32<24:33:20, 61.35s/it] 17%|█▋ | 286/1726 [5:01:34<24:37:18, 61.55s/it] 17%|█▋ | 287/1726 [5:02:35<24:30:48, 61.33s/it] dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519 [2024-06-10 05:40:00,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.14 | bwd_microstep: 1582.20 | bwd_inner_microstep: 1582.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 05:40:02,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.32 | bwd_microstep: 1395.32 | bwd_inner_microstep: 1395.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3887 [2024-06-10 05:40:04,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 526.82 | bwd_microstep: 1387.21 | bwd_inner_microstep: 1387.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3404 [2024-06-10 05:40:06,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.54 | bwd_microstep: 1198.61 | bwd_inner_microstep: 1198.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240 [2024-06-10 05:40:07,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.03 | bwd_microstep: 961.36 | bwd_inner_microstep: 961.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726 [2024-06-10 05:40:09,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.90 | bwd_microstep: 1533.35 | bwd_inner_microstep: 1533.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484 [2024-06-10 05:40:11,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1316.00 | bwd_inner_microstep: 1315.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 05:40:13,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1389.26 | bwd_inner_microstep: 1389.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 05:40:15,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.30 | bwd_microstep: 1392.32 | bwd_inner_microstep: 1392.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 05:40:17,309] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1387.94 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 05:40:19,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.31 | bwd_microstep: 1395.99 | bwd_inner_microstep: 1395.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3513 [2024-06-10 05:40:20,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.25 | bwd_microstep: 1224.77 | bwd_inner_microstep: 1224.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695 [2024-06-10 05:40:22,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.42 | bwd_microstep: 1465.31 | bwd_inner_microstep: 1465.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451 [2024-06-10 05:40:24,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1286.32 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 05:40:26,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512 [2024-06-10 05:40:28,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.75 | bwd_microstep: 1588.67 | bwd_inner_microstep: 1588.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3640 [2024-06-10 05:40:30,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.29 | bwd_microstep: 1607.09 | bwd_inner_microstep: 1607.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2824 [2024-06-10 05:40:32,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.91 | bwd_microstep: 1161.03 | bwd_inner_microstep: 1161.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-10 05:40:34,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1290.26 | bwd_inner_microstep: 1290.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3838 [2024-06-10 05:40:36,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.75 | bwd_microstep: 1362.53 | bwd_inner_microstep: 1362.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1977 [2024-06-10 05:40:37,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.42 | bwd_microstep: 706.28 | bwd_inner_microstep: 706.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-10 05:40:39,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1500.64 | bwd_inner_microstep: 1500.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 05:40:41,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.88 | bwd_microstep: 1452.69 | bwd_inner_microstep: 1452.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, 
images per sample: 2.5, dynamic token length: 1946 [2024-06-10 05:40:42,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.13 | bwd_microstep: 699.88 | bwd_inner_microstep: 699.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3819 [2024-06-10 05:40:43,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.13 | bwd_microstep: 1260.50 | bwd_inner_microstep: 1260.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 05:40:45,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1400.77 | bwd_inner_microstep: 1400.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2933 [2024-06-10 05:40:47,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.91 | bwd_microstep: 1096.09 | bwd_inner_microstep: 1096.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590 [2024-06-10 05:40:49,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.08 | bwd_microstep: 1711.36 | bwd_inner_microstep: 1711.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3585 [2024-06-10 05:40:51,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.20 | bwd_microstep: 1460.92 | bwd_inner_microstep: 1460.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 05:40:53,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.89 | bwd_microstep: 1446.59 | bwd_inner_microstep: 1446.56 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569 [2024-06-10 05:40:55,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.01 | bwd_microstep: 1589.95 | bwd_inner_microstep: 1589.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584 [2024-06-10 05:40:58,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.17 | optimizer_step: 6.59 [2024-06-10 05:40:58,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 2393.43 | bwd_inner_microstep: 1678.77 | bwd_allreduce_microstep: 714.61 | step_microstep: 38.40 [2024-06-10 05:40:58,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16120.96 | bwd: 43895.38 | bwd_inner: 43179.87 | bwd_allreduce: 714.83 | step: 40.02 {'loss': 1.2404, 'learning_rate': 3.807024623703655e-05, 'epoch': 0.17} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469 [2024-06-10 05:41:01,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.30 | bwd_microstep: 1574.76 | bwd_inner_microstep: 1574.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 05:41:03,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.98 | bwd_microstep: 1483.36 | bwd_inner_microstep: 1483.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3796 [2024-06-10 05:41:05,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.34 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 05:41:06,306] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.44 | bwd_microstep: 781.52 | bwd_inner_microstep: 781.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1968 [2024-06-10 05:41:07,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.14 | bwd_microstep: 858.97 | bwd_inner_microstep: 858.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3493 [2024-06-10 05:41:09,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.01 | bwd_microstep: 1217.25 | bwd_inner_microstep: 1217.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-10 05:41:11,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1316.86 | bwd_inner_microstep: 1316.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3909 [2024-06-10 05:41:13,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.96 | bwd_microstep: 1698.45 | bwd_inner_microstep: 1698.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433 [2024-06-10 05:41:15,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1286.33 | bwd_inner_microstep: 1286.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 05:41:16,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1347.02 | bwd_inner_microstep: 1346.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3618 [2024-06-10 05:41:18,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.74 | bwd_microstep: 1373.19 | bwd_inner_microstep: 1373.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3509 [2024-06-10 05:41:21,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.27 | bwd_microstep: 1685.58 | bwd_inner_microstep: 1685.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3633 [2024-06-10 05:41:23,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.26 | bwd_microstep: 1678.08 | bwd_inner_microstep: 1678.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493 [2024-06-10 05:41:25,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1416.37 | bwd_inner_microstep: 1416.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 05:41:27,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.39 | bwd_microstep: 1295.71 | bwd_inner_microstep: 1295.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 05:41:29,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1397.41 | bwd_inner_microstep: 1397.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 05:41:30,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1277.89 | bwd_inner_microstep: 1277.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3514 [2024-06-10 05:41:32,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1393.24 | bwd_inner_microstep: 1393.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2338 [2024-06-10 05:41:34,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.02 | bwd_microstep: 954.45 | bwd_inner_microstep: 954.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2186 [2024-06-10 05:41:35,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.39 | bwd_microstep: 794.23 | bwd_inner_microstep: 794.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3724 [2024-06-10 05:41:37,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.33 | bwd_microstep: 1337.73 | bwd_inner_microstep: 1337.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822 [2024-06-10 05:41:39,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.56 | bwd_microstep: 1691.44 | bwd_inner_microstep: 1691.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 05:41:40,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 700.88 | bwd_inner_microstep: 700.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807 [2024-06-10 05:41:42,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.36 | bwd_microstep: 1360.28 | bwd_inner_microstep: 1360.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 05:41:44,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1381.53 | bwd_inner_microstep: 1381.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 05:41:46,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1351.41 | bwd_inner_microstep: 1351.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 05:41:47,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1342.39 | bwd_inner_microstep: 1342.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3586 [2024-06-10 05:41:50,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 1704.44 | bwd_inner_microstep: 1704.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2009 [2024-06-10 05:41:51,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.56 | bwd_microstep: 757.60 | bwd_inner_microstep: 757.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830 [2024-06-10 05:41:53,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.33 | bwd_microstep: 1754.86 | bwd_inner_microstep: 1754.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2698 [2024-06-10 05:41:55,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.25 | bwd_microstep: 1132.77 | bwd_inner_microstep: 1132.75 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799 [2024-06-10 05:42:00,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 05:42:00,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 4535.04 | bwd_inner_microstep: 1633.54 | bwd_allreduce_microstep: 2901.45 | step_microstep: 38.67 [2024-06-10 05:42:00,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15808.87 | bwd: 45362.70 | bwd_inner: 42460.34 | bwd_allreduce: 2901.67 | step: 40.35 {'loss': 1.3264, 'learning_rate': 3.805412878981095e-05, 'epoch': 0.17} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 05:42:02,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.26 | bwd_microstep: 1268.61 | bwd_inner_microstep: 1268.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420 [2024-06-10 05:42:03,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.97 | bwd_microstep: 1151.28 | bwd_inner_microstep: 1151.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4036 [2024-06-10 05:42:06,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.45 | bwd_microstep: 1720.19 | bwd_inner_microstep: 1720.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 05:42:08,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1382.35 | bwd_inner_microstep: 1382.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 
[2024-06-10 05:42:09,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.02 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 05:42:11,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.27 | bwd_microstep: 1386.48 | bwd_inner_microstep: 1386.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3499 [2024-06-10 05:42:13,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1347.95 | bwd_inner_microstep: 1347.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3528 [2024-06-10 05:42:15,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.44 | bwd_microstep: 1229.26 | bwd_inner_microstep: 1229.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3868 [2024-06-10 05:42:17,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.21 | bwd_microstep: 1670.20 | bwd_inner_microstep: 1670.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 05:42:19,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.93 | bwd_microstep: 1154.89 | bwd_inner_microstep: 1154.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3705 [2024-06-10 05:42:21,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.95 | bwd_microstep: 1459.15 | bwd_inner_microstep: 1459.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3446 [2024-06-10 05:42:23,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1255.66 | bwd_inner_microstep: 1255.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2093 [2024-06-10 05:42:24,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.09 | bwd_microstep: 759.48 | bwd_inner_microstep: 759.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 05:42:26,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1382.06 | bwd_inner_microstep: 1382.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-10 05:42:27,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1428.72 | bwd_inner_microstep: 1428.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1850 [2024-06-10 05:42:28,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.24 | bwd_microstep: 702.72 | bwd_inner_microstep: 702.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3646 [2024-06-10 05:42:31,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.39 | bwd_microstep: 1710.87 | bwd_inner_microstep: 1710.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3680 [2024-06-10 05:42:33,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.53 | bwd_microstep: 1723.00 | bwd_inner_microstep: 1722.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3624 [2024-06-10 05:42:35,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.33 | bwd_microstep: 1676.98 | bwd_inner_microstep: 1676.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2159 [2024-06-10 05:42:37,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.77 | bwd_microstep: 759.87 | bwd_inner_microstep: 759.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3835 [2024-06-10 05:42:39,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1423.58 | bwd_inner_microstep: 1423.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723 [2024-06-10 05:42:40,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1342.00 | bwd_inner_microstep: 1341.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 05:42:42,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1400.13 | bwd_inner_microstep: 1400.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-10 05:42:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.17 | bwd_microstep: 1407.39 | bwd_inner_microstep: 1407.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551 [2024-06-10 05:42:46,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.57 | bwd_microstep: 1201.45 | bwd_inner_microstep: 1201.43 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830 [2024-06-10 05:42:48,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.73 | bwd_microstep: 1489.35 | bwd_inner_microstep: 1489.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 05:42:50,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1354.20 | bwd_inner_microstep: 1354.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796 [2024-06-10 05:42:52,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.22 | bwd_microstep: 1648.46 | bwd_inner_microstep: 1648.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 05:42:54,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1351.44 | bwd_inner_microstep: 1351.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2240 [2024-06-10 05:42:55,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.11 | bwd_microstep: 1062.58 | bwd_inner_microstep: 1062.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 05:42:58,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1553.60 | bwd_inner_microstep: 1553.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3602 [2024-06-10 05:43:03,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.84 | 
optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 05:43:03,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.45 | bwd_microstep: 4612.06 | bwd_inner_microstep: 1936.49 | bwd_allreduce_microstep: 2675.51 | step_microstep: 40.77 [2024-06-10 05:43:03,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16249.76 | bwd: 46296.56 | bwd_inner: 43620.15 | bwd_allreduce: 2675.74 | step: 42.46 {'loss': 1.2414, 'learning_rate': 3.80379477560376e-05, 'epoch': 0.17} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971 [2024-06-10 05:43:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.81 | bwd_microstep: 887.80 | bwd_inner_microstep: 887.65 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2407 [2024-06-10 05:43:05,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.45 | bwd_microstep: 1001.68 | bwd_inner_microstep: 1001.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472 [2024-06-10 05:43:07,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.79 | bwd_microstep: 1214.11 | bwd_inner_microstep: 1214.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3866 [2024-06-10 05:43:09,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.98 | bwd_microstep: 1659.22 | bwd_inner_microstep: 1659.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 05:43:11,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1250.40 | bwd_inner_microstep: 1250.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3425 [2024-06-10 05:43:13,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.44 | bwd_microstep: 1345.09 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3733 [2024-06-10 05:43:15,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1400.53 | bwd_inner_microstep: 1400.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 05:43:17,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.70 | bwd_microstep: 1284.42 | bwd_inner_microstep: 1284.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3435 [2024-06-10 05:43:19,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.79 | bwd_microstep: 1374.22 | bwd_inner_microstep: 1374.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476 [2024-06-10 05:43:21,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.68 | bwd_microstep: 1445.45 | bwd_inner_microstep: 1445.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3677 [2024-06-10 05:43:23,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.02 | bwd_microstep: 1827.17 | bwd_inner_microstep: 1827.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670 [2024-06-10 05:43:25,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.25 | bwd_microstep: 1618.82 | bwd_inner_microstep: 1618.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745 [2024-06-10 05:43:28,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.92 | bwd_microstep: 1534.92 | bwd_inner_microstep: 1534.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693 [2024-06-10 05:43:30,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.90 | bwd_microstep: 1728.15 | bwd_inner_microstep: 1728.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 05:43:32,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.02 | bwd_microstep: 1393.09 | bwd_inner_microstep: 1393.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639 [2024-06-10 05:43:34,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.07 | bwd_microstep: 1415.69 | bwd_inner_microstep: 1415.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3522 [2024-06-10 05:43:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.84 | bwd_microstep: 1230.78 | bwd_inner_microstep: 1230.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2713 [2024-06-10 05:43:37,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.14 | bwd_microstep: 1006.55 | bwd_inner_microstep: 1006.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3688 [2024-06-10 05:43:39,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1335.01 | bwd_inner_microstep: 1334.98 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103 [2024-06-10 05:43:40,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.35 | bwd_microstep: 823.78 | bwd_inner_microstep: 823.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722 [2024-06-10 05:43:42,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1431.45 | bwd_inner_microstep: 1431.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3858 [2024-06-10 05:43:44,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.14 | bwd_microstep: 1767.92 | bwd_inner_microstep: 1767.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3549 [2024-06-10 05:43:46,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.93 | bwd_microstep: 1198.56 | bwd_inner_microstep: 1198.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 05:43:48,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.30 | bwd_microstep: 1286.34 | bwd_inner_microstep: 1286.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3621 [2024-06-10 05:43:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.26 | bwd_microstep: 1248.33 | bwd_inner_microstep: 1248.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 05:43:52,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 
1556.19 | bwd_inner_microstep: 1556.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 05:43:54,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.21 | bwd_microstep: 1629.65 | bwd_inner_microstep: 1629.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2959 [2024-06-10 05:43:56,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.85 | bwd_microstep: 1200.20 | bwd_inner_microstep: 1200.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 05:43:57,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1342.94 | bwd_inner_microstep: 1342.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3814 [2024-06-10 05:43:59,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1502.74 | bwd_inner_microstep: 1502.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 05:44:01,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.08 | bwd_microstep: 976.50 | bwd_inner_microstep: 976.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-10 05:44:04,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.16 | optimizer_step: 6.63 [2024-06-10 05:44:04,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 2208.67 | bwd_inner_microstep: 1489.43 | bwd_allreduce_microstep: 719.19 | step_microstep: 38.33 
[2024-06-10 05:44:04,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16198.64 | bwd: 44126.37 | bwd_inner: 43406.17 | bwd_allreduce: 719.47 | step: 39.93 {'loss': 1.2998, 'learning_rate': 3.8021703192706023e-05, 'epoch': 0.17} dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4038 [2024-06-10 05:44:06,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.64 | bwd_microstep: 1807.39 | bwd_inner_microstep: 1807.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 05:44:08,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478 [2024-06-10 05:44:10,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1243.32 | bwd_inner_microstep: 1243.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 05:44:11,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.95 | bwd_microstep: 1288.57 | bwd_inner_microstep: 1288.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867 [2024-06-10 05:44:12,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 679.67 | bwd_inner_microstep: 679.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 05:44:14,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 05:44:16,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.82 | bwd_microstep: 1286.79 | bwd_inner_microstep: 1286.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1388 [2024-06-10 05:44:16,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 204.60 | bwd_microstep: 528.41 | bwd_inner_microstep: 528.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722 [2024-06-10 05:44:18,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1430.69 | bwd_inner_microstep: 1430.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722 [2024-06-10 05:44:20,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.14 | bwd_microstep: 1429.74 | bwd_inner_microstep: 1429.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 05:44:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1485.99 | bwd_inner_microstep: 1485.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 05:44:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 | bwd_microstep: 1486.01 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672 [2024-06-10 05:44:27,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.53 | bwd_microstep: 1721.70 | bwd_inner_microstep: 1721.67 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 05:44:29,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1250.75 | bwd_inner_microstep: 1250.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3635 [2024-06-10 05:44:31,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.59 | bwd_microstep: 1376.55 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2119 [2024-06-10 05:44:32,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.32 | bwd_microstep: 827.14 | bwd_inner_microstep: 827.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571 [2024-06-10 05:44:33,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1300.93 | bwd_inner_microstep: 1300.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 05:44:36,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1496.83 | bwd_inner_microstep: 1496.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593 [2024-06-10 05:44:37,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1309.99 | bwd_inner_microstep: 1309.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540 [2024-06-10 05:44:39,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.48 | bwd_microstep: 1199.78 | bwd_inner_microstep: 
1199.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 05:44:41,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1497.42 | bwd_inner_microstep: 1497.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1992 [2024-06-10 05:44:42,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.78 | bwd_microstep: 709.02 | bwd_inner_microstep: 708.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3550 [2024-06-10 05:44:44,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.44 | bwd_microstep: 1202.88 | bwd_inner_microstep: 1202.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2032 [2024-06-10 05:44:45,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.61 | bwd_microstep: 747.69 | bwd_inner_microstep: 747.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-10 05:44:47,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1411.08 | bwd_inner_microstep: 1411.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3601 [2024-06-10 05:44:49,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1342.28 | bwd_inner_microstep: 1342.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4151 [2024-06-10 05:44:51,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.53 | 
bwd_microstep: 1554.47 | bwd_inner_microstep: 1554.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 05:44:53,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1615.17 | bwd_inner_microstep: 1615.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861 [2024-06-10 05:44:55,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.56 | bwd_microstep: 1557.95 | bwd_inner_microstep: 1557.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3377 [2024-06-10 05:44:57,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.69 | bwd_microstep: 1437.66 | bwd_inner_microstep: 1437.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3644 [2024-06-10 05:44:59,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.01 | bwd_microstep: 1680.29 | bwd_inner_microstep: 1680.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3424 [2024-06-10 05:45:05,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.35 | optimizer_step: 6.65 [2024-06-10 05:45:05,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.85 | bwd_microstep: 4665.84 | bwd_inner_microstep: 1485.71 | bwd_allreduce_microstep: 3180.07 | step_microstep: 39.11 [2024-06-10 05:45:05,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15667.52 | bwd: 45069.22 | bwd_inner: 41888.24 | bwd_allreduce: 3180.29 | step: 40.77 {'loss': 1.3716, 'learning_rate': 3.800539515702949e-05, 'epoch': 0.17} dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3472 [2024-06-10 05:45:07,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.41 | bwd_microstep: 1570.11 | bwd_inner_microstep: 1570.02 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933 [2024-06-10 05:45:09,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1594.26 | bwd_inner_microstep: 1594.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 05:45:11,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1309.22 | bwd_inner_microstep: 1309.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784 [2024-06-10 05:45:13,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1444.16 | bwd_inner_microstep: 1444.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2030 [2024-06-10 05:45:14,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.91 | bwd_microstep: 714.36 | bwd_inner_microstep: 714.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 05:45:16,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1458.22 | bwd_inner_microstep: 1458.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3500 [2024-06-10 05:45:18,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.08 | bwd_microstep: 1222.93 | bwd_inner_microstep: 1222.90 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3737 [2024-06-10 05:45:20,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.30 | bwd_microstep: 1467.32 | bwd_inner_microstep: 1467.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484 [2024-06-10 05:45:21,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.23 | bwd_microstep: 1313.66 | bwd_inner_microstep: 1313.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443 [2024-06-10 05:45:23,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.94 | bwd_microstep: 1289.84 | bwd_inner_microstep: 1289.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 05:45:25,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1485.24 | bwd_inner_microstep: 1485.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1993 [2024-06-10 05:45:26,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.78 | bwd_microstep: 899.84 | bwd_inner_microstep: 899.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1966 [2024-06-10 05:45:28,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.72 | bwd_microstep: 855.47 | bwd_inner_microstep: 855.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661 [2024-06-10 05:45:30,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.84 | bwd_microstep: 1420.76 | bwd_inner_microstep: 1420.74 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 05:45:32,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.72 | bwd_microstep: 1504.10 | bwd_inner_microstep: 1504.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-10 05:45:34,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1629.89 | bwd_inner_microstep: 1629.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513 [2024-06-10 05:45:36,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.95 | bwd_microstep: 1192.98 | bwd_inner_microstep: 1192.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 05:45:38,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1399.74 | bwd_inner_microstep: 1399.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700 [2024-06-10 05:45:40,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.45 | bwd_microstep: 1635.16 | bwd_inner_microstep: 1635.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 05:45:42,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.43 | bwd_microstep: 1393.22 | bwd_inner_microstep: 1393.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951 [2024-06-10 05:45:43,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.23 | bwd_microstep: 
730.82 | bwd_inner_microstep: 730.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2203 [2024-06-10 05:45:44,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.32 | bwd_microstep: 866.17 | bwd_inner_microstep: 866.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717 [2024-06-10 05:45:46,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.68 | bwd_microstep: 1637.65 | bwd_inner_microstep: 1637.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 05:45:48,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.24 | bwd_microstep: 1287.57 | bwd_inner_microstep: 1287.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1979 [2024-06-10 05:45:49,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.36 | bwd_microstep: 828.36 | bwd_inner_microstep: 828.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2281 [2024-06-10 05:45:50,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.06 | bwd_microstep: 937.86 | bwd_inner_microstep: 937.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-10 05:45:52,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1352.11 | bwd_inner_microstep: 1352.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046 [2024-06-10 05:45:54,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 336.44 | bwd_microstep: 904.40 | bwd_inner_microstep: 904.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3806 [2024-06-10 05:45:56,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.83 | bwd_microstep: 1610.48 | bwd_inner_microstep: 1610.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580 [2024-06-10 05:45:58,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1453.41 | bwd_inner_microstep: 1453.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 05:46:00,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.41 | bwd_microstep: 1648.39 | bwd_inner_microstep: 1648.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2073 [2024-06-10 05:46:06,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.60 [2024-06-10 05:46:06,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.04 | bwd_microstep: 5199.33 | bwd_inner_microstep: 1157.30 | bwd_allreduce_microstep: 4041.95 | step_microstep: 39.28 [2024-06-10 05:46:06,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15375.16 | bwd: 45257.04 | bwd_inner: 41214.08 | bwd_allreduce: 4042.25 | step: 40.94 287/1726 [5:02:35<24:30:48, 61.33s/it] 17%|█▋ | 288/1726 [5:03:35<24:22:55, 61.04s/it] 17%|█▋ | 288/1726 [5:03:35<24:22:55, 61.04s/it] 17%|█▋ | 289/1726 [5:04:37<24:25:22, 61.18s/it] 17%|█▋ | 289/1726 [5:04:37<24:25:22, 61.18s/it] 17%|█▋ | 290/1726 [5:05:40<24:36:40, 61.70s/it] 17%|█▋ | 290/1726 [5:05:40<24:36:40, 61.70s/it] 17%|█▋ | 291/1726 [5:06:40<24:28:17, 
61.39s/it] 17%|█▋ | 291/1726 [5:06:40<24:28:17, 61.39s/it] 17%|█▋ | 292/1726 [5:07:41<24:25:05, 61.30s/it] 17%|█▋ | 292/1726 [5:07:41<24:25:05, 61.30s/it] 17%|█▋ | 293/1726 [5:08:42<24:21:46, 61.21s/it] {'loss': 1.3045, 'learning_rate': 3.798902370644482e-05, 'epoch': 0.17} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540 [2024-06-10 05:46:07,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.47 | bwd_microstep: 1193.94 | bwd_inner_microstep: 1193.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3973 [2024-06-10 05:46:10,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.94 | bwd_microstep: 1696.53 | bwd_inner_microstep: 1696.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-10 05:46:11,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.56 | bwd_microstep: 788.74 | bwd_inner_microstep: 788.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482 [2024-06-10 05:46:13,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.94 | bwd_microstep: 1322.39 | bwd_inner_microstep: 1322.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784 [2024-06-10 05:46:15,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1443.03 | bwd_inner_microstep: 1443.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407 [2024-06-10 05:46:16,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.88 | bwd_microstep: 1211.26 | bwd_inner_microstep: 1211.24 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408 [2024-06-10 05:46:18,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.43 | bwd_microstep: 1277.84 | bwd_inner_microstep: 1277.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 05:46:19,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.09 | bwd_microstep: 799.48 | bwd_inner_microstep: 799.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 05:46:21,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1389.48 | bwd_inner_microstep: 1389.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415 [2024-06-10 05:46:23,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.58 | bwd_microstep: 1154.32 | bwd_inner_microstep: 1154.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1903 [2024-06-10 05:46:24,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.16 | bwd_microstep: 746.27 | bwd_inner_microstep: 746.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2191 [2024-06-10 05:46:25,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.53 | bwd_microstep: 1049.39 | bwd_inner_microstep: 1049.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-10 05:46:27,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.56 | bwd_microstep: 1440.05 | bwd_inner_microstep: 1440.03 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3536 [2024-06-10 05:46:29,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.35 | bwd_microstep: 1327.65 | bwd_inner_microstep: 1327.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3593 [2024-06-10 05:46:31,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.77 | bwd_microstep: 1574.62 | bwd_inner_microstep: 1574.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158 [2024-06-10 05:46:32,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.68 | bwd_microstep: 950.42 | bwd_inner_microstep: 950.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 05:46:35,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 1513.03 | bwd_inner_microstep: 1513.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533 [2024-06-10 05:46:37,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1592.10 | bwd_inner_microstep: 1592.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2081 [2024-06-10 05:46:38,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.26 | bwd_microstep: 725.26 | bwd_inner_microstep: 725.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939 [2024-06-10 05:46:39,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.80 | bwd_microstep: 
725.81 | bwd_inner_microstep: 725.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 05:46:41,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.91 | bwd_microstep: 1559.68 | bwd_inner_microstep: 1559.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3715 [2024-06-10 05:46:43,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1368.12 | bwd_inner_microstep: 1368.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556 [2024-06-10 05:46:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.20 | bwd_microstep: 1298.05 | bwd_inner_microstep: 1298.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 05:46:46,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1350.88 | bwd_inner_microstep: 1350.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3473 [2024-06-10 05:46:48,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1366.00 | bwd_inner_microstep: 1365.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605 [2024-06-10 05:46:50,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.14 | bwd_microstep: 1312.32 | bwd_inner_microstep: 1312.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-10 05:46:52,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 561.28 | bwd_microstep: 1503.47 | bwd_inner_microstep: 1503.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280 [2024-06-10 05:46:53,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.51 | bwd_microstep: 880.47 | bwd_inner_microstep: 880.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3534 [2024-06-10 05:46:55,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.68 | bwd_microstep: 1199.67 | bwd_inner_microstep: 1199.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3127 [2024-06-10 05:46:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1406.26 | bwd_inner_microstep: 1406.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-10 05:46:59,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.22 | bwd_microstep: 1535.64 | bwd_inner_microstep: 1535.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-10 05:47:08,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.28 | optimizer_step: 6.60 [2024-06-10 05:47:08,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 7905.89 | bwd_inner_microstep: 1697.33 | bwd_allreduce_microstep: 6208.51 | step_microstep: 39.24 [2024-06-10 05:47:08,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15116.86 | bwd: 46608.07 | bwd_inner: 40398.65 | bwd_allreduce: 6208.74 | step: 40.84 {'loss': 1.2502, 'learning_rate': 3.797258889861216e-05, 'epoch': 0.17} dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 05:47:09,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.24 | bwd_microstep: 1241.12 | bwd_inner_microstep: 1241.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3839 [2024-06-10 05:47:12,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.96 | bwd_microstep: 1513.22 | bwd_inner_microstep: 1513.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 05:47:13,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1243.29 | bwd_inner_microstep: 1243.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 05:47:15,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.24 | bwd_microstep: 1374.01 | bwd_inner_microstep: 1373.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3417 [2024-06-10 05:47:17,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.70 | bwd_microstep: 1184.04 | bwd_inner_microstep: 1184.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3460 [2024-06-10 05:47:19,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.23 | bwd_microstep: 1240.93 | bwd_inner_microstep: 1240.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881 [2024-06-10 05:47:19,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.24 | bwd_microstep: 712.65 | bwd_inner_microstep: 712.62 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 05:47:21,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1388.46 | bwd_inner_microstep: 1388.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 05:47:23,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1380.97 | bwd_inner_microstep: 1380.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-10 05:47:26,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.49 | bwd_microstep: 1623.36 | bwd_inner_microstep: 1623.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3643 [2024-06-10 05:47:28,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.17 | bwd_microstep: 1576.90 | bwd_inner_microstep: 1576.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1937 [2024-06-10 05:47:29,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.33 | bwd_microstep: 882.52 | bwd_inner_microstep: 882.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2096 [2024-06-10 05:47:30,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.02 | bwd_microstep: 1015.15 | bwd_inner_microstep: 1015.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3901 [2024-06-10 05:47:33,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.47 | bwd_microstep: 1783.24 | 
bwd_inner_microstep: 1783.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966 [2024-06-10 05:47:34,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.39 | bwd_microstep: 704.45 | bwd_inner_microstep: 704.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 05:47:36,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.86 | bwd_microstep: 1483.47 | bwd_inner_microstep: 1483.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2584 [2024-06-10 05:47:37,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.83 | bwd_microstep: 975.05 | bwd_inner_microstep: 975.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 05:47:38,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.13 | bwd_microstep: 796.01 | bwd_inner_microstep: 795.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 05:47:40,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1407.96 | bwd_inner_microstep: 1407.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3864 [2024-06-10 05:47:43,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.42 | bwd_microstep: 1666.64 | bwd_inner_microstep: 1666.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 05:47:44,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
527.58 | bwd_microstep: 1390.47 | bwd_inner_microstep: 1390.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837 [2024-06-10 05:47:46,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.86 | bwd_microstep: 1456.11 | bwd_inner_microstep: 1456.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2398 [2024-06-10 05:47:48,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.72 | bwd_microstep: 1004.33 | bwd_inner_microstep: 1004.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3609 [2024-06-10 05:47:50,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.24 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583 [2024-06-10 05:47:52,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.01 | bwd_microstep: 1532.21 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 05:47:54,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.82 | bwd_microstep: 1549.40 | bwd_inner_microstep: 1549.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688 [2024-06-10 05:47:56,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.54 | bwd_microstep: 1531.17 | bwd_inner_microstep: 1531.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-10 05:47:58,502] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.62 | bwd_microstep: 1402.99 | bwd_inner_microstep: 1402.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3567 [2024-06-10 05:48:00,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.00 | bwd_microstep: 1698.34 | bwd_inner_microstep: 1698.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3777 [2024-06-10 05:48:03,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.92 | bwd_microstep: 1590.23 | bwd_inner_microstep: 1590.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3570 [2024-06-10 05:48:04,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.74 | bwd_microstep: 1423.25 | bwd_inner_microstep: 1423.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015 [2024-06-10 05:48:11,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 05:48:11,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.39 | bwd_microstep: 5671.72 | bwd_inner_microstep: 1024.60 | bwd_allreduce_microstep: 4647.07 | step_microstep: 38.74 [2024-06-10 05:48:11,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15728.61 | bwd: 46785.60 | bwd_inner: 42137.62 | bwd_allreduce: 4647.29 | step: 40.37 {'loss': 1.3508, 'learning_rate': 3.795609079141484e-05, 'epoch': 0.17} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576 [2024-06-10 05:48:13,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1590.58 | bwd_inner_microstep: 1590.52 | 
bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920 [2024-06-10 05:48:14,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.52 | bwd_microstep: 724.25 | bwd_inner_microstep: 724.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2373 [2024-06-10 05:48:15,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.90 | bwd_microstep: 995.84 | bwd_inner_microstep: 995.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3848 [2024-06-10 05:48:17,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.91 | bwd_microstep: 1465.94 | bwd_inner_microstep: 1465.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 05:48:19,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.46 | bwd_microstep: 1498.59 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 05:48:21,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1249.10 | bwd_inner_microstep: 1249.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3796 [2024-06-10 05:48:23,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.23 | bwd_microstep: 1600.91 | bwd_inner_microstep: 1600.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1443 [2024-06-10 05:48:24,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 209.70 | bwd_microstep: 540.07 
| bwd_inner_microstep: 540.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-10 05:48:26,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1432.69 | bwd_inner_microstep: 1432.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434 [2024-06-10 05:48:28,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.21 | bwd_microstep: 1187.11 | bwd_inner_microstep: 1187.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 05:48:29,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 05:48:32,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1492.55 | bwd_inner_microstep: 1492.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2285 [2024-06-10 05:48:33,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.41 | bwd_microstep: 1075.63 | bwd_inner_microstep: 1075.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2971 [2024-06-10 05:48:35,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1332.64 | bwd_inner_microstep: 1332.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3394 [2024-06-10 05:48:37,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 519.20 | bwd_microstep: 1391.72 | bwd_inner_microstep: 1391.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 05:48:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1507.52 | bwd_inner_microstep: 1507.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 05:48:41,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1495.55 | bwd_inner_microstep: 1495.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3850 [2024-06-10 05:48:43,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1590.62 | bwd_inner_microstep: 1590.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650 [2024-06-10 05:48:45,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.85 | bwd_microstep: 1612.70 | bwd_inner_microstep: 1612.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 05:48:48,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.72 | bwd_microstep: 1662.39 | bwd_inner_microstep: 1662.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684 [2024-06-10 05:48:50,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.98 | bwd_microstep: 1631.78 | bwd_inner_microstep: 1631.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2005 [2024-06-10 05:48:51,338] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.72 | bwd_microstep: 712.02 | bwd_inner_microstep: 711.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 05:48:53,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1515.41 | bwd_inner_microstep: 1515.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 635 [2024-06-10 05:48:53,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 105.22 | bwd_microstep: 264.85 | bwd_inner_microstep: 264.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495 [2024-06-10 05:48:55,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.75 | bwd_microstep: 1319.20 | bwd_inner_microstep: 1319.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 05:48:57,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.59 | bwd_microstep: 1401.26 | bwd_inner_microstep: 1401.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 05:48:59,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.71 | bwd_microstep: 1289.14 | bwd_inner_microstep: 1289.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1980 [2024-06-10 05:49:00,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.05 | bwd_microstep: 706.88 | bwd_inner_microstep: 706.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 
[2024-06-10 05:49:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1506.88 | bwd_inner_microstep: 1506.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823 [2024-06-10 05:49:04,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.44 | bwd_microstep: 1582.14 | bwd_inner_microstep: 1582.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2288 [2024-06-10 05:49:06,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.08 | bwd_microstep: 1040.40 | bwd_inner_microstep: 1040.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837 [2024-06-10 05:49:13,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.26 | optimizer_step: 6.60 [2024-06-10 05:49:13,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 6992.14 | bwd_inner_microstep: 1752.87 | bwd_allreduce_microstep: 5239.22 | step_microstep: 38.92 [2024-06-10 05:49:13,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15454.61 | bwd: 46788.98 | bwd_inner: 41548.80 | bwd_allreduce: 5239.47 | step: 40.55 {'loss': 1.2527, 'learning_rate': 3.793952944295909e-05, 'epoch': 0.17} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 05:49:15,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.00 | bwd_microstep: 1333.49 | bwd_inner_microstep: 1333.40 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3953 [2024-06-10 05:49:17,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.41 | bwd_microstep: 1490.00 | 
bwd_inner_microstep: 1489.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3868 [2024-06-10 05:49:19,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.75 | bwd_microstep: 1565.44 | bwd_inner_microstep: 1565.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 05:49:21,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1555.15 | bwd_inner_microstep: 1555.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 05:49:22,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.58 | bwd_microstep: 796.25 | bwd_inner_microstep: 796.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 05:49:24,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1386.06 | bwd_inner_microstep: 1386.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 05:49:26,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.26 | bwd_microstep: 1284.30 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3489 [2024-06-10 05:49:28,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1247.11 | bwd_inner_microstep: 1247.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 05:49:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 300.41 | bwd_microstep: 792.22 | bwd_inner_microstep: 792.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-10 05:49:31,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.33 | bwd_microstep: 1649.79 | bwd_inner_microstep: 1649.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-10 05:49:32,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.25 | bwd_microstep: 798.66 | bwd_inner_microstep: 798.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 05:49:34,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.21 | bwd_microstep: 1281.87 | bwd_inner_microstep: 1281.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 05:49:36,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1478.73 | bwd_inner_microstep: 1478.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3688 [2024-06-10 05:49:39,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.66 | bwd_microstep: 1723.05 | bwd_inner_microstep: 1723.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456 [2024-06-10 05:49:41,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1444.28 | bwd_inner_microstep: 1444.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784 [2024-06-10 05:49:43,185] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.35 | bwd_microstep: 1545.63 | bwd_inner_microstep: 1545.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 05:49:45,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.00 | bwd_microstep: 1375.08 | bwd_inner_microstep: 1375.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997 [2024-06-10 05:49:46,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.00 | bwd_microstep: 709.61 | bwd_inner_microstep: 709.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3528 [2024-06-10 05:49:47,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1353.90 | bwd_inner_microstep: 1353.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670 [2024-06-10 05:49:50,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.21 | bwd_microstep: 1526.29 | bwd_inner_microstep: 1526.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2041 [2024-06-10 05:49:51,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.75 | bwd_microstep: 845.64 | bwd_inner_microstep: 845.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 05:49:53,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.98 | bwd_microstep: 1554.14 | bwd_inner_microstep: 1554.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2300 
[2024-06-10 05:49:54,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.89 | bwd_microstep: 975.74 | bwd_inner_microstep: 975.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 05:49:56,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.08 | bwd_microstep: 1260.47 | bwd_inner_microstep: 1260.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 05:49:58,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1299.80 | bwd_inner_microstep: 1299.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 05:50:00,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1397.33 | bwd_inner_microstep: 1397.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 05:50:02,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.46 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 05:50:04,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.19 | bwd_microstep: 1756.59 | bwd_inner_microstep: 1756.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1996 [2024-06-10 05:50:06,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.86 | bwd_microstep: 897.51 | bwd_inner_microstep: 897.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3812 [2024-06-10 05:50:08,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.04 | bwd_microstep: 1544.17 | bwd_inner_microstep: 1544.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235 [2024-06-10 05:50:09,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.20 | bwd_microstep: 963.96 | bwd_inner_microstep: 963.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462 [2024-06-10 05:50:15,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.26 | optimizer_step: 6.59 [2024-06-10 05:50:15,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.22 | bwd_microstep: 5089.58 | bwd_inner_microstep: 1505.76 | bwd_allreduce_microstep: 3583.77 | step_microstep: 38.77 [2024-06-10 05:50:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15626.60 | bwd: 45478.18 | bwd_inner: 41893.43 | bwd_allreduce: 3584.04 | step: 40.45 {'loss': 1.3392, 'learning_rate': 3.7922904911573903e-05, 'epoch': 0.17} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4245 [2024-06-10 05:50:17,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.26 | bwd_microstep: 1742.27 | bwd_inner_microstep: 1742.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2405 [2024-06-10 05:50:18,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.02 | bwd_microstep: 1002.88 | bwd_inner_microstep: 1002.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901 [2024-06-10 05:50:21,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.07 
| bwd_microstep: 1587.58 | bwd_inner_microstep: 1587.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 4068 [2024-06-10 05:50:23,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1386.42 | bwd_inner_microstep: 1386.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 05:50:24,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1247.83 | bwd_inner_microstep: 1247.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 05:50:25,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 791.81 | bwd_inner_microstep: 791.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 05:50:27,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1254.46 | bwd_inner_microstep: 1254.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501 [2024-06-10 05:50:29,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.06 | bwd_microstep: 1320.27 | bwd_inner_microstep: 1320.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2207 [2024-06-10 05:50:30,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.00 | bwd_microstep: 1054.74 | bwd_inner_microstep: 1054.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509 [2024-06-10 05:50:32,862] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 538.01 | bwd_microstep: 1446.34 | bwd_inner_microstep: 1446.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124 [2024-06-10 05:50:34,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.72 | bwd_microstep: 926.78 | bwd_inner_microstep: 926.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 05:50:35,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.96 | bwd_microstep: 1291.31 | bwd_inner_microstep: 1291.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3534 [2024-06-10 05:50:37,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.33 | bwd_microstep: 1446.93 | bwd_inner_microstep: 1446.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982 [2024-06-10 05:50:39,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.52 | bwd_microstep: 826.83 | bwd_inner_microstep: 826.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 05:50:40,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.12 | bwd_microstep: 1340.76 | bwd_inner_microstep: 1340.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3530 [2024-06-10 05:50:43,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.74 | bwd_microstep: 1687.67 | bwd_inner_microstep: 1687.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-10 05:50:45,116] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1348.96 | bwd_inner_microstep: 1348.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 05:50:47,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.42 | bwd_microstep: 1385.26 | bwd_inner_microstep: 1385.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627 [2024-06-10 05:50:48,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1317.37 | bwd_inner_microstep: 1317.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 05:50:50,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.53 | bwd_microstep: 1345.23 | bwd_inner_microstep: 1345.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 05:50:52,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1347.54 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077 [2024-06-10 05:50:53,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.06 | bwd_microstep: 819.83 | bwd_inner_microstep: 819.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725 [2024-06-10 05:50:55,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.18 | bwd_microstep: 1338.07 | bwd_inner_microstep: 1338.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 2285 [2024-06-10 05:50:56,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 878.39 | bwd_inner_microstep: 878.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 05:50:58,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.47 | bwd_microstep: 1383.50 | bwd_inner_microstep: 1383.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 05:51:00,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.07 | bwd_microstep: 1402.73 | bwd_inner_microstep: 1402.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512 [2024-06-10 05:51:02,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.79 | bwd_microstep: 1194.05 | bwd_inner_microstep: 1194.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-10 05:51:04,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.67 | bwd_microstep: 1505.28 | bwd_inner_microstep: 1505.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 05:51:06,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.27 | bwd_microstep: 1349.59 | bwd_inner_microstep: 1349.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 05:51:08,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.83 | bwd_microstep: 1297.84 | bwd_inner_microstep: 1297.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3597 [2024-06-10 05:51:10,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.43 | bwd_microstep: 1510.22 | bwd_inner_microstep: 1510.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 05:51:15,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.23 | optimizer_step: 6.63 [2024-06-10 05:51:15,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.81 | bwd_microstep: 4528.79 | bwd_inner_microstep: 1803.33 | bwd_allreduce_microstep: 2725.41 | step_microstep: 38.79 [2024-06-10 05:51:15,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15516.13 | bwd: 44307.55 | bwd_inner: 41581.24 | bwd_allreduce: 2725.63 | step: 40.44 {'loss': 1.3538, 'learning_rate': 3.790621725581079e-05, 'epoch': 0.17} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-10 05:51:17,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.45 | bwd_microstep: 1444.26 | bwd_inner_microstep: 1444.17 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2026 [2024-06-10 05:51:18,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.71 | bwd_microstep: 715.29 | bwd_inner_microstep: 715.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 05:51:20,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1253.94 | bwd_inner_microstep: 1253.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2338 [2024-06-10 05:51:21,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 315.69 | bwd_microstep: 825.96 | bwd_inner_microstep: 825.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411 [2024-06-10 05:51:22,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.24 | bwd_microstep: 1184.28 | bwd_inner_microstep: 1184.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 05:51:24,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1392.77 | bwd_inner_microstep: 1392.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 05:51:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1488.38 | bwd_inner_microstep: 1488.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219 [2024-06-10 05:51:28,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.27 | bwd_microstep: 962.20 | bwd_inner_microstep: 962.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 05:51:30,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.56 | bwd_microstep: 1604.54 | bwd_inner_microstep: 1604.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1944 [2024-06-10 05:51:31,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.18 | bwd_microstep: 744.86 | bwd_inner_microstep: 744.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 05:51:33,233] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1352.31 | bwd_inner_microstep: 1352.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672 [2024-06-10 05:51:35,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.53 | bwd_microstep: 1510.95 | bwd_inner_microstep: 1510.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717 [2024-06-10 05:51:37,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.85 | bwd_microstep: 1627.27 | bwd_inner_microstep: 1627.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 05:51:39,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1380.79 | bwd_inner_microstep: 1380.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3673 [2024-06-10 05:51:41,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.19 | bwd_microstep: 1654.98 | bwd_inner_microstep: 1654.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653 [2024-06-10 05:51:43,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1328.94 | bwd_inner_microstep: 1328.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 05:51:45,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.64 | bwd_microstep: 1405.55 | bwd_inner_microstep: 1405.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3531 [2024-06-10 05:51:47,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1493.00 | bwd_inner_microstep: 1492.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556 [2024-06-10 05:51:49,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.02 | bwd_microstep: 1303.00 | bwd_inner_microstep: 1302.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 05:51:51,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.35 | bwd_microstep: 1287.44 | bwd_inner_microstep: 1287.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3678 [2024-06-10 05:51:53,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1330.62 | bwd_inner_microstep: 1330.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 05:51:54,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.23 | bwd_microstep: 1402.01 | bwd_inner_microstep: 1401.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 05:51:57,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1556.64 | bwd_inner_microstep: 1556.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-10 05:51:59,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 1436.07 | bwd_inner_microstep: 1436.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, 
images per sample: 6.0, dynamic token length: 3818 [2024-06-10 05:52:01,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.89 | bwd_microstep: 1391.07 | bwd_inner_microstep: 1391.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750 [2024-06-10 05:52:02,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.28 | bwd_microstep: 1345.56 | bwd_inner_microstep: 1345.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 05:52:04,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.98 | bwd_microstep: 1384.60 | bwd_inner_microstep: 1384.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384 [2024-06-10 05:52:06,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.60 | bwd_microstep: 1242.13 | bwd_inner_microstep: 1242.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 05:52:08,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1558.69 | bwd_inner_microstep: 1558.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581 [2024-06-10 05:52:11,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.74 | bwd_microstep: 1697.79 | bwd_inner_microstep: 1697.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 05:52:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.91 | bwd_microstep: 1549.59 | bwd_inner_microstep: 1549.56 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 05:52:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.59 [2024-06-10 05:52:18,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 4366.67 | bwd_inner_microstep: 1767.12 | bwd_allreduce_microstep: 2599.49 | step_microstep: 38.67 [2024-06-10 05:52:18,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16266.15 | bwd: 46222.19 | bwd_inner: 43621.73 | bwd_allreduce: 2599.76 | step: 40.36 17%|█▋ | 293/1726 [5:08:42<24:21:46, 61.21s/it] 17%|█▋ | 294/1726 [5:09:44<24:26:56, 61.46s/it] 17%|█▋ | 295/1726 [5:10:47<24:35:56, 61.88s/it] 17%|█▋ | 296/1726 [5:11:50<24:39:59, 62.10s/it] 17%|█▋ | 297/1726 [5:12:51<24:34:26, 61.91s/it] 17%|█▋ | 298/1726 [5:13:52<24:21:01, 61.39s/it] 17%|█▋ | 299/1726 [5:14:54<24:30:23, 61.82s/it] {'loss': 1.3184, 'learning_rate': 3.788946653444359e-05, 'epoch': 0.17} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-10 05:52:20,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1482.93 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 05:52:21,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1275.70 | bwd_inner_microstep: 1275.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1921 [2024-06-10 05:52:23,143] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.56 | bwd_microstep: 849.79 | bwd_inner_microstep: 849.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 05:52:25,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.73 | bwd_microstep: 1550.96 | bwd_inner_microstep: 1550.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881 [2024-06-10 05:52:26,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.03 | bwd_microstep: 680.07 | bwd_inner_microstep: 680.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3738 [2024-06-10 05:52:28,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.76 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 05:52:29,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.69 | bwd_microstep: 1288.19 | bwd_inner_microstep: 1288.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713 [2024-06-10 05:52:31,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1530.37 | bwd_inner_microstep: 1530.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 05:52:33,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.65 | bwd_microstep: 1353.38 | bwd_inner_microstep: 1353.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701 
[2024-06-10 05:52:35,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.97 | bwd_microstep: 1526.20 | bwd_inner_microstep: 1526.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2118 [2024-06-10 05:52:37,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.60 | bwd_microstep: 831.55 | bwd_inner_microstep: 831.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3496 [2024-06-10 05:52:38,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.16 | bwd_microstep: 1316.09 | bwd_inner_microstep: 1316.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010 [2024-06-10 05:52:40,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.09 | bwd_microstep: 802.79 | bwd_inner_microstep: 802.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3687 [2024-06-10 05:52:42,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.25 | bwd_microstep: 1721.24 | bwd_inner_microstep: 1721.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3683 [2024-06-10 05:52:44,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.78 | bwd_microstep: 1823.33 | bwd_inner_microstep: 1823.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 4025 [2024-06-10 05:52:47,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.68 | bwd_microstep: 1657.20 | bwd_inner_microstep: 1657.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3636 [2024-06-10 05:52:49,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.02 | bwd_microstep: 1505.80 | bwd_inner_microstep: 1505.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935 [2024-06-10 05:52:50,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 727.61 | bwd_inner_microstep: 727.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2412 [2024-06-10 05:52:51,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.29 | bwd_microstep: 1033.98 | bwd_inner_microstep: 1033.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511 [2024-06-10 05:52:53,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1417.39 | bwd_inner_microstep: 1417.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709 [2024-06-10 05:52:55,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1530.16 | bwd_inner_microstep: 1530.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-10 05:52:57,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.15 | bwd_microstep: 1185.70 | bwd_inner_microstep: 1185.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2019 [2024-06-10 05:52:58,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.42 | bwd_microstep: 854.31 | bwd_inner_microstep: 854.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598 [2024-06-10 05:53:00,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.00 | bwd_microstep: 1608.54 | bwd_inner_microstep: 1608.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912 [2024-06-10 05:53:01,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.31 | bwd_microstep: 718.86 | bwd_inner_microstep: 718.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945 [2024-06-10 05:53:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.25 | bwd_microstep: 732.95 | bwd_inner_microstep: 732.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 05:53:04,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.27 | bwd_microstep: 1459.61 | bwd_inner_microstep: 1459.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276 [2024-06-10 05:53:06,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.28 | bwd_microstep: 1007.28 | bwd_inner_microstep: 1007.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-10 05:53:08,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.60 | bwd_microstep: 1632.33 | bwd_inner_microstep: 1632.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-10 05:53:10,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3806 [2024-06-10 05:53:12,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.07 | bwd_microstep: 1686.76 | bwd_inner_microstep: 1686.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270 [2024-06-10 05:53:18,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 05:53:18,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.31 | bwd_microstep: 5274.87 | bwd_inner_microstep: 1102.46 | bwd_allreduce_microstep: 4172.36 | step_microstep: 38.66 [2024-06-10 05:53:18,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15176.54 | bwd: 44906.69 | bwd_inner: 40733.33 | bwd_allreduce: 4172.64 | step: 40.24 {'loss': 1.3219, 'learning_rate': 3.787265280646825e-05, 'epoch': 0.17} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418 [2024-06-10 05:53:20,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.31 | bwd_microstep: 1437.21 | bwd_inner_microstep: 1437.14 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2821 [2024-06-10 05:53:22,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.48 | bwd_microstep: 1112.62 | bwd_inner_microstep: 1112.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 05:53:23,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1248.35 | bwd_inner_microstep: 1248.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 
[2024-06-10 05:53:25,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1249.01 | bwd_inner_microstep: 1248.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 05:53:27,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1353.32 | bwd_inner_microstep: 1353.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 05:53:29,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1653.98 | bwd_inner_microstep: 1653.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 05:53:31,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1289.11 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 05:53:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.19 | bwd_microstep: 793.22 | bwd_inner_microstep: 793.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884 [2024-06-10 05:53:33,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.53 | bwd_microstep: 685.79 | bwd_inner_microstep: 685.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 05:53:34,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.67 | bwd_microstep: 794.46 | bwd_inner_microstep: 794.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3444 [2024-06-10 05:53:36,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.50 | bwd_microstep: 1348.08 | bwd_inner_microstep: 1348.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3950 [2024-06-10 05:53:38,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.32 | bwd_microstep: 1531.21 | bwd_inner_microstep: 1531.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937 [2024-06-10 05:53:39,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.46 | bwd_microstep: 698.58 | bwd_inner_microstep: 698.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3691 [2024-06-10 05:53:41,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.20 | bwd_microstep: 1414.02 | bwd_inner_microstep: 1413.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660 [2024-06-10 05:53:43,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.72 | bwd_microstep: 1419.81 | bwd_inner_microstep: 1419.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-10 05:53:45,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.27 | bwd_microstep: 1460.90 | bwd_inner_microstep: 1460.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 05:53:47,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1394.08 | bwd_inner_microstep: 1394.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 05:53:49,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1259.00 | bwd_inner_microstep: 1258.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 05:53:50,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.28 | bwd_microstep: 1254.76 | bwd_inner_microstep: 1254.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 05:53:53,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.61 | bwd_microstep: 1659.91 | bwd_inner_microstep: 1659.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667 [2024-06-10 05:53:55,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.16 | bwd_microstep: 1624.42 | bwd_inner_microstep: 1624.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293 [2024-06-10 05:53:56,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.83 | bwd_microstep: 877.04 | bwd_inner_microstep: 877.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-10 05:53:58,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.72 | bwd_microstep: 1186.98 | bwd_inner_microstep: 1186.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714 [2024-06-10 05:54:00,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.80 | bwd_microstep: 1441.28 | bwd_inner_microstep: 1441.26 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3715 [2024-06-10 05:54:02,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1368.70 | bwd_inner_microstep: 1368.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 05:54:04,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1348.20 | bwd_inner_microstep: 1348.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431 [2024-06-10 05:54:05,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.42 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 05:54:08,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.29 | bwd_microstep: 1548.85 | bwd_inner_microstep: 1548.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2205 [2024-06-10 05:54:09,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.40 | bwd_microstep: 867.84 | bwd_inner_microstep: 867.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817 [2024-06-10 05:54:11,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.23 | bwd_microstep: 1620.52 | bwd_inner_microstep: 1620.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399 [2024-06-10 05:54:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 
1441.27 | bwd_inner_microstep: 1441.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816 [2024-06-10 05:54:21,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.57 [2024-06-10 05:54:21,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.98 | bwd_microstep: 7714.77 | bwd_inner_microstep: 1988.65 | bwd_allreduce_microstep: 5726.07 | step_microstep: 38.88 [2024-06-10 05:54:21,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15549.80 | bwd: 47475.36 | bwd_inner: 41748.34 | bwd_allreduce: 5726.32 | step: 40.46 {'loss': 1.3533, 'learning_rate': 3.785577613110264e-05, 'epoch': 0.17} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-10 05:54:23,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.97 | bwd_microstep: 784.07 | bwd_inner_microstep: 783.93 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 05:54:24,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.40 | bwd_microstep: 1342.21 | bwd_inner_microstep: 1342.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2373 [2024-06-10 05:54:26,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.70 | bwd_microstep: 980.33 | bwd_inner_microstep: 980.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 05:54:28,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.05 | bwd_microstep: 1281.37 | bwd_inner_microstep: 1281.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3489 [2024-06-10 05:54:29,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1385.53 | bwd_inner_microstep: 1385.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 05:54:31,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3560 [2024-06-10 05:54:33,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.80 | bwd_microstep: 1235.29 | bwd_inner_microstep: 1235.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 05:54:35,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.55 | bwd_microstep: 1536.16 | bwd_inner_microstep: 1536.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708 [2024-06-10 05:54:37,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.44 | bwd_microstep: 1527.67 | bwd_inner_microstep: 1527.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 05:54:39,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1374.50 | bwd_inner_microstep: 1374.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685 [2024-06-10 05:54:41,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.30 | bwd_microstep: 1520.13 | bwd_inner_microstep: 1520.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682 [2024-06-10 05:54:44,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.23 | bwd_microstep: 1620.64 | bwd_inner_microstep: 1620.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3671 [2024-06-10 05:54:46,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1480.63 | bwd_inner_microstep: 1480.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493 [2024-06-10 05:54:47,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.96 | bwd_microstep: 1315.95 | bwd_inner_microstep: 1315.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657 [2024-06-10 05:54:50,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.54 | bwd_microstep: 1715.90 | bwd_inner_microstep: 1715.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553 [2024-06-10 05:54:52,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1589.08 | bwd_inner_microstep: 1589.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3682 [2024-06-10 05:54:54,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.34 | bwd_microstep: 1420.34 | bwd_inner_microstep: 1420.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662 [2024-06-10 05:54:56,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.50 | bwd_microstep: 1613.63 | bwd_inner_microstep: 1613.60 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665 [2024-06-10 05:54:58,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.58 | bwd_microstep: 1611.31 | bwd_inner_microstep: 1611.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 05:55:00,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.63 | bwd_microstep: 1525.16 | bwd_inner_microstep: 1525.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3712 [2024-06-10 05:55:02,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1237.74 | bwd_inner_microstep: 1237.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3527 [2024-06-10 05:55:04,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.70 | bwd_microstep: 1341.37 | bwd_inner_microstep: 1341.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966 [2024-06-10 05:55:05,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 703.04 | bwd_inner_microstep: 703.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2007 [2024-06-10 05:55:06,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.46 | bwd_microstep: 711.14 | bwd_inner_microstep: 711.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3743 [2024-06-10 05:55:08,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.07 | bwd_microstep: 
1272.75 | bwd_inner_microstep: 1272.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 05:55:10,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1295.96 | bwd_inner_microstep: 1295.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 05:55:12,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1404.15 | bwd_inner_microstep: 1404.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-10 05:55:13,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1405.63 | bwd_inner_microstep: 1405.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047 [2024-06-10 05:55:15,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.17 | bwd_microstep: 814.61 | bwd_inner_microstep: 814.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-10 05:55:16,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.54 | bwd_microstep: 810.74 | bwd_inner_microstep: 810.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2244 [2024-06-10 05:55:17,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.07 | bwd_microstep: 872.47 | bwd_inner_microstep: 872.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467 [2024-06-10 05:55:22,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.67 | optimizer_gradients: 4.26 | optimizer_step: 6.63 [2024-06-10 05:55:22,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.45 | bwd_microstep: 4134.54 | bwd_inner_microstep: 1584.69 | bwd_allreduce_microstep: 2549.80 | step_microstep: 38.77 [2024-06-10 05:55:22,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15587.29 | bwd: 44246.37 | bwd_inner: 41695.55 | bwd_allreduce: 2550.08 | step: 40.40 {'loss': 1.3319, 'learning_rate': 3.783883656778631e-05, 'epoch': 0.17} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-10 05:55:24,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.31 | bwd_microstep: 1484.22 | bwd_inner_microstep: 1484.16 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 05:55:25,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.04 | bwd_microstep: 1240.45 | bwd_inner_microstep: 1240.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811 [2024-06-10 05:55:27,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.97 | bwd_microstep: 1351.97 | bwd_inner_microstep: 1351.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279 [2024-06-10 05:55:29,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.43 | bwd_microstep: 971.49 | bwd_inner_microstep: 971.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 05:55:30,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.94 | bwd_microstep: 1352.21 | bwd_inner_microstep: 1352.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-10 05:55:33,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.72 | bwd_microstep: 1529.53 | bwd_inner_microstep: 1529.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725 [2024-06-10 05:55:34,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.74 | bwd_microstep: 1334.43 | bwd_inner_microstep: 1334.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 05:55:36,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1375.55 | bwd_inner_microstep: 1375.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2057 [2024-06-10 05:55:37,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.96 | bwd_microstep: 816.56 | bwd_inner_microstep: 816.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-10 05:55:40,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.60 | bwd_microstep: 1627.19 | bwd_inner_microstep: 1627.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 05:55:42,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.95 | bwd_microstep: 1387.80 | bwd_inner_microstep: 1387.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1941 [2024-06-10 05:55:43,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.40 | bwd_microstep: 823.68 | bwd_inner_microstep: 823.65 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689 [2024-06-10 05:55:45,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.44 | bwd_microstep: 1525.07 | bwd_inner_microstep: 1525.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 05:55:47,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-10 05:55:49,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.65 | bwd_microstep: 1606.72 | bwd_inner_microstep: 1606.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1953 [2024-06-10 05:55:50,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.40 | bwd_microstep: 822.26 | bwd_inner_microstep: 822.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 05:55:51,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.96 | bwd_microstep: 790.88 | bwd_inner_microstep: 790.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 05:55:53,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1551.99 | bwd_inner_microstep: 1551.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824 [2024-06-10 05:55:55,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.85 | bwd_microstep: 1360.37 | bwd_inner_microstep: 
1360.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 05:55:57,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.24 | bwd_microstep: 1280.18 | bwd_inner_microstep: 1280.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 05:55:59,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.03 | bwd_microstep: 1417.32 | bwd_inner_microstep: 1417.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 05:56:01,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.51 | bwd_microstep: 1287.12 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 05:56:03,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1255.57 | bwd_inner_microstep: 1255.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2022 [2024-06-10 05:56:04,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.52 | bwd_microstep: 715.57 | bwd_inner_microstep: 715.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 05:56:05,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1294.97 | bwd_inner_microstep: 1294.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 05:56:07,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.33 | 
bwd_microstep: 1452.38 | bwd_inner_microstep: 1452.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 05:56:10,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.34 | bwd_microstep: 1659.37 | bwd_inner_microstep: 1659.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3604 [2024-06-10 05:56:11,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.27 | bwd_microstep: 1275.79 | bwd_inner_microstep: 1275.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2236 [2024-06-10 05:56:12,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.22 | bwd_microstep: 776.72 | bwd_inner_microstep: 776.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732 [2024-06-10 05:56:15,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.86 | bwd_microstep: 1536.87 | bwd_inner_microstep: 1536.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3535 [2024-06-10 05:56:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1522.93 | bwd_inner_microstep: 1522.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3467 [2024-06-10 05:56:23,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.83 | optimizer_step: 6.58 [2024-06-10 05:56:23,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.52 | bwd_microstep: 5860.14 | bwd_inner_microstep: 1360.02 | bwd_allreduce_microstep: 4500.05 | 
step_microstep: 39.46 [2024-06-10 05:56:23,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15435.44 | bwd: 45664.25 | bwd_inner: 41163.21 | bwd_allreduce: 4500.32 | step: 41.06 {'loss': 1.3211, 'learning_rate': 3.7821834176180336e-05, 'epoch': 0.18} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462 [2024-06-10 05:56:25,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.39 | bwd_microstep: 1327.12 | bwd_inner_microstep: 1327.03 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919 [2024-06-10 05:56:26,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.98 | bwd_microstep: 716.51 | bwd_inner_microstep: 716.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-10 05:56:27,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.78 | bwd_microstep: 787.10 | bwd_inner_microstep: 787.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-10 05:56:28,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.28 | bwd_microstep: 970.36 | bwd_inner_microstep: 970.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3828 [2024-06-10 05:56:30,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.46 | bwd_microstep: 1512.50 | bwd_inner_microstep: 1512.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 05:56:32,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 794.76 | bwd_inner_microstep: 794.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 05:56:33,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.34 | bwd_microstep: 1284.58 | bwd_inner_microstep: 1284.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740 [2024-06-10 05:56:36,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.63 | bwd_microstep: 1635.93 | bwd_inner_microstep: 1635.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 05:56:37,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.59 | bwd_microstep: 791.21 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419 [2024-06-10 05:56:38,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.07 | bwd_microstep: 1279.70 | bwd_inner_microstep: 1279.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916 [2024-06-10 05:56:40,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.15 | bwd_microstep: 780.10 | bwd_inner_microstep: 780.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 05:56:42,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.22 | bwd_microstep: 1476.55 | bwd_inner_microstep: 1476.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-10 05:56:44,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.06 | bwd_microstep: 1622.55 | bwd_inner_microstep: 1622.52 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3926 [2024-06-10 05:56:46,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 1501.49 | bwd_inner_microstep: 1501.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096 [2024-06-10 05:56:47,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.10 | bwd_microstep: 823.39 | bwd_inner_microstep: 823.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2119 [2024-06-10 05:56:48,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.94 | bwd_microstep: 928.29 | bwd_inner_microstep: 928.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 05:56:50,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1285.05 | bwd_inner_microstep: 1285.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-10 05:56:52,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.81 | bwd_microstep: 1523.88 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 05:56:54,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.25 | bwd_microstep: 1257.90 | bwd_inner_microstep: 1257.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619 [2024-06-10 05:56:56,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1437.79 
| bwd_inner_microstep: 1437.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 05:56:58,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 1284.97 | bwd_inner_microstep: 1284.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3617 [2024-06-10 05:56:59,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.96 | bwd_microstep: 1246.39 | bwd_inner_microstep: 1246.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3707 [2024-06-10 05:57:01,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.81 | bwd_microstep: 1335.73 | bwd_inner_microstep: 1335.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804 [2024-06-10 05:57:03,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.82 | bwd_microstep: 1359.30 | bwd_inner_microstep: 1359.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3714 [2024-06-10 05:57:05,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.78 | bwd_microstep: 1497.15 | bwd_inner_microstep: 1497.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-10 05:57:07,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1474.45 | bwd_inner_microstep: 1474.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 05:57:10,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 607.64 | bwd_microstep: 1649.38 | bwd_inner_microstep: 1649.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3735 [2024-06-10 05:57:12,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.42 | bwd_microstep: 1604.81 | bwd_inner_microstep: 1604.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415 [2024-06-10 05:57:14,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.67 | bwd_microstep: 1438.11 | bwd_inner_microstep: 1438.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 05:57:16,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2059 [2024-06-10 05:57:17,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.25 | bwd_microstep: 847.81 | bwd_inner_microstep: 847.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3437 [2024-06-10 05:57:25,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.28 | optimizer_step: 6.59 [2024-06-10 05:57:25,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.92 | bwd_microstep: 7153.15 | bwd_inner_microstep: 2043.37 | bwd_allreduce_microstep: 5109.71 | step_microstep: 39.14 [2024-06-10 05:57:25,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15208.43 | bwd: 46178.88 | bwd_inner: 41068.16 | bwd_allreduce: 5110.00 | step: 40.81 {'loss': 1.2781, 'learning_rate': 3.7804769016167036e-05, 'epoch': 0.18} dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 05:57:27,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1372.15 | bwd_inner_microstep: 1372.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876 [2024-06-10 05:57:28,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 708.16 | bwd_inner_microstep: 708.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 05:57:30,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1391.08 | bwd_inner_microstep: 1391.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391 [2024-06-10 05:57:31,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.84 | bwd_microstep: 1241.65 | bwd_inner_microstep: 1241.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-10 05:57:32,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.56 | bwd_microstep: 794.02 | bwd_inner_microstep: 794.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 05:57:34,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1245.72 | bwd_inner_microstep: 1245.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3726 [2024-06-10 05:57:36,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1494.74 | bwd_inner_microstep: 1494.71 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 05:57:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1294.76 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-10 05:57:40,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1652.33 | bwd_inner_microstep: 1652.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947 [2024-06-10 05:57:41,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.78 | bwd_microstep: 700.27 | bwd_inner_microstep: 700.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 05:57:43,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1388.63 | bwd_inner_microstep: 1388.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 05:57:44,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.83 | bwd_microstep: 795.68 | bwd_inner_microstep: 795.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3675 [2024-06-10 05:57:46,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.72 | bwd_microstep: 1262.23 | bwd_inner_microstep: 1262.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1971 [2024-06-10 05:57:47,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.39 | bwd_microstep: 826.67 | bwd_inner_microstep: 
826.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970 [2024-06-10 05:57:48,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.30 | bwd_microstep: 887.76 | bwd_inner_microstep: 887.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450 [2024-06-10 05:57:50,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.39 | bwd_microstep: 1410.51 | bwd_inner_microstep: 1410.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1901 [2024-06-10 05:57:51,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.24 | bwd_microstep: 714.90 | bwd_inner_microstep: 714.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3947 [2024-06-10 05:57:53,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1527.98 | bwd_inner_microstep: 1527.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2124 [2024-06-10 05:57:55,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.70 | bwd_microstep: 862.47 | bwd_inner_microstep: 862.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3621 [2024-06-10 05:57:57,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.26 | bwd_microstep: 1541.30 | bwd_inner_microstep: 1541.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-10 05:57:59,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 
1417.21 | bwd_inner_microstep: 1417.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 05:58:01,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.81 | bwd_microstep: 1554.34 | bwd_inner_microstep: 1554.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934 [2024-06-10 05:58:02,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.54 | bwd_microstep: 698.43 | bwd_inner_microstep: 698.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 05:58:04,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1292.32 | bwd_inner_microstep: 1292.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-10 05:58:06,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.64 | bwd_microstep: 1571.21 | bwd_inner_microstep: 1571.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555 [2024-06-10 05:58:08,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.65 | bwd_microstep: 1420.83 | bwd_inner_microstep: 1420.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3429 [2024-06-10 05:58:10,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.13 | bwd_microstep: 1408.87 | bwd_inner_microstep: 1408.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 05:58:12,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.94 | bwd_microstep: 1379.07 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-10 05:58:13,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.40 | bwd_microstep: 1310.55 | bwd_inner_microstep: 1310.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-10 05:58:16,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.52 | bwd_microstep: 1648.38 | bwd_inner_microstep: 1648.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-10 05:58:18,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 05:58:25,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.61 [2024-06-10 05:58:25,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 6846.23 | bwd_inner_microstep: 1526.57 | bwd_allreduce_microstep: 5319.59 | step_microstep: 39.29 [2024-06-10 05:58:25,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14904.24 | bwd: 45161.45 | bwd_inner: 39840.89 | bwd_allreduce: 5319.86 | step: 40.94 17%|█▋ | 299/1726 [5:14:54<24:30:23, 61.82s/it] 17%|█▋ | 300/1726 [5:15:55<24:19:27, 61.41s/it] 17%|█▋ | 301/1726 [5:16:58<24:32:25, 62.00s/it] 17%|█▋ | 302/1726 [5:17:58<24:18:27, 61.45s/it]
18%|█▊ | 303/1726 [5:19:00<24:17:22, 61.45s/it] 18%|█▊ | 304/1726 [5:20:02<24:18:21, 61.53s/it] 18%|█▊ | 30 {'loss': 1.2547, 'learning_rate': 3.7787641147849814e-05, 'epoch': 0.18} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-10 05:58:27,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.82 | bwd_microstep: 1482.51 | bwd_inner_microstep: 1482.44 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3990 [2024-06-10 05:58:29,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.35 | bwd_microstep: 1603.29 | bwd_inner_microstep: 1603.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-10 05:58:31,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.56 | bwd_microstep: 1289.88 | bwd_inner_microstep: 1289.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 05:58:33,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1342.26 | bwd_inner_microstep: 1342.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938 [2024-06-10 05:58:34,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.71 | bwd_microstep: 727.06 | bwd_inner_microstep: 727.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 05:58:36,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1346.74 | bwd_inner_microstep: 1346.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 05:58:38,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1640.95 | bwd_inner_microstep: 1640.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 05:58:40,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1391.75 | bwd_inner_microstep: 1391.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702 [2024-06-10 05:58:42,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1528.94 | bwd_inner_microstep: 1528.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3407 [2024-06-10 05:58:44,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.95 | bwd_microstep: 1372.31 | bwd_inner_microstep: 1372.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723 [2024-06-10 05:58:46,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.72 | bwd_microstep: 1626.31 | bwd_inner_microstep: 1626.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665 [2024-06-10 05:58:49,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.96 | bwd_microstep: 1719.17 | bwd_inner_microstep: 1719.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3649 [2024-06-10 05:58:51,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.99 | bwd_microstep: 1784.96 | bwd_inner_microstep: 1784.94 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3550 [2024-06-10 05:58:53,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.02 | bwd_microstep: 1546.67 | bwd_inner_microstep: 1546.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637 [2024-06-10 05:58:55,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3512 [2024-06-10 05:58:57,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.96 | bwd_microstep: 1368.91 | bwd_inner_microstep: 1368.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2972 [2024-06-10 05:58:59,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.55 | bwd_microstep: 1199.46 | bwd_inner_microstep: 1199.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834 [2024-06-10 05:59:01,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.89 | bwd_microstep: 1557.31 | bwd_inner_microstep: 1557.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3624 [2024-06-10 05:59:03,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.59 | bwd_microstep: 1345.75 | bwd_inner_microstep: 1345.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 05:59:05,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 
1281.24 | bwd_inner_microstep: 1281.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 05:59:07,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1394.16 | bwd_inner_microstep: 1394.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 05:59:09,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.34 | bwd_microstep: 1657.72 | bwd_inner_microstep: 1657.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450 [2024-06-10 05:59:11,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.39 | bwd_microstep: 1160.92 | bwd_inner_microstep: 1160.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 05:59:12,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.77 | bwd_microstep: 1286.42 | bwd_inner_microstep: 1286.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807 [2024-06-10 05:59:15,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.35 | bwd_microstep: 1612.09 | bwd_inner_microstep: 1612.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729 [2024-06-10 05:59:17,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.80 | bwd_microstep: 1628.74 | bwd_inner_microstep: 1628.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 05:59:19,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 577.23 | bwd_microstep: 1552.07 | bwd_inner_microstep: 1552.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820 [2024-06-10 05:59:21,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.65 | bwd_microstep: 1358.97 | bwd_inner_microstep: 1358.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 05:59:23,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.96 | bwd_microstep: 1490.80 | bwd_inner_microstep: 1490.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-10 05:59:25,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.25 | bwd_microstep: 1523.53 | bwd_inner_microstep: 1523.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564 [2024-06-10 05:59:27,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.58 | bwd_microstep: 1597.41 | bwd_inner_microstep: 1597.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2997 [2024-06-10 05:59:29,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.17 | optimizer_step: 6.62 [2024-06-10 05:59:29,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.72 | bwd_microstep: 1147.94 | bwd_inner_microstep: 1138.77 | bwd_allreduce_microstep: 9.12 | step_microstep: 38.34 [2024-06-10 05:59:29,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17168.24 | bwd: 46076.33 | bwd_inner: 46066.25 | bwd_allreduce: 9.37 | step: 39.94 {'loss': 1.2627, 'learning_rate': 3.7770450631552946e-05, 'epoch': 0.18} dynamic ViT 
batch size: 18, images per sample: 4.5, dynamic token length: 1931 [2024-06-10 05:59:30,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.60 | bwd_microstep: 823.33 | bwd_inner_microstep: 823.18 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3931 [2024-06-10 05:59:32,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.57 | bwd_microstep: 1593.17 | bwd_inner_microstep: 1593.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837 [2024-06-10 05:59:34,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.57 | bwd_microstep: 1657.55 | bwd_inner_microstep: 1657.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3838 [2024-06-10 05:59:37,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.39 | bwd_microstep: 1518.99 | bwd_inner_microstep: 1518.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 05:59:38,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.84 | bwd_microstep: 1380.41 | bwd_inner_microstep: 1380.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 05:59:40,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1389.68 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 05:59:42,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.12 | bwd_microstep: 1285.77 | bwd_inner_microstep: 1285.74 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-10 05:59:44,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.43 | bwd_microstep: 1152.13 | bwd_inner_microstep: 1152.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496 [2024-06-10 05:59:45,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.64 | bwd_microstep: 1191.51 | bwd_inner_microstep: 1191.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 05:59:47,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1288.35 | bwd_inner_microstep: 1288.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-10 05:59:49,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1415.12 | bwd_inner_microstep: 1415.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431 [2024-06-10 05:59:51,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.14 | bwd_microstep: 1377.09 | bwd_inner_microstep: 1377.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 05:59:53,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 05:59:55,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1478.36 | 
bwd_inner_microstep: 1478.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 05:59:57,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1378.30 | bwd_inner_microstep: 1378.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3627 [2024-06-10 05:59:59,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.67 | bwd_microstep: 1580.23 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-10 06:00:01,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1284.76 | bwd_inner_microstep: 1284.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2294 [2024-06-10 06:00:02,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.16 | bwd_microstep: 1026.12 | bwd_inner_microstep: 1026.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2676 [2024-06-10 06:00:04,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.80 | bwd_microstep: 1025.76 | bwd_inner_microstep: 1025.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-10 06:00:06,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1412.89 | bwd_inner_microstep: 1412.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3649 [2024-06-10 06:00:07,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 502.55 | bwd_microstep: 1323.67 | bwd_inner_microstep: 1323.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739 [2024-06-10 06:00:09,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.67 | bwd_microstep: 1441.38 | bwd_inner_microstep: 1441.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3643 [2024-06-10 06:00:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.45 | bwd_microstep: 1252.03 | bwd_inner_microstep: 1252.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3834 [2024-06-10 06:00:13,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.36 | bwd_microstep: 1360.99 | bwd_inner_microstep: 1360.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2079 [2024-06-10 06:00:14,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.83 | bwd_microstep: 729.00 | bwd_inner_microstep: 728.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-10 06:00:15,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.89 | bwd_microstep: 983.01 | bwd_inner_microstep: 982.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3818 [2024-06-10 06:00:17,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.04 | bwd_microstep: 1266.82 | bwd_inner_microstep: 1266.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593 [2024-06-10 06:00:19,525] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1338.62 | bwd_inner_microstep: 1338.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 06:00:21,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1277.94 | bwd_inner_microstep: 1277.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581 [2024-06-10 06:00:23,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1592.39 | bwd_inner_microstep: 1592.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3764 [2024-06-10 06:00:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.85 | bwd_microstep: 1469.05 | bwd_inner_microstep: 1469.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3595 [2024-06-10 06:00:29,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 06:00:29,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.20 | bwd_microstep: 3561.10 | bwd_inner_microstep: 1930.96 | bwd_allreduce_microstep: 1630.08 | step_microstep: 76.73 [2024-06-10 06:00:29,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15904.41 | bwd: 44197.78 | bwd_inner: 42566.67 | bwd_allreduce: 1630.37 | step: 78.42 {'loss': 1.3195, 'learning_rate': 3.775319752782133e-05, 'epoch': 0.18} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 06:00:30,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | bwd_microstep: 782.97 | bwd_inner_microstep: 782.84 | 
bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2360 [2024-06-10 06:00:32,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.49 | bwd_microstep: 989.74 | bwd_inner_microstep: 989.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3409 [2024-06-10 06:00:34,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.37 | bwd_microstep: 1278.33 | bwd_inner_microstep: 1278.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 06:00:35,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.28 | bwd_microstep: 1280.98 | bwd_inner_microstep: 1280.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3504 [2024-06-10 06:00:37,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.21 | bwd_microstep: 1190.61 | bwd_inner_microstep: 1190.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 06:00:39,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.34 | bwd_microstep: 1532.03 | bwd_inner_microstep: 1532.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2201 [2024-06-10 06:00:40,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.55 | bwd_microstep: 956.53 | bwd_inner_microstep: 956.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 750 [2024-06-10 06:00:41,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 129.19 | bwd_microstep: 302.14 | 
bwd_inner_microstep: 302.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 06:00:42,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.55 | bwd_microstep: 803.13 | bwd_inner_microstep: 803.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1940 [2024-06-10 06:00:43,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 891.28 | bwd_inner_microstep: 891.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3626 [2024-06-10 06:00:45,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1538.36 | bwd_inner_microstep: 1538.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658 [2024-06-10 06:00:48,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.52 | bwd_microstep: 1613.86 | bwd_inner_microstep: 1613.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657 [2024-06-10 06:00:50,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.67 | bwd_microstep: 1612.60 | bwd_inner_microstep: 1612.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432 [2024-06-10 06:00:52,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.84 | bwd_microstep: 1285.10 | bwd_inner_microstep: 1285.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-10 06:00:53,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
484.24 | bwd_microstep: 1287.84 | bwd_inner_microstep: 1287.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466 [2024-06-10 06:00:55,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.81 | bwd_microstep: 1213.66 | bwd_inner_microstep: 1213.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643 [2024-06-10 06:00:57,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.27 | bwd_microstep: 1515.30 | bwd_inner_microstep: 1515.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 06:00:59,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.43 | bwd_microstep: 1295.24 | bwd_inner_microstep: 1295.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-10 06:01:01,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.36 | bwd_microstep: 1184.47 | bwd_inner_microstep: 1184.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3612 [2024-06-10 06:01:02,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.33 | bwd_microstep: 1217.36 | bwd_inner_microstep: 1217.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817 [2024-06-10 06:01:04,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1389.89 | bwd_inner_microstep: 1389.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548 [2024-06-10 06:01:06,463] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.32 | bwd_microstep: 1298.47 | bwd_inner_microstep: 1298.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 06:01:08,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1258.85 | bwd_inner_microstep: 1258.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 06:01:09,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.08 | bwd_microstep: 1258.11 | bwd_inner_microstep: 1258.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 06:01:11,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1291.74 | bwd_inner_microstep: 1291.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 06:01:13,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.53 | bwd_microstep: 1283.69 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 06:01:15,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1557.94 | bwd_inner_microstep: 1557.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772 [2024-06-10 06:01:17,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1545.21 | bwd_inner_microstep: 1545.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3757 [2024-06-10 06:01:20,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.67 | bwd_microstep: 1638.38 | bwd_inner_microstep: 1638.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578 [2024-06-10 06:01:22,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1595.68 | bwd_inner_microstep: 1595.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3950 [2024-06-10 06:01:24,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.72 | bwd_microstep: 1825.74 | bwd_inner_microstep: 1825.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3698 [2024-06-10 06:01:32,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-10 06:01:32,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.38 | bwd_microstep: 6669.74 | bwd_inner_microstep: 1781.81 | bwd_allreduce_microstep: 4887.87 | step_microstep: 39.00 [2024-06-10 06:01:32,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15511.83 | bwd: 46385.00 | bwd_inner: 41496.12 | bwd_allreduce: 4888.15 | step: 40.71 {'loss': 1.3156, 'learning_rate': 3.7735881897420315e-05, 'epoch': 0.18} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 06:01:33,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.90 | bwd_microstep: 1275.38 | bwd_inner_microstep: 1275.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 06:01:35,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1375.74 | 
bwd_inner_microstep: 1375.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2730 [2024-06-10 06:01:37,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.57 | bwd_microstep: 994.04 | bwd_inner_microstep: 994.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905 [2024-06-10 06:01:38,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.58 | bwd_microstep: 712.29 | bwd_inner_microstep: 712.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 06:01:39,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1346.26 | bwd_inner_microstep: 1346.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567 [2024-06-10 06:01:41,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1333.34 | bwd_inner_microstep: 1333.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3581 [2024-06-10 06:01:43,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.45 | bwd_microstep: 1208.15 | bwd_inner_microstep: 1208.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493 [2024-06-10 06:01:45,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1191.79 | bwd_inner_microstep: 1191.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430 [2024-06-10 06:01:46,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
453.96 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 06:01:47,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.68 | bwd_microstep: 792.64 | bwd_inner_microstep: 792.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3416 [2024-06-10 06:01:49,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.80 | bwd_microstep: 1213.34 | bwd_inner_microstep: 1213.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2140 [2024-06-10 06:01:50,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.24 | bwd_microstep: 927.97 | bwd_inner_microstep: 927.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 06:01:52,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1251.23 | bwd_inner_microstep: 1251.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680 [2024-06-10 06:01:54,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1524.70 | bwd_inner_microstep: 1524.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-10 06:01:56,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1451.20 | bwd_inner_microstep: 1451.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639 [2024-06-10 06:01:58,882] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 583.15 | bwd_microstep: 1577.50 | bwd_inner_microstep: 1577.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 06:02:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-10 06:02:03,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.00 | bwd_microstep: 1613.65 | bwd_inner_microstep: 1613.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3644 [2024-06-10 06:02:04,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917 [2024-06-10 06:02:05,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.82 | bwd_microstep: 688.45 | bwd_inner_microstep: 688.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669 [2024-06-10 06:02:07,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.59 | bwd_microstep: 1430.27 | bwd_inner_microstep: 1430.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 06:02:09,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1279.23 | bwd_inner_microstep: 1279.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714 [2024-06-10 
06:02:11,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1438.72 | bwd_inner_microstep: 1438.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3602 [2024-06-10 06:02:13,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1369.16 | bwd_inner_microstep: 1369.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 06:02:15,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.06 | bwd_microstep: 1454.32 | bwd_inner_microstep: 1454.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 06:02:17,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.16 | bwd_microstep: 1405.20 | bwd_inner_microstep: 1405.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 06:02:19,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1376.30 | bwd_inner_microstep: 1376.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-10 06:02:21,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1476.79 | bwd_inner_microstep: 1476.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 06:02:23,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.13 | bwd_microstep: 1288.04 | bwd_inner_microstep: 1288.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, 
dynamic token length: 3066 [2024-06-10 06:02:24,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.97 | bwd_microstep: 1140.13 | bwd_inner_microstep: 1140.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426 [2024-06-10 06:02:26,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.60 | bwd_microstep: 1316.25 | bwd_inner_microstep: 1316.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811 [2024-06-10 06:02:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 06:02:32,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.50 | bwd_microstep: 5104.37 | bwd_inner_microstep: 1987.75 | bwd_allreduce_microstep: 3116.56 | step_microstep: 38.88 [2024-06-10 06:02:32,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15500.15 | bwd: 44509.87 | bwd_inner: 41392.35 | bwd_allreduce: 3116.81 | step: 40.50 {'loss': 1.3333, 'learning_rate': 3.771850380133545e-05, 'epoch': 0.18} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 06:02:34,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1372.66 | bwd_inner_microstep: 1372.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068 [2024-06-10 06:02:35,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.98 | bwd_microstep: 816.84 | bwd_inner_microstep: 816.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2447 [2024-06-10 06:02:36,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.53 | 
bwd_microstep: 1014.43 | bwd_inner_microstep: 1014.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746 [2024-06-10 06:02:39,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.56 | bwd_microstep: 1638.31 | bwd_inner_microstep: 1638.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3741 [2024-06-10 06:02:41,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1465.23 | bwd_inner_microstep: 1465.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 06:02:43,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1388.64 | bwd_inner_microstep: 1388.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446 [2024-06-10 06:02:44,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.68 | bwd_microstep: 1159.57 | bwd_inner_microstep: 1159.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 06:02:46,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714 [2024-06-10 06:02:48,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.31 | bwd_microstep: 1634.48 | bwd_inner_microstep: 1634.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641 [2024-06-10 06:02:50,783] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1417.06 | bwd_inner_microstep: 1417.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 06:02:52,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1405.10 | bwd_inner_microstep: 1405.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524 [2024-06-10 06:02:54,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1326.37 | bwd_inner_microstep: 1326.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 06:02:56,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1485.47 | bwd_inner_microstep: 1485.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-10 06:02:58,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1447.79 | bwd_inner_microstep: 1447.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 06:03:00,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.72 | bwd_microstep: 1241.52 | bwd_inner_microstep: 1241.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 06:03:02,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.74 | bwd_microstep: 1487.84 | bwd_inner_microstep: 1487.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3680 [2024-06-10 06:03:04,382] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.34 | bwd_microstep: 1450.19 | bwd_inner_microstep: 1450.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3611 [2024-06-10 06:03:06,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.01 | bwd_microstep: 1371.24 | bwd_inner_microstep: 1371.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3492 [2024-06-10 06:03:08,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.12 | bwd_microstep: 1714.74 | bwd_inner_microstep: 1714.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 06:03:10,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.42 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-10 06:03:11,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.44 | bwd_microstep: 976.74 | bwd_inner_microstep: 976.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 06:03:13,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.99 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744 [2024-06-10 06:03:15,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.56 | bwd_microstep: 1469.20 | bwd_inner_microstep: 1469.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3543 [2024-06-10 06:03:17,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1498.77 | bwd_inner_microstep: 1498.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3610 [2024-06-10 06:03:19,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1311.53 | bwd_inner_microstep: 1311.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 06:03:21,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.03 | bwd_microstep: 1287.58 | bwd_inner_microstep: 1287.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3777 [2024-06-10 06:03:23,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.07 | bwd_microstep: 1352.07 | bwd_inner_microstep: 1352.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2054 [2024-06-10 06:03:24,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.29 | bwd_microstep: 867.00 | bwd_inner_microstep: 866.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-10 06:03:26,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.76 | bwd_microstep: 1649.10 | bwd_inner_microstep: 1649.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-10 06:03:28,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1492.81 | bwd_inner_microstep: 1492.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, 
images per sample: 7.75, dynamic token length: 3482 [2024-06-10 06:03:30,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.56 | bwd_microstep: 1436.54 | bwd_inner_microstep: 1436.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 06:03:34,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.17 | optimizer_step: 6.62 [2024-06-10 06:03:34,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.49 | bwd_microstep: 2908.84 | bwd_inner_microstep: 2012.43 | bwd_allreduce_microstep: 896.37 | step_microstep: 38.43 [2024-06-10 06:03:34,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16396.50 | bwd: 45134.01 | bwd_inner: 44236.68 | bwd_allreduce: 896.63 | step: 40.03 {'loss': 1.2753, 'learning_rate': 3.770106330077231e-05, 'epoch': 0.18} 18%|█▊ | 305/1726 [5:21:02<24:09:20, 61.20s/it] 18%|█▊ | 306/1726 [5:22:06<24:25:24, 61.92s/it] 18%|█▊ | 307/1726 [5:23:06<24:14:15, 61.49s/it] 18%|█▊ | 308/1726 [5:24:08<24:18:36, 61.72s/it] 18%|█▊ | 309/1726 [5:25:09<24:07:54, 61.31s/it] 18%|█▊ | 310/1726 [5:26:11<24:10:56, 61.48s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1959 [2024-06-10 06:03:35,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.80 | bwd_microstep: 891.17 | bwd_inner_microstep: 891.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 06:03:37,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1375.29
| bwd_inner_microstep: 1375.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3858 [2024-06-10 06:03:39,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.11 | bwd_microstep: 1562.04 | bwd_inner_microstep: 1562.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767 [2024-06-10 06:03:41,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.27 | bwd_microstep: 1439.89 | bwd_inner_microstep: 1439.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 06:03:43,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1249.13 | bwd_inner_microstep: 1249.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-10 06:03:45,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.64 | bwd_microstep: 1633.51 | bwd_inner_microstep: 1633.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873 [2024-06-10 06:03:46,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.51 | bwd_microstep: 681.13 | bwd_inner_microstep: 681.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3420 [2024-06-10 06:03:48,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.74 | bwd_microstep: 1214.01 | bwd_inner_microstep: 1213.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-10 06:03:49,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 440.71 | bwd_microstep: 1154.79 | bwd_inner_microstep: 1154.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700 [2024-06-10 06:03:52,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.39 | bwd_microstep: 1627.37 | bwd_inner_microstep: 1627.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 06:03:54,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.38 | bwd_microstep: 1523.97 | bwd_inner_microstep: 1523.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678 [2024-06-10 06:03:56,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.96 | bwd_microstep: 1450.90 | bwd_inner_microstep: 1450.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2933 [2024-06-10 06:03:57,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.38 | bwd_microstep: 1204.11 | bwd_inner_microstep: 1204.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 06:03:59,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.00 | bwd_microstep: 1513.71 | bwd_inner_microstep: 1513.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3408 [2024-06-10 06:04:01,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.77 | bwd_microstep: 1370.77 | bwd_inner_microstep: 1370.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3887 [2024-06-10 06:04:04,354] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 686.34 | bwd_microstep: 1890.92 | bwd_inner_microstep: 1890.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 06:04:06,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.69 | bwd_microstep: 1340.17 | bwd_inner_microstep: 1340.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2665 [2024-06-10 06:04:07,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.01 | bwd_microstep: 1025.34 | bwd_inner_microstep: 1025.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-10 06:04:09,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1446.39 | bwd_inner_microstep: 1446.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384 [2024-06-10 06:04:11,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1243.48 | bwd_inner_microstep: 1243.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-10 06:04:13,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.92 | bwd_microstep: 1352.62 | bwd_inner_microstep: 1352.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3810 [2024-06-10 06:04:15,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.71 | bwd_microstep: 1596.90 | bwd_inner_microstep: 1596.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3716 [2024-06-10 06:04:17,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1432.06 | bwd_inner_microstep: 1432.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-10 06:04:18,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.71 | bwd_microstep: 804.04 | bwd_inner_microstep: 804.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3789 [2024-06-10 06:04:20,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.71 | bwd_microstep: 1618.53 | bwd_inner_microstep: 1618.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3570 [2024-06-10 06:04:22,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.81 | bwd_microstep: 1433.01 | bwd_inner_microstep: 1432.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2192 [2024-06-10 06:04:23,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.58 | bwd_microstep: 797.08 | bwd_inner_microstep: 797.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 06:04:25,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3807 [2024-06-10 06:04:27,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1388.15 | bwd_inner_microstep: 1388.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3584 [2024-06-10 06:04:29,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.24 | bwd_microstep: 1504.22 | bwd_inner_microstep: 1504.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 06:04:31,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.00 | bwd_microstep: 1287.17 | bwd_inner_microstep: 1287.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575 [2024-06-10 06:04:35,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-10 06:04:35,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.44 | bwd_microstep: 3673.04 | bwd_inner_microstep: 1683.65 | bwd_allreduce_microstep: 1989.34 | step_microstep: 38.85 [2024-06-10 06:04:35,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16029.77 | bwd: 45007.10 | bwd_inner: 43016.81 | bwd_allreduce: 1989.59 | step: 40.52 {'loss': 1.2932, 'learning_rate': 3.768356045715624e-05, 'epoch': 0.18} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 06:04:37,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1491.14 | bwd_inner_microstep: 1490.98 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860 [2024-06-10 06:04:39,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.95 | bwd_microstep: 1561.66 | bwd_inner_microstep: 1561.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2338 [2024-06-10 06:04:41,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 368.23 | bwd_microstep: 985.69 | bwd_inner_microstep: 985.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795 [2024-06-10 06:04:43,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.65 | bwd_microstep: 1448.12 | bwd_inner_microstep: 1448.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-10 06:04:45,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.95 | bwd_microstep: 1633.77 | bwd_inner_microstep: 1633.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-10 06:04:47,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.23 | bwd_microstep: 1636.10 | bwd_inner_microstep: 1636.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 06:04:49,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 06:04:51,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1419.30 | bwd_inner_microstep: 1419.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 06:04:53,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1291.94 | bwd_inner_microstep: 1291.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 06:04:55,092] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1346.80 | bwd_inner_microstep: 1346.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2214 [2024-06-10 06:04:56,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.90 | bwd_microstep: 896.19 | bwd_inner_microstep: 896.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 06:04:58,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1346.72 | bwd_inner_microstep: 1346.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 06:05:00,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1348.05 | bwd_inner_microstep: 1348.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519 [2024-06-10 06:05:02,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.37 | bwd_microstep: 1588.53 | bwd_inner_microstep: 1588.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2922 [2024-06-10 06:05:03,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.92 | bwd_microstep: 1129.02 | bwd_inner_microstep: 1128.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949 [2024-06-10 06:05:04,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.99 | bwd_microstep: 729.54 | bwd_inner_microstep: 729.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3520 [2024-06-10 06:05:06,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1291.10 | bwd_inner_microstep: 1291.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077 [2024-06-10 06:05:07,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.52 | bwd_microstep: 823.74 | bwd_inner_microstep: 823.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 06:05:09,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.98 | bwd_microstep: 1255.79 | bwd_inner_microstep: 1255.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1584 [2024-06-10 06:05:10,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 223.00 | bwd_microstep: 572.26 | bwd_inner_microstep: 572.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3538 [2024-06-10 06:05:12,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.34 | bwd_microstep: 1329.90 | bwd_inner_microstep: 1329.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418 [2024-06-10 06:05:14,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1445.65 | bwd_inner_microstep: 1445.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-10 06:05:15,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 698.82 | bwd_inner_microstep: 698.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 2398 [2024-06-10 06:05:16,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.82 | bwd_microstep: 1125.56 | bwd_inner_microstep: 1125.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3821 [2024-06-10 06:05:18,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.50 | bwd_microstep: 1691.10 | bwd_inner_microstep: 1691.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 06:05:21,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.10 | bwd_microstep: 1659.87 | bwd_inner_microstep: 1659.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3704 [2024-06-10 06:05:23,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.97 | bwd_microstep: 1677.76 | bwd_inner_microstep: 1677.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2282 [2024-06-10 06:05:25,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.09 | bwd_microstep: 1075.54 | bwd_inner_microstep: 1075.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802 [2024-06-10 06:05:27,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.58 | bwd_microstep: 1459.93 | bwd_inner_microstep: 1459.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422 [2024-06-10 06:05:29,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.18 | bwd_microstep: 1540.42 | bwd_inner_microstep: 1540.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777 [2024-06-10 06:05:31,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.13 | bwd_microstep: 1548.85 | bwd_inner_microstep: 1548.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930 [2024-06-10 06:05:38,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.29 | optimizer_step: 6.60 [2024-06-10 06:05:38,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.00 | bwd_microstep: 6557.09 | bwd_inner_microstep: 786.86 | bwd_allreduce_microstep: 5770.17 | step_microstep: 39.32 [2024-06-10 06:05:38,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15326.15 | bwd: 46856.70 | bwd_inner: 41085.48 | bwd_allreduce: 5770.48 | step: 40.95 {'loss': 1.3306, 'learning_rate': 3.766599533213218e-05, 'epoch': 0.18} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 06:05:40,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.59 | bwd_microstep: 1462.67 | bwd_inner_microstep: 1462.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3913 [2024-06-10 06:05:42,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.52 | bwd_microstep: 1587.25 | bwd_inner_microstep: 1587.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 06:05:43,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.41 | bwd_microstep: 786.26 | bwd_inner_microstep: 786.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 06:05:45,366] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 502.16 | bwd_microstep: 1340.38 | bwd_inner_microstep: 1340.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 06:05:47,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1250.22 | bwd_inner_microstep: 1250.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 06:05:49,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1381.95 | bwd_inner_microstep: 1381.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940 [2024-06-10 06:05:49,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.28 | bwd_microstep: 700.31 | bwd_inner_microstep: 700.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2053 [2024-06-10 06:05:51,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.81 | bwd_microstep: 817.96 | bwd_inner_microstep: 817.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 06:05:52,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.05 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2206 [2024-06-10 06:05:54,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.58 | bwd_microstep: 961.89 | bwd_inner_microstep: 961.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3690 [2024-06-10 06:05:56,376] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.77 | bwd_microstep: 1553.60 | bwd_inner_microstep: 1553.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3627 [2024-06-10 06:05:58,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.20 | bwd_microstep: 1373.98 | bwd_inner_microstep: 1373.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 06:06:00,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3785 [2024-06-10 06:06:02,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.52 | bwd_microstep: 1610.88 | bwd_inner_microstep: 1610.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3463 [2024-06-10 06:06:04,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.71 | bwd_microstep: 1502.87 | bwd_inner_microstep: 1502.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3649 [2024-06-10 06:06:06,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1283.93 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3530 [2024-06-10 06:06:08,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.29 | bwd_microstep: 1559.46 | bwd_inner_microstep: 1559.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 2093 [2024-06-10 06:06:09,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.68 | bwd_microstep: 921.79 | bwd_inner_microstep: 921.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 06:06:11,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1279.30 | bwd_inner_microstep: 1279.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527 [2024-06-10 06:06:13,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.41 | bwd_microstep: 1328.56 | bwd_inner_microstep: 1328.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3657 [2024-06-10 06:06:15,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.00 | bwd_microstep: 1354.17 | bwd_inner_microstep: 1354.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2132 [2024-06-10 06:06:16,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.93 | bwd_microstep: 931.02 | bwd_inner_microstep: 930.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085 [2024-06-10 06:06:17,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.85 | bwd_microstep: 853.06 | bwd_inner_microstep: 853.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 06:06:19,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1400.82 | bwd_inner_microstep: 1400.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images 
per sample: 9.75, dynamic token length: 3609 [2024-06-10 06:06:21,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.77 | bwd_microstep: 1588.30 | bwd_inner_microstep: 1588.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 06:06:23,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.01 | bwd_microstep: 1510.28 | bwd_inner_microstep: 1510.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-10 06:06:25,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.14 | bwd_microstep: 1484.07 | bwd_inner_microstep: 1484.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625 [2024-06-10 06:06:27,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1536.95 | bwd_inner_microstep: 1536.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968 [2024-06-10 06:06:29,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.17 | bwd_microstep: 795.33 | bwd_inner_microstep: 795.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675 [2024-06-10 06:06:31,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1543.67 | bwd_inner_microstep: 1543.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597 [2024-06-10 06:06:33,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.62 | bwd_microstep: 1309.17 | bwd_inner_microstep: 1309.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3803 [2024-06-10 06:06:40,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.26 | optimizer_step: 6.62 [2024-06-10 06:06:40,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 6692.21 | bwd_inner_microstep: 1524.32 | bwd_allreduce_microstep: 5167.83 | step_microstep: 39.04 [2024-06-10 06:06:40,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15373.58 | bwd: 46338.18 | bwd_inner: 41169.44 | bwd_allreduce: 5168.06 | step: 40.65 {'loss': 1.3176, 'learning_rate': 3.764836798756439e-05, 'epoch': 0.18} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-10 06:06:42,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.97 | bwd_microstep: 1497.04 | bwd_inner_microstep: 1497.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414 [2024-06-10 06:06:44,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.31 | bwd_microstep: 1206.96 | bwd_inner_microstep: 1206.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 06:06:45,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.82 | bwd_microstep: 1282.09 | bwd_inner_microstep: 1282.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 06:06:47,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1243.21 | bwd_inner_microstep: 1243.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737 [2024-06-10 06:06:49,622] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 571.42 | bwd_microstep: 1531.31 | bwd_inner_microstep: 1531.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916 [2024-06-10 06:06:50,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.90 | bwd_microstep: 779.68 | bwd_inner_microstep: 779.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-10 06:06:52,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.97 | bwd_microstep: 1146.79 | bwd_inner_microstep: 1146.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867 [2024-06-10 06:06:53,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.09 | bwd_microstep: 680.13 | bwd_inner_microstep: 680.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3713 [2024-06-10 06:06:55,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1560.42 | bwd_inner_microstep: 1560.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 06:06:57,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.26 | bwd_microstep: 1293.40 | bwd_inner_microstep: 1293.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408 [2024-06-10 06:06:58,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.70 | bwd_microstep: 1309.48 | bwd_inner_microstep: 1309.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641 [2024-06-10 
06:07:00,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1408.87 | bwd_inner_microstep: 1408.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3587 [2024-06-10 06:07:03,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.34 | bwd_microstep: 1574.00 | bwd_inner_microstep: 1573.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3497 [2024-06-10 06:07:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.43 | bwd_microstep: 1645.36 | bwd_inner_microstep: 1645.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 06:07:07,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.07 | bwd_microstep: 1350.98 | bwd_inner_microstep: 1350.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1969 [2024-06-10 06:07:08,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.70 | bwd_microstep: 703.16 | bwd_inner_microstep: 703.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 06:07:09,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 795.43 | bwd_inner_microstep: 795.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492 [2024-06-10 06:07:10,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.05 | bwd_microstep: 1189.29 | bwd_inner_microstep: 1189.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3661 [2024-06-10 06:07:12,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1426.64 | bwd_inner_microstep: 1426.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3619 [2024-06-10 06:07:14,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.68 | bwd_microstep: 1373.58 | bwd_inner_microstep: 1373.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451 [2024-06-10 06:07:16,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.94 | bwd_microstep: 1289.24 | bwd_inner_microstep: 1289.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816 [2024-06-10 06:07:18,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.11 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 06:07:20,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.14 | bwd_microstep: 1296.26 | bwd_inner_microstep: 1296.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 06:07:22,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1282.91 | bwd_inner_microstep: 1282.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3820 [2024-06-10 06:07:24,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1401.35 | bwd_inner_microstep: 1401.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-10 06:07:26,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1504.68 | bwd_inner_microstep: 1504.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3806 [2024-06-10 06:07:28,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.52 | bwd_microstep: 1687.07 | bwd_inner_microstep: 1687.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2045 [2024-06-10 06:07:29,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.17 | bwd_microstep: 1003.52 | bwd_inner_microstep: 1003.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812 [2024-06-10 06:07:31,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.94 | bwd_microstep: 1355.26 | bwd_inner_microstep: 1355.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588 [2024-06-10 06:07:34,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.90 | bwd_microstep: 1702.99 | bwd_inner_microstep: 1702.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563 [2024-06-10 06:07:36,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1422.26 | bwd_inner_microstep: 1422.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761 [2024-06-10 06:07:40,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.26 | optimizer_step: 6.61 [2024-06-10 06:07:40,886] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.40 | bwd_microstep: 4120.18 | bwd_inner_microstep: 1742.20 | bwd_allreduce_microstep: 2377.92 | step_microstep: 39.00 [2024-06-10 06:07:40,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15731.06 | bwd: 44550.00 | bwd_inner: 42171.17 | bwd_allreduce: 2378.15 | step: 40.63 {'loss': 1.3222, 'learning_rate': 3.763067848553629e-05, 'epoch': 0.18} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2911 [2024-06-10 06:07:42,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.99 | bwd_microstep: 1177.96 | bwd_inner_microstep: 1177.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1935 [2024-06-10 06:07:43,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.41 | bwd_microstep: 850.33 | bwd_inner_microstep: 850.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2305 [2024-06-10 06:07:44,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 881.46 | bwd_inner_microstep: 881.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931 [2024-06-10 06:07:46,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.53 | bwd_microstep: 819.64 | bwd_inner_microstep: 819.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895 [2024-06-10 06:07:47,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.00 | bwd_microstep: 683.29 | bwd_inner_microstep: 683.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2450 [2024-06-10 06:07:48,379] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.13 | bwd_microstep: 977.64 | bwd_inner_microstep: 977.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 06:07:50,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1394.75 | bwd_inner_microstep: 1394.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 06:07:52,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.85 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3408 [2024-06-10 06:07:53,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.36 | bwd_microstep: 1294.62 | bwd_inner_microstep: 1294.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 06:07:55,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1343.44 | bwd_inner_microstep: 1343.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3478 [2024-06-10 06:07:57,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.49 | bwd_microstep: 1412.69 | bwd_inner_microstep: 1412.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957 [2024-06-10 06:07:58,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.75 | bwd_microstep: 891.00 | bwd_inner_microstep: 890.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3489 [2024-06-10 06:08:01,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1575.25 | bwd_inner_microstep: 1575.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3945 [2024-06-10 06:08:03,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.70 | bwd_microstep: 1690.13 | bwd_inner_microstep: 1690.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 06:08:05,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.85 | bwd_microstep: 1283.94 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 06:08:06,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.70 | bwd_microstep: 1288.05 | bwd_inner_microstep: 1288.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821 [2024-06-10 06:08:08,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1388.80 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 06:08:10,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1376.14 | bwd_inner_microstep: 1376.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-10 06:08:12,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.42 | bwd_microstep: 1416.93 | bwd_inner_microstep: 1416.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images 
per sample: 6.5, dynamic token length: 3821 [2024-06-10 06:08:14,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.63 | bwd_microstep: 1421.19 | bwd_inner_microstep: 1421.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535 [2024-06-10 06:08:16,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.34 | bwd_microstep: 1450.92 | bwd_inner_microstep: 1450.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2522 [2024-06-10 06:08:18,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.95 | bwd_microstep: 1025.89 | bwd_inner_microstep: 1025.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681 [2024-06-10 06:08:20,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.47 | bwd_microstep: 1619.31 | bwd_inner_microstep: 1619.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665 [2024-06-10 06:08:22,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.22 | bwd_microstep: 1722.04 | bwd_inner_microstep: 1722.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3825 [2024-06-10 06:08:24,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1420.58 | bwd_inner_microstep: 1420.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 06:08:26,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.25 | bwd_microstep: 1492.26 | bwd_inner_microstep: 1492.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 06:08:28,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.03 | bwd_microstep: 1451.38 | bwd_inner_microstep: 1451.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3729 [2024-06-10 06:08:30,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.06 | bwd_microstep: 1565.67 | bwd_inner_microstep: 1565.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763 [2024-06-10 06:08:33,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.39 | bwd_microstep: 1537.80 | bwd_inner_microstep: 1537.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-10 06:08:35,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1403.62 | bwd_inner_microstep: 1403.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-10 06:08:37,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.49 | bwd_microstep: 1447.25 | bwd_inner_microstep: 1447.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058 [2024-06-10 06:08:41,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.26 | optimizer_step: 6.59 [2024-06-10 06:08:41,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.02 | bwd_microstep: 3794.27 | bwd_inner_microstep: 1044.63 | bwd_allreduce_microstep: 2749.59 | step_microstep: 40.78 [2024-06-10 06:08:41,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
15571.40 | bwd: 44383.21 | bwd_inner: 41632.67 | bwd_allreduce: 2749.82 | step: 42.45 {'loss': 1.2503, 'learning_rate': 3.7612926888350216e-05, 'epoch': 0.18} dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4670 [2024-06-10 06:08:43,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 732.69 | bwd_microstep: 1957.39 | bwd_inner_microstep: 1957.17 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 06:08:45,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 06:08:48,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.72 | bwd_microstep: 1651.00 | bwd_inner_microstep: 1650.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 06:08:50,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1482.92 | bwd_inner_microstep: 1482.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 06:08:51,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2217 [2024-06-10 06:08:53,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.44 | bwd_microstep: 893.29 | bwd_inner_microstep: 893.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-10 
06:08:55,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.98 | bwd_microstep: 1430.16 | bwd_inner_microstep: 1430.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4230 [2024-06-10 06:08:57,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1562.49 | bwd_inner_microstep: 1562.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 06:08:58,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.61 | bwd_microstep: 679.64 | bwd_inner_microstep: 679.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507 [2024-06-10 06:08:59,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.93 | bwd_microstep: 1195.39 | bwd_inner_microstep: 1195.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569 [2024-06-10 06:09:01,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.39 | bwd_microstep: 1206.45 | bwd_inner_microstep: 1206.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 06:09:03,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.59 | bwd_microstep: 1480.01 | bwd_inner_microstep: 1479.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975 [2024-06-10 06:09:04,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.29 | bwd_microstep: 889.82 | bwd_inner_microstep: 889.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, 
dynamic token length: 3513 [2024-06-10 06:09:06,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1446.72 | bwd_inner_microstep: 1446.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642 [2024-06-10 06:09:09,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.97 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2147 [2024-06-10 06:09:10,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 950.38 | bwd_inner_microstep: 950.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 06:09:12,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.55 | bwd_microstep: 1485.97 | bwd_inner_microstep: 1485.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385 [2024-06-10 06:09:14,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.11 | bwd_microstep: 1243.30 | bwd_inner_microstep: 1243.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-10 06:09:16,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1451.84 | bwd_inner_microstep: 1451.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983 [2024-06-10 06:09:17,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.86 | bwd_microstep: 896.19 | bwd_inner_microstep: 896.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 18, images per sample: 4.5, dynamic token length: 3715 [2024-06-10 06:09:19,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.19 | bwd_microstep: 1271.80 | bwd_inner_microstep: 1271.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1983 [2024-06-10 06:09:20,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.58 | bwd_microstep: 751.69 | bwd_inner_microstep: 751.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3709 [2024-06-10 06:09:21,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1333.84 | bwd_inner_microstep: 1333.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813 [2024-06-10 06:09:23,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.75 | bwd_microstep: 1358.43 | bwd_inner_microstep: 1358.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 06:09:26,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.67 | bwd_microstep: 1559.61 | bwd_inner_microstep: 1559.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054 [2024-06-10 06:09:27,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.55 | bwd_microstep: 917.18 | bwd_inner_microstep: 917.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053 [2024-06-10 06:09:28,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.35 | bwd_microstep: 915.24 | bwd_inner_microstep: 915.21 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2189 [2024-06-10 06:09:29,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.68 | bwd_microstep: 810.89 | bwd_inner_microstep: 810.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615 [2024-06-10 06:09:31,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1445.11 | bwd_inner_microstep: 1445.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-10 06:09:33,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.16 | bwd_microstep: 1550.66 | bwd_inner_microstep: 1550.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3737 [2024-06-10 06:09:35,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.70 | bwd_microstep: 1340.59 | bwd_inner_microstep: 1340.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3755 [2024-06-10 06:09:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.27 | optimizer_step: 6.59 [2024-06-10 06:09:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | bwd_microstep: 4903.69 | bwd_inner_microstep: 1549.07 | bwd_allreduce_microstep: 3354.57 | step_microstep: 38.90 [2024-06-10 06:09:41,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15295.68 | bwd: 44294.47 | bwd_inner: 40938.83 | bwd_allreduce: 3354.89 | step: 40.63 {'loss': 1.3758, 'learning_rate': 3.7595113258527206e-05, 'epoch': 0.18} 56, 61.48s/it] 18%|█▊ | 311/1726 [5:27:12<24:09:13, 61.45s/it] 18%|█▊ | 
312/1726 [5:28:14<24:15:51, 61.78s/it] 18%|█▊ | 313/1726 [5:29:17<24:16:48, 61.86s/it] 18%|█▊ | 314/1726 [5:30:17<24:07:04, 61.49s/it] 18%|█▊ | 315/1726 [5:31:17<23:57:43, 61.14s/it] 18%|█▊ | 316/1726 [5:32:17<23:48:18, 60.78s/it] dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-10 06:09:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.64 | bwd_microstep: 789.20 | bwd_inner_microstep: 789.05 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 06:09:44,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.78 | bwd_microstep: 1393.02 | bwd_inner_microstep: 1393.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 06:09:46,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.94 | bwd_microstep: 1382.21 | bwd_inner_microstep: 1382.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 06:09:48,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1453.54 | bwd_inner_microstep: 1453.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 06:09:49,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.33 | bwd_microstep: 1250.48 | bwd_inner_microstep: 1250.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 06:09:51,613] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1283.37 | bwd_inner_microstep: 1283.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580 [2024-06-10 06:09:53,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1304.15 | bwd_inner_microstep: 1304.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 06:09:55,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1284.77 | bwd_inner_microstep: 1284.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492 [2024-06-10 06:09:56,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.95 | bwd_microstep: 1222.40 | bwd_inner_microstep: 1222.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1966 [2024-06-10 06:09:58,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.46 | bwd_microstep: 825.60 | bwd_inner_microstep: 825.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-10 06:10:00,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.53 | bwd_microstep: 1522.83 | bwd_inner_microstep: 1522.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659 [2024-06-10 06:10:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.69 | bwd_microstep: 1519.52 | bwd_inner_microstep: 1519.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3491 [2024-06-10 06:10:04,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.96 | bwd_microstep: 1584.44 | bwd_inner_microstep: 1584.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3651 [2024-06-10 06:10:06,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.31 | bwd_microstep: 1685.64 | bwd_inner_microstep: 1685.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 06:10:08,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1507.05 | bwd_inner_microstep: 1507.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3555 [2024-06-10 06:10:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.36 | bwd_microstep: 1663.88 | bwd_inner_microstep: 1663.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 06:10:12,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.06 | bwd_microstep: 1283.63 | bwd_inner_microstep: 1283.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-10 06:10:14,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1515.33 | bwd_inner_microstep: 1515.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 06:10:16,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1354.55 | bwd_inner_microstep: 1354.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3472 [2024-06-10 06:10:18,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.52 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304 [2024-06-10 06:10:20,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.13 | bwd_microstep: 980.52 | bwd_inner_microstep: 980.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 06:10:22,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.30 | bwd_microstep: 1660.47 | bwd_inner_microstep: 1660.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-10 06:10:23,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.50 | bwd_microstep: 977.29 | bwd_inner_microstep: 977.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276 [2024-06-10 06:10:25,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.05 | bwd_microstep: 975.84 | bwd_inner_microstep: 975.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 06:10:27,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.91 | bwd_microstep: 1399.18 | bwd_inner_microstep: 1399.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659 [2024-06-10 06:10:29,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.06 | bwd_microstep: 1624.74 | bwd_inner_microstep: 1624.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3597 [2024-06-10 06:10:31,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1557.60 | bwd_inner_microstep: 1557.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3815 [2024-06-10 06:10:33,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.54 | bwd_microstep: 1440.57 | bwd_inner_microstep: 1440.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678 [2024-06-10 06:10:35,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.14 | bwd_microstep: 1456.98 | bwd_inner_microstep: 1456.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270 [2024-06-10 06:10:36,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.83 | bwd_microstep: 877.73 | bwd_inner_microstep: 877.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 06:10:38,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1543.30 | bwd_inner_microstep: 1543.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767 [2024-06-10 06:10:42,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 06:10:42,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.55 | bwd_microstep: 3323.73 | bwd_inner_microstep: 1786.17 | bwd_allreduce_microstep: 1537.51 | step_microstep: 38.73 [2024-06-10 06:10:42,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16197.77 
| bwd: 45024.63 | bwd_inner: 43486.09 | bwd_allreduce: 1537.79 | step: 40.45 {'loss': 1.2901, 'learning_rate': 3.757723765880677e-05, 'epoch': 0.18} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 06:10:44,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1480.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4193 [2024-06-10 06:10:47,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.33 | bwd_microstep: 1751.70 | bwd_inner_microstep: 1751.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3841 [2024-06-10 06:10:49,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.98 | bwd_microstep: 1392.89 | bwd_inner_microstep: 1392.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 06:10:51,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1382.11 | bwd_inner_microstep: 1382.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3774 [2024-06-10 06:10:53,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1502.69 | bwd_inner_microstep: 1502.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 06:10:55,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.98 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 
06:10:56,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1248.06 | bwd_inner_microstep: 1248.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 06:10:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.97 | bwd_microstep: 1387.46 | bwd_inner_microstep: 1387.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410 [2024-06-10 06:11:00,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.55 | bwd_microstep: 1373.04 | bwd_inner_microstep: 1373.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 06:11:02,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3007 [2024-06-10 06:11:04,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.96 | bwd_microstep: 1201.92 | bwd_inner_microstep: 1201.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 06:11:06,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.45 | bwd_microstep: 1489.96 | bwd_inner_microstep: 1489.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1967 [2024-06-10 06:11:07,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.38 | bwd_microstep: 828.13 | bwd_inner_microstep: 828.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3419 [2024-06-10 06:11:09,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.01 | bwd_microstep: 1345.12 | bwd_inner_microstep: 1345.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 06:11:11,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.15 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-10 06:11:13,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1408.13 | bwd_inner_microstep: 1408.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2123 [2024-06-10 06:11:14,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.71 | bwd_microstep: 941.75 | bwd_inner_microstep: 941.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678 [2024-06-10 06:11:16,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.14 | bwd_microstep: 1421.69 | bwd_inner_microstep: 1421.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 06:11:18,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1375.98 | bwd_inner_microstep: 1375.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680 [2024-06-10 06:11:20,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.63 | bwd_microstep: 1620.03 | bwd_inner_microstep: 1620.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-10 06:11:22,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.37 | bwd_microstep: 1613.71 | bwd_inner_microstep: 1613.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-10 06:11:24,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.36 | bwd_microstep: 1186.59 | bwd_inner_microstep: 1186.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 06:11:26,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.05 | bwd_microstep: 1395.27 | bwd_inner_microstep: 1395.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3748 [2024-06-10 06:11:28,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.89 | bwd_microstep: 1542.89 | bwd_inner_microstep: 1542.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061 [2024-06-10 06:11:29,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.19 | bwd_microstep: 915.25 | bwd_inner_microstep: 915.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486 [2024-06-10 06:11:31,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.90 | bwd_microstep: 1347.59 | bwd_inner_microstep: 1347.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755 [2024-06-10 06:11:33,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.49 | bwd_microstep: 1346.33 | bwd_inner_microstep: 1346.30 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3766 [2024-06-10 06:11:35,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.24 | bwd_microstep: 1347.87 | bwd_inner_microstep: 1347.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567 [2024-06-10 06:11:37,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.44 | bwd_microstep: 1334.85 | bwd_inner_microstep: 1334.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-10 06:11:39,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.84 | bwd_microstep: 1655.13 | bwd_inner_microstep: 1655.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 06:11:41,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1282.47 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 06:11:45,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.25 | optimizer_step: 6.59 [2024-06-10 06:11:45,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.99 | bwd_microstep: 3693.44 | bwd_inner_microstep: 1703.25 | bwd_allreduce_microstep: 1990.14 | step_microstep: 38.55 [2024-06-10 06:11:45,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16432.07 | bwd: 45969.88 | bwd_inner: 43978.80 | bwd_allreduce: 1990.37 | step: 40.20 {'loss': 1.3099, 'learning_rate': 3.7559300152146665e-05, 'epoch': 0.18} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 06:11:47,262] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1278.10 | bwd_inner_microstep: 1278.02 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457 [2024-06-10 06:11:49,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.57 | bwd_microstep: 1341.21 | bwd_inner_microstep: 1341.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3798 [2024-06-10 06:11:51,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.50 | bwd_microstep: 1651.22 | bwd_inner_microstep: 1651.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-10 06:11:53,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.72 | bwd_microstep: 1281.29 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 06:11:55,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.47 | bwd_microstep: 1404.57 | bwd_inner_microstep: 1404.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735 [2024-06-10 06:11:57,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.69 | bwd_microstep: 1430.35 | bwd_inner_microstep: 1430.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 06:11:59,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.78 | bwd_microstep: 1386.68 | bwd_inner_microstep: 1386.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3434 [2024-06-10 06:12:00,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1253.66 | bwd_inner_microstep: 1253.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3417 [2024-06-10 06:12:02,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1397.10 | bwd_inner_microstep: 1397.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1970 [2024-06-10 06:12:03,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.40 | bwd_microstep: 766.53 | bwd_inner_microstep: 766.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-10 06:12:05,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.42 | bwd_microstep: 1529.69 | bwd_inner_microstep: 1529.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3671 [2024-06-10 06:12:07,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.18 | bwd_microstep: 1551.32 | bwd_inner_microstep: 1551.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3422 [2024-06-10 06:12:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.74 | bwd_microstep: 1214.84 | bwd_inner_microstep: 1214.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 06:12:11,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1391.23 | bwd_inner_microstep: 1391.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images 
per sample: 9.5, dynamic token length: 3645 [2024-06-10 06:12:13,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1574.58 | bwd_inner_microstep: 1574.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1940 [2024-06-10 06:12:15,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.91 | bwd_microstep: 888.63 | bwd_inner_microstep: 888.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3622 [2024-06-10 06:12:16,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1435.53 | bwd_inner_microstep: 1435.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3836 [2024-06-10 06:12:19,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.66 | bwd_microstep: 1522.34 | bwd_inner_microstep: 1522.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3543 [2024-06-10 06:12:21,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1441.75 | bwd_inner_microstep: 1441.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 06:12:23,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.44 | bwd_microstep: 1561.77 | bwd_inner_microstep: 1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844 [2024-06-10 06:12:25,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1468.18 | bwd_inner_microstep: 1468.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 06:12:27,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.15 | bwd_microstep: 1354.04 | bwd_inner_microstep: 1354.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 06:12:29,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.12 | bwd_microstep: 1395.55 | bwd_inner_microstep: 1395.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 06:12:30,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.15 | bwd_microstep: 1280.34 | bwd_inner_microstep: 1280.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 06:12:32,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1392.39 | bwd_inner_microstep: 1392.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3444 [2024-06-10 06:12:34,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1401.74 | bwd_inner_microstep: 1401.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 06:12:36,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.38 | bwd_microstep: 1289.14 | bwd_inner_microstep: 1289.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 06:12:38,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1509.48 | bwd_inner_microstep: 1509.45 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 06:12:40,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1396.84 | bwd_inner_microstep: 1396.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182 [2024-06-10 06:12:41,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.00 | bwd_microstep: 858.72 | bwd_inner_microstep: 858.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3377 [2024-06-10 06:12:43,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.02 | bwd_microstep: 1433.64 | bwd_inner_microstep: 1433.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 06:12:49,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 06:12:49,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 5120.39 | bwd_inner_microstep: 1679.86 | bwd_allreduce_microstep: 3440.47 | step_microstep: 38.86 [2024-06-10 06:12:49,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16358.25 | bwd: 47202.86 | bwd_inner: 43761.40 | bwd_allreduce: 3440.74 | step: 40.50 {'loss': 1.339, 'learning_rate': 3.7541300801722715e-05, 'epoch': 0.18} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 06:12:51,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1278.09 | bwd_inner_microstep: 1278.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4044 [2024-06-10 
06:12:53,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.57 | bwd_microstep: 1550.58 | bwd_inner_microstep: 1550.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4033 [2024-06-10 06:12:55,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.79 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2249 [2024-06-10 06:12:56,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.77 | bwd_microstep: 869.97 | bwd_inner_microstep: 869.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-10 06:12:58,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.04 | bwd_microstep: 1445.01 | bwd_inner_microstep: 1444.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 06:13:00,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1248.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734 [2024-06-10 06:13:02,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.09 | bwd_microstep: 1532.20 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4067 [2024-06-10 06:13:04,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.56 | bwd_microstep: 1622.30 | bwd_inner_microstep: 1622.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, 
dynamic token length: 1963 [2024-06-10 06:13:05,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.20 | bwd_microstep: 702.79 | bwd_inner_microstep: 702.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869 [2024-06-10 06:13:06,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.05 | bwd_microstep: 709.41 | bwd_inner_microstep: 709.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1975 [2024-06-10 06:13:07,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.96 | bwd_microstep: 706.33 | bwd_inner_microstep: 706.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 06:13:09,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.81 | bwd_microstep: 1283.81 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 06:13:11,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.63 | bwd_microstep: 1482.15 | bwd_inner_microstep: 1482.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-10 06:13:13,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.48 | bwd_microstep: 1503.10 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 06:13:15,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1374.04 | bwd_inner_microstep: 1374.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 06:13:17,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1246.62 | bwd_inner_microstep: 1246.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3670 [2024-06-10 06:13:19,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.12 | bwd_microstep: 1673.97 | bwd_inner_microstep: 1673.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2175 [2024-06-10 06:13:20,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.75 | bwd_microstep: 889.53 | bwd_inner_microstep: 889.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3575 [2024-06-10 06:13:22,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.53 | bwd_microstep: 1206.99 | bwd_inner_microstep: 1206.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536 [2024-06-10 06:13:24,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1294.73 | bwd_inner_microstep: 1294.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 06:13:25,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3681 [2024-06-10 06:13:27,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.39 | bwd_microstep: 1358.47 | bwd_inner_microstep: 1358.44 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 06:13:29,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1385.73 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820 [2024-06-10 06:13:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.66 | bwd_microstep: 1358.51 | bwd_inner_microstep: 1358.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-10 06:13:32,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.85 | bwd_microstep: 809.85 | bwd_inner_microstep: 809.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 06:13:34,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.36 | bwd_microstep: 1290.58 | bwd_inner_microstep: 1290.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 06:13:36,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1503.73 | bwd_inner_microstep: 1503.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771 [2024-06-10 06:13:38,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.97 | bwd_microstep: 1347.66 | bwd_inner_microstep: 1347.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475 [2024-06-10 06:13:40,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.94 | bwd_microstep: 1313.72 | bwd_inner_microstep: 
1313.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 06:13:42,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.45 | bwd_microstep: 1380.94 | bwd_inner_microstep: 1380.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2800 [2024-06-10 06:13:43,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.03 | bwd_microstep: 1087.56 | bwd_inner_microstep: 1087.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744 [2024-06-10 06:13:50,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.26 | optimizer_step: 6.59 [2024-06-10 06:13:50,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 6019.42 | bwd_inner_microstep: 1741.13 | bwd_allreduce_microstep: 4278.24 | step_microstep: 38.75 [2024-06-10 06:13:50,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.04 | bwd: 45188.52 | bwd_inner: 40909.37 | bwd_allreduce: 4278.47 | step: 40.37 {'loss': 1.318, 'learning_rate': 3.752323967092853e-05, 'epoch': 0.19} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 06:13:52,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.49 | bwd_microstep: 1371.32 | bwd_inner_microstep: 1371.19 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3561 [2024-06-10 06:13:54,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1358.06 | bwd_inner_microstep: 1358.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2373 
[2024-06-10 06:13:55,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.48 | bwd_microstep: 838.22 | bwd_inner_microstep: 838.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474 [2024-06-10 06:13:57,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1316.93 | bwd_inner_microstep: 1316.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773 [2024-06-10 06:13:59,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 1446.24 | bwd_inner_microstep: 1446.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2435 [2024-06-10 06:14:00,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.34 | bwd_microstep: 946.71 | bwd_inner_microstep: 946.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 06:14:02,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.93 | bwd_microstep: 1455.93 | bwd_inner_microstep: 1455.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 06:14:04,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1345.76 | bwd_inner_microstep: 1345.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4071 [2024-06-10 06:14:06,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.71 | bwd_microstep: 1457.91 | bwd_inner_microstep: 1457.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 2085 [2024-06-10 06:14:07,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.42 | bwd_microstep: 819.28 | bwd_inner_microstep: 819.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684 [2024-06-10 06:14:09,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.44 | bwd_microstep: 1533.40 | bwd_inner_microstep: 1533.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3655 [2024-06-10 06:14:11,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1483.58 | bwd_inner_microstep: 1483.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3514 [2024-06-10 06:14:13,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1350.16 | bwd_inner_microstep: 1350.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3534 [2024-06-10 06:14:15,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.44 | bwd_microstep: 1455.76 | bwd_inner_microstep: 1455.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 06:14:17,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1376.67 | bwd_inner_microstep: 1376.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3540 [2024-06-10 06:14:19,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1587.39 | bwd_inner_microstep: 1587.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3478 [2024-06-10 06:14:21,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.85 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3423 [2024-06-10 06:14:23,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1469.04 | bwd_inner_microstep: 1469.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3429 [2024-06-10 06:14:25,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.08 | bwd_microstep: 1216.36 | bwd_inner_microstep: 1216.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3847 [2024-06-10 06:14:27,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.67 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471 [2024-06-10 06:14:28,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.75 | bwd_microstep: 1186.72 | bwd_inner_microstep: 1186.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 06:14:30,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.73 | bwd_microstep: 1400.78 | bwd_inner_microstep: 1400.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822 [2024-06-10 06:14:32,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1518.09 | bwd_inner_microstep: 1518.06 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 06:14:34,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.01 | bwd_microstep: 1512.76 | bwd_inner_microstep: 1512.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 06:14:36,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1414.83 | bwd_inner_microstep: 1414.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 06:14:38,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.46 | bwd_microstep: 1255.98 | bwd_inner_microstep: 1255.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-10 06:14:40,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1554.29 | bwd_inner_microstep: 1554.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448 [2024-06-10 06:14:42,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.44 | bwd_microstep: 1160.33 | bwd_inner_microstep: 1160.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649 [2024-06-10 06:14:44,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.14 | bwd_microstep: 1519.79 | bwd_inner_microstep: 1519.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 06:14:46,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 
1456.62 | bwd_inner_microstep: 1456.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3476 [2024-06-10 06:14:48,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 1330.44 | bwd_inner_microstep: 1330.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-10 06:14:53,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.27 | optimizer_step: 6.56 [2024-06-10 06:14:53,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.99 | bwd_microstep: 4819.30 | bwd_inner_microstep: 1691.40 | bwd_allreduce_microstep: 3127.84 | step_microstep: 38.56 [2024-06-10 06:14:53,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16332.80 | bwd: 46695.15 | bwd_inner: 43566.27 | bwd_allreduce: 3128.13 | step: 40.24 {'loss': 1.3217, 'learning_rate': 3.750511682337531e-05, 'epoch': 0.19} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2492 [2024-06-10 06:14:54,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.58 | bwd_microstep: 917.75 | bwd_inner_microstep: 917.64 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3964 [2024-06-10 06:14:56,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.79 | bwd_microstep: 1401.09 | bwd_inner_microstep: 1401.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3846 [2024-06-10 06:14:59,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.02 | bwd_microstep: 1557.87 | bwd_inner_microstep: 1557.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3788 [2024-06-10 06:15:01,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1546.73 | bwd_inner_microstep: 1546.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 06:15:02,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.28 | bwd_microstep: 1285.92 | bwd_inner_microstep: 1285.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769 [2024-06-10 06:15:05,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.53 | bwd_microstep: 1643.28 | bwd_inner_microstep: 1643.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709 [2024-06-10 06:15:07,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.99 | bwd_microstep: 1628.41 | bwd_inner_microstep: 1628.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2498 [2024-06-10 06:15:08,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.93 | bwd_microstep: 1027.29 | bwd_inner_microstep: 1027.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687 [2024-06-10 06:15:10,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.11 | bwd_microstep: 1433.48 | bwd_inner_microstep: 1433.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3445 [2024-06-10 06:15:12,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.41 | bwd_microstep: 1219.95 | bwd_inner_microstep: 1219.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 2086 [2024-06-10 06:15:13,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.49 | bwd_microstep: 1015.42 | bwd_inner_microstep: 1015.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 06:15:15,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1386.23 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452 [2024-06-10 06:15:17,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1447.69 | bwd_inner_microstep: 1447.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 06:15:19,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.57 | bwd_microstep: 1500.98 | bwd_inner_microstep: 1500.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3425 [2024-06-10 06:15:22,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.53 | bwd_microstep: 1542.60 | bwd_inner_microstep: 1542.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2086 [2024-06-10 06:15:23,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.23 | bwd_microstep: 919.16 | bwd_inner_microstep: 919.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694 [2024-06-10 06:15:25,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.42 | bwd_microstep: 1530.31 | bwd_inner_microstep: 1530.28 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-10 06:15:27,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1358.95 | bwd_inner_microstep: 1358.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825 [2024-06-10 06:15:29,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1361.15 | bwd_inner_microstep: 1361.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3896 [2024-06-10 06:15:31,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.49 | bwd_microstep: 1692.52 | bwd_inner_microstep: 1692.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3777 [2024-06-10 06:15:33,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.39 | bwd_microstep: 1715.88 | bwd_inner_microstep: 1715.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-10 06:15:35,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.53 | bwd_microstep: 1409.95 | bwd_inner_microstep: 1409.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 06:15:37,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1513.08 | bwd_inner_microstep: 1513.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430 [2024-06-10 06:15:39,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1283.54 | 
bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 06:15:41,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1378.60 | bwd_inner_microstep: 1378.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747 [2024-06-10 06:15:43,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.41 | bwd_microstep: 1452.74 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3707 [2024-06-10 06:15:45,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.46 | bwd_microstep: 1730.11 | bwd_inner_microstep: 1730.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-10 06:15:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.40 | bwd_microstep: 1507.26 | bwd_inner_microstep: 1507.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-10 06:15:50,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.15 | bwd_microstep: 1531.27 | bwd_inner_microstep: 1531.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705 [2024-06-10 06:15:52,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.75 | bwd_microstep: 1425.91 | bwd_inner_microstep: 1425.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 06:15:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 529.09 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 06:15:56,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.35 | optimizer_gradients: 4.16 | optimizer_step: 6.58 [2024-06-10 06:15:56,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.75 | bwd_microstep: 1471.54 | bwd_inner_microstep: 1338.10 | bwd_allreduce_microstep: 133.40 | step_microstep: 38.96 [2024-06-10 06:15:56,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16826.35 | bwd: 45234.79 | bwd_inner: 45100.39 | bwd_allreduce: 133.67 | step: 40.64 18%|█▊ | 316/1726 [5:32:17<23:48:18, 60.78s/it] 18%|█▊ | 317/1726 [5:33:19<23:52:58, 61.02s/it] 18%|█▊ | 318/1726 [5:34:22<24:04:10, 61.54s/it] 18%|█▊ | 319/1726 [5:35:26<24:19:52, 62.25s/it] 19%|█▊ | 320/1726 [5:36:27<24:09:07, 61.84s/it] 19%|█▊ | 321/1726 [5:37:30<24:18:53, 62.30s/it] 19%|█▊ | 322/1726 [5:38:32<24:18:44, 62.34s/it] {'loss': 1.3029, 'learning_rate': 3.7486932322891646e-05, 'epoch': 0.19} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-10 06:15:57,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.68 | bwd_microstep: 792.03 | bwd_inner_microstep: 791.90 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4165 [2024-06-10 06:15:59,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.48 | bwd_microstep: 1651.48 | bwd_inner_microstep: 1651.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 06:16:01,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1379.66 | bwd_inner_microstep: 1379.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-10 06:16:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.97 | bwd_microstep: 1357.28 | bwd_inner_microstep: 1357.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 06:16:05,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1285.33 | bwd_inner_microstep: 1285.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 06:16:06,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1379.19 | bwd_inner_microstep: 1379.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 06:16:08,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1288.71 | bwd_inner_microstep: 1288.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 06:16:10,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.26 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 06:16:12,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.29 | bwd_microstep: 1376.23 | bwd_inner_microstep: 1376.20 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 06:16:14,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.06 | bwd_microstep: 1279.65 | bwd_inner_microstep: 1279.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070 [2024-06-10 06:16:15,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.74 | bwd_microstep: 917.00 | bwd_inner_microstep: 916.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 06:16:17,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.76 | bwd_microstep: 1346.89 | bwd_inner_microstep: 1346.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499 [2024-06-10 06:16:19,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1412.68 | bwd_inner_microstep: 1412.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-10 06:16:20,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.45 | bwd_microstep: 895.33 | bwd_inner_microstep: 895.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3648 [2024-06-10 06:16:22,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.37 | bwd_microstep: 1650.21 | bwd_inner_microstep: 1650.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622 [2024-06-10 06:16:24,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.38 | bwd_microstep: 1344.19 | bwd_inner_microstep: 1344.16 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 06:16:26,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1286.69 | bwd_inner_microstep: 1286.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 06:16:28,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.97 | bwd_microstep: 1558.24 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823 [2024-06-10 06:16:30,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.99 | bwd_microstep: 1518.58 | bwd_inner_microstep: 1518.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 06:16:32,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.01 | bwd_microstep: 1259.90 | bwd_inner_microstep: 1259.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091 [2024-06-10 06:16:33,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.82 | bwd_microstep: 921.15 | bwd_inner_microstep: 921.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3720 [2024-06-10 06:16:35,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1498.15 | bwd_inner_microstep: 1498.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822 [2024-06-10 06:16:38,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.85 | bwd_microstep: 
1688.91 | bwd_inner_microstep: 1688.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1120 [2024-06-10 06:16:38,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 181.30 | bwd_microstep: 473.38 | bwd_inner_microstep: 473.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 06:16:40,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.81 | bwd_microstep: 1404.77 | bwd_inner_microstep: 1404.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465 [2024-06-10 06:16:42,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.93 | bwd_microstep: 1186.34 | bwd_inner_microstep: 1186.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 06:16:44,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1455.15 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3448 [2024-06-10 06:16:46,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.95 | bwd_microstep: 1219.96 | bwd_inner_microstep: 1219.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813 [2024-06-10 06:16:48,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.75 | bwd_microstep: 1619.77 | bwd_inner_microstep: 1619.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 06:16:50,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 525.35 | bwd_microstep: 1402.71 | bwd_inner_microstep: 1402.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665 [2024-06-10 06:16:52,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.10 | bwd_microstep: 1718.93 | bwd_inner_microstep: 1718.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 06:16:58,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.39 | optimizer_step: 6.58 [2024-06-10 06:16:58,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.16 | bwd_microstep: 5115.49 | bwd_inner_microstep: 1698.78 | bwd_allreduce_microstep: 3416.63 | step_microstep: 39.61 [2024-06-10 06:16:58,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15893.86 | bwd: 45970.50 | bwd_inner: 42552.83 | bwd_allreduce: 3416.92 | step: 41.24 {'loss': 1.3184, 'learning_rate': 3.746868623352325e-05, 'epoch': 0.19} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 06:17:00,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.46 | bwd_microstep: 1471.96 | bwd_inner_microstep: 1471.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907 [2024-06-10 06:17:02,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.25 | bwd_microstep: 1483.81 | bwd_inner_microstep: 1483.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 06:17:04,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.15 | bwd_microstep: 1483.03 | bwd_inner_microstep: 1483.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 06:17:06,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.75 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1902 [2024-06-10 06:17:07,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.31 | bwd_microstep: 715.56 | bwd_inner_microstep: 715.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 06:17:09,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.05 | bwd_microstep: 1281.96 | bwd_inner_microstep: 1281.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3588 [2024-06-10 06:17:10,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1307.89 | bwd_inner_microstep: 1307.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422 [2024-06-10 06:17:12,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.59 | bwd_microstep: 1157.03 | bwd_inner_microstep: 1157.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494 [2024-06-10 06:17:14,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1317.12 | bwd_inner_microstep: 1317.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3666 [2024-06-10 06:17:16,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.77 | bwd_microstep: 1446.83 | bwd_inner_microstep: 1446.80 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 06:17:18,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.07 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 06:17:20,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1353.40 | bwd_inner_microstep: 1353.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 06:17:21,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.09 | bwd_microstep: 794.29 | bwd_inner_microstep: 794.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836 [2024-06-10 06:17:23,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.31 | bwd_microstep: 1752.87 | bwd_inner_microstep: 1752.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3421 [2024-06-10 06:17:25,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.37 | bwd_microstep: 1210.86 | bwd_inner_microstep: 1210.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 06:17:27,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1490.07 | bwd_inner_microstep: 1490.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-10 06:17:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1429.23 | 
bwd_inner_microstep: 1429.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671 [2024-06-10 06:17:31,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 1422.04 | bwd_inner_microstep: 1422.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 06:17:33,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.01 | bwd_microstep: 1482.96 | bwd_inner_microstep: 1482.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 06:17:35,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.27 | bwd_microstep: 1373.06 | bwd_inner_microstep: 1373.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3598 [2024-06-10 06:17:37,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.87 | bwd_microstep: 1568.72 | bwd_inner_microstep: 1568.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536 [2024-06-10 06:17:39,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1294.82 | bwd_inner_microstep: 1294.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-10 06:17:41,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1405.70 | bwd_inner_microstep: 1405.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469 [2024-06-10 06:17:42,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 500.57 | bwd_microstep: 1313.14 | bwd_inner_microstep: 1313.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477 [2024-06-10 06:17:44,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.21 | bwd_microstep: 1346.16 | bwd_inner_microstep: 1346.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 06:17:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.54 | bwd_microstep: 1458.42 | bwd_inner_microstep: 1458.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554 [2024-06-10 06:17:48,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.86 | bwd_microstep: 1335.03 | bwd_inner_microstep: 1335.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515 [2024-06-10 06:17:50,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.45 | bwd_microstep: 1194.60 | bwd_inner_microstep: 1194.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-10 06:17:51,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 802.68 | bwd_inner_microstep: 802.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3869 [2024-06-10 06:17:53,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.97 | bwd_microstep: 1667.88 | bwd_inner_microstep: 1667.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 06:17:55,559] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.99 | bwd_microstep: 1289.45 | bwd_inner_microstep: 1289.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 06:17:59,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.31 | optimizer_step: 6.59 [2024-06-10 06:17:59,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 3389.42 | bwd_inner_microstep: 1566.94 | bwd_allreduce_microstep: 1822.43 | step_microstep: 38.98 [2024-06-10 06:17:59,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16091.83 | bwd: 44797.82 | bwd_inner: 42974.46 | bwd_allreduce: 1822.65 | step: 40.59 {'loss': 1.3123, 'learning_rate': 3.745037861953274e-05, 'epoch': 0.19} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 06:18:01,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.23 | bwd_microstep: 1332.10 | bwd_inner_microstep: 1331.93 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903 [2024-06-10 06:18:03,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1585.28 | bwd_inner_microstep: 1585.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462 [2024-06-10 06:18:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.88 | bwd_microstep: 1309.53 | bwd_inner_microstep: 1309.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 06:18:07,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1348.16 | bwd_inner_microstep: 1348.14 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2269 [2024-06-10 06:18:08,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.24 | bwd_microstep: 904.74 | bwd_inner_microstep: 904.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474 [2024-06-10 06:18:10,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.96 | bwd_microstep: 1246.45 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1869 [2024-06-10 06:18:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.10 | bwd_microstep: 744.52 | bwd_inner_microstep: 744.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 06:18:13,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.54 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 06:18:15,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1398.00 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 06:18:16,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.60 | bwd_microstep: 1254.24 | bwd_inner_microstep: 1254.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959 [2024-06-10 06:18:17,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.37 | bwd_microstep: 703.97 
| bwd_inner_microstep: 703.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 06:18:19,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.90 | bwd_microstep: 1401.78 | bwd_inner_microstep: 1401.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905 [2024-06-10 06:18:20,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.68 | bwd_microstep: 715.53 | bwd_inner_microstep: 715.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954 [2024-06-10 06:18:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.04 | bwd_microstep: 893.54 | bwd_inner_microstep: 893.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2762 [2024-06-10 06:18:23,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.55 | bwd_microstep: 1238.74 | bwd_inner_microstep: 1238.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158 [2024-06-10 06:18:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.76 | bwd_microstep: 949.22 | bwd_inner_microstep: 949.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515 [2024-06-10 06:18:27,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1421.26 | bwd_inner_microstep: 1421.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699 [2024-06-10 06:18:28,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
503.97 | bwd_microstep: 1332.56 | bwd_inner_microstep: 1332.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-10 06:18:30,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1410.55 | bwd_inner_microstep: 1410.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 06:18:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1282.97 | bwd_inner_microstep: 1282.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 06:18:34,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.52 | bwd_microstep: 1254.68 | bwd_inner_microstep: 1254.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461 [2024-06-10 06:18:36,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.26 | bwd_microstep: 1443.54 | bwd_inner_microstep: 1443.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 06:18:38,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1398.99 | bwd_inner_microstep: 1398.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 06:18:40,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.75 | bwd_microstep: 1395.82 | bwd_inner_microstep: 1395.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 06:18:42,321] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.87 | bwd_microstep: 1544.28 | bwd_inner_microstep: 1544.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-10 06:18:44,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.65 | bwd_microstep: 1637.30 | bwd_inner_microstep: 1637.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 06:18:46,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.20 | bwd_microstep: 1282.64 | bwd_inner_microstep: 1282.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593 [2024-06-10 06:18:48,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 1341.08 | bwd_inner_microstep: 1341.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639 [2024-06-10 06:18:50,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1511.23 | bwd_inner_microstep: 1511.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 06:18:52,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3424 [2024-06-10 06:18:53,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.78 | bwd_microstep: 1298.93 | bwd_inner_microstep: 1298.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token 
length: 3577 [2024-06-10 06:19:00,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.38 | optimizer_step: 6.58 [2024-06-10 06:19:00,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 6102.93 | bwd_inner_microstep: 1530.22 | bwd_allreduce_microstep: 4572.64 | step_microstep: 39.73 [2024-06-10 06:19:00,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15325.66 | bwd: 45412.93 | bwd_inner: 40839.21 | bwd_allreduce: 4572.95 | step: 41.37 {'loss': 1.3225, 'learning_rate': 3.743200954539945e-05, 'epoch': 0.19} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465 [2024-06-10 06:19:02,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.10 | bwd_microstep: 1404.55 | bwd_inner_microstep: 1404.38 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4117 [2024-06-10 06:19:04,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.85 | bwd_microstep: 1734.09 | bwd_inner_microstep: 1734.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 06:19:07,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.71 | bwd_microstep: 1479.98 | bwd_inner_microstep: 1479.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2308 [2024-06-10 06:19:08,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.27 | bwd_microstep: 790.64 | bwd_inner_microstep: 790.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 06:19:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1382.26 | 
bwd_inner_microstep: 1382.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 06:19:11,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.43 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 06:19:13,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 06:19:15,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1248.23 | bwd_inner_microstep: 1248.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 06:19:17,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1248.12 | bwd_inner_microstep: 1248.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-10 06:19:19,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1310.53 | bwd_inner_microstep: 1310.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 06:19:20,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.87 | bwd_microstep: 1285.65 | bwd_inner_microstep: 1285.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670 [2024-06-10 06:19:23,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 594.62 | bwd_microstep: 1616.24 | bwd_inner_microstep: 1616.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3657 [2024-06-10 06:19:25,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1474.44 | bwd_inner_microstep: 1474.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451 [2024-06-10 06:19:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.51 | bwd_microstep: 1455.89 | bwd_inner_microstep: 1455.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512 [2024-06-10 06:19:29,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1418.74 | bwd_inner_microstep: 1418.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 06:19:30,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1389.04 | bwd_inner_microstep: 1389.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 06:19:32,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1419.13 | bwd_inner_microstep: 1419.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 06:19:34,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1277.72 | bwd_inner_microstep: 1277.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-10 06:19:36,735] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1494.10 | bwd_inner_microstep: 1494.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490 [2024-06-10 06:19:38,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1348.64 | bwd_inner_microstep: 1348.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530 [2024-06-10 06:19:40,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.55 | bwd_microstep: 1326.17 | bwd_inner_microstep: 1326.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3480 [2024-06-10 06:19:42,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1429.22 | bwd_inner_microstep: 1429.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 06:19:44,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1408.49 | bwd_inner_microstep: 1408.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722 [2024-06-10 06:19:46,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.52 | bwd_microstep: 1534.04 | bwd_inner_microstep: 1534.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2285 [2024-06-10 06:19:47,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.76 | bwd_microstep: 1072.24 | bwd_inner_microstep: 1072.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token 
length: 3814 [2024-06-10 06:19:50,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.73 | bwd_microstep: 1506.46 | bwd_inner_microstep: 1506.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-10 06:19:51,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1416.66 | bwd_inner_microstep: 1416.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3452 [2024-06-10 06:19:53,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1384.81 | bwd_inner_microstep: 1384.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 06:19:55,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1303.80 | bwd_inner_microstep: 1303.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191 [2024-06-10 06:19:57,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.47 | bwd_microstep: 956.45 | bwd_inner_microstep: 956.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3570 [2024-06-10 06:19:59,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.63 | bwd_microstep: 1478.11 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572 [2024-06-10 06:20:02,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 06:20:02,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 580.45 | bwd_microstep: 2447.17 | bwd_inner_microstep: 1734.29 | bwd_allreduce_microstep: 712.83 | step_microstep: 38.69 [2024-06-10 06:20:02,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16460.14 | bwd: 44720.62 | bwd_inner: 44006.73 | bwd_allreduce: 713.14 | step: 40.39 {'loss': 1.2968, 'learning_rate': 3.7413579075819166e-05, 'epoch': 0.19} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3522 [2024-06-10 06:20:04,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.16 | bwd_microstep: 1322.29 | bwd_inner_microstep: 1322.20 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-10 06:20:05,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 799.49 | bwd_inner_microstep: 799.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3995 [2024-06-10 06:20:07,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.22 | bwd_microstep: 1656.19 | bwd_inner_microstep: 1656.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4289 [2024-06-10 06:20:09,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.13 | bwd_microstep: 1625.53 | bwd_inner_microstep: 1625.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 06:20:11,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1488.93 | bwd_inner_microstep: 1488.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058 [2024-06-10 06:20:12,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 310.11 | bwd_microstep: 815.72 | bwd_inner_microstep: 815.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 06:20:14,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1251.61 | bwd_inner_microstep: 1251.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2071 [2024-06-10 06:20:15,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.02 | bwd_microstep: 881.56 | bwd_inner_microstep: 881.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 06:20:16,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.69 | bwd_microstep: 796.09 | bwd_inner_microstep: 796.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 06:20:19,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1533.48 | bwd_inner_microstep: 1533.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 06:20:20,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 06:20:22,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.26 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995 [2024-06-10 06:20:24,107] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.95 | bwd_microstep: 896.09 | bwd_inner_microstep: 896.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947 [2024-06-10 06:20:25,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.51 | bwd_microstep: 824.12 | bwd_inner_microstep: 824.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105 [2024-06-10 06:20:26,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.77 | bwd_microstep: 921.85 | bwd_inner_microstep: 921.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-10 06:20:28,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.07 | bwd_microstep: 1523.12 | bwd_inner_microstep: 1523.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3624 [2024-06-10 06:20:30,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.96 | bwd_microstep: 1277.58 | bwd_inner_microstep: 1277.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-10 06:20:32,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.37 | bwd_microstep: 1517.94 | bwd_inner_microstep: 1517.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-10 06:20:34,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 
[2024-06-10 06:20:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1396.77 | bwd_inner_microstep: 1396.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003 [2024-06-10 06:20:37,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.60 | bwd_microstep: 900.26 | bwd_inner_microstep: 900.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 06:20:39,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.99 | bwd_microstep: 1475.13 | bwd_inner_microstep: 1475.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 06:20:41,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1257.77 | bwd_inner_microstep: 1257.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013 [2024-06-10 06:20:42,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.91 | bwd_microstep: 805.98 | bwd_inner_microstep: 805.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462 [2024-06-10 06:20:44,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.78 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-10 06:20:46,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1183.97 | bwd_inner_microstep: 1183.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3541 [2024-06-10 06:20:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.59 | bwd_microstep: 1294.73 | bwd_inner_microstep: 1294.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 06:20:49,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.60 | bwd_microstep: 1345.43 | bwd_inner_microstep: 1345.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441 [2024-06-10 06:20:51,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.44 | bwd_microstep: 1376.91 | bwd_inner_microstep: 1376.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-10 06:20:53,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1506.11 | bwd_inner_microstep: 1506.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3743 [2024-06-10 06:20:56,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.23 | bwd_microstep: 1737.89 | bwd_inner_microstep: 1737.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763 [2024-06-10 06:21:03,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.26 | optimizer_step: 6.58 [2024-06-10 06:21:03,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.32 | bwd_microstep: 7021.06 | bwd_inner_microstep: 1753.42 | bwd_allreduce_microstep: 5267.59 | step_microstep: 39.01 [2024-06-10 06:21:03,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15209.54 | bwd: 45980.10 | bwd_inner: 40711.52 
| bwd_allreduce: 5267.86 | step: 40.78 {'loss': 1.3274, 'learning_rate': 3.73950872757039e-05, 'epoch': 0.19} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-10 06:21:05,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1439.26 | bwd_inner_microstep: 1439.18 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-10 06:21:07,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1505.50 | bwd_inner_microstep: 1505.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4281 [2024-06-10 06:21:10,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.17 | bwd_microstep: 1769.12 | bwd_inner_microstep: 1769.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834 [2024-06-10 06:21:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.05 | bwd_microstep: 1654.91 | bwd_inner_microstep: 1654.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-10 06:21:14,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.33 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 06:21:16,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1284.01 | bwd_inner_microstep: 1283.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 06:21:18,310] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1384.90 | bwd_inner_microstep: 1384.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.72 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-10 06:21:20,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.48 | bwd_microstep: 1532.43 | bwd_inner_microstep: 1532.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 06:21:22,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.50 | bwd_microstep: 1345.92 | bwd_inner_microstep: 1345.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 06:21:24,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1284.20 | bwd_inner_microstep: 1284.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 06:21:25,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1394.86 | bwd_inner_microstep: 1394.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484 [2024-06-10 06:21:27,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.01 | bwd_microstep: 1413.10 | bwd_inner_microstep: 1413.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 06:21:29,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.24 | bwd_microstep: 1479.05 | bwd_inner_microstep: 1479.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 
06:21:31,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.86 | bwd_microstep: 1262.65 | bwd_inner_microstep: 1262.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 06:21:33,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.75 | bwd_microstep: 1481.70 | bwd_inner_microstep: 1481.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-10 06:21:35,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.44 | bwd_microstep: 1452.51 | bwd_inner_microstep: 1452.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3571 [2024-06-10 06:21:37,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.74 | bwd_microstep: 1461.00 | bwd_inner_microstep: 1460.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-10 06:21:39,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1485.47 | bwd_inner_microstep: 1485.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 06:21:41,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1391.59 | bwd_inner_microstep: 1391.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529 [2024-06-10 06:21:43,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.88 | bwd_microstep: 1324.96 | bwd_inner_microstep: 1324.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, 
dynamic token length: 1932 [2024-06-10 06:21:44,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.02 | bwd_microstep: 700.12 | bwd_inner_microstep: 700.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2403 [2024-06-10 06:21:45,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.30 | bwd_microstep: 843.17 | bwd_inner_microstep: 843.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-10 06:21:47,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.84 | bwd_microstep: 1523.48 | bwd_inner_microstep: 1523.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 06:21:49,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1502.35 | bwd_inner_microstep: 1502.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2116 [2024-06-10 06:21:51,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.05 | bwd_microstep: 972.79 | bwd_inner_microstep: 972.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068 [2024-06-10 06:21:52,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.75 | bwd_microstep: 915.30 | bwd_inner_microstep: 915.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 06:21:54,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1396.28 | bwd_inner_microstep: 1396.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-10 06:21:56,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1420.18 | bwd_inner_microstep: 1420.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3755 [2024-06-10 06:21:58,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573 [2024-06-10 06:22:00,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.56 | bwd_microstep: 1497.65 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633 [2024-06-10 06:22:02,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1416.39 | bwd_inner_microstep: 1416.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3730 [2024-06-10 06:22:04,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.65 [2024-06-10 06:22:04,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.45 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1473.97 | bwd_allreduce_microstep: 7.61 | step_microstep: 38.25 [2024-06-10 06:22:04,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16441.96 | bwd: 43955.78 | bwd_inner: 43947.20 | bwd_allreduce: 7.88 | step: 41.64 19%|█▊ | 322/1726 [5:38:32<24:18:44, 62.34s/it] 19%|█▊ | 323/1726 [5:39:35<24:16:47, 62.30s/it] 19%|█▉ | 324/1726 [5:40:36<24:08:17, 61.98s/it]
19%|█▉ | 325/1726 [5:41:37<24:01:00, 61.71s/it] 19%|█▉ | 326/1726 [5:42:38<23:58:46, 61.66s/it] 19%|█▉ | 327/1726 [5:43:40<23:56:56, 61.63s/it] 19%|█▉ | 328/1726 [5:44:41<23:49:52, {'loss': 1.3188, 'learning_rate': 3.737653421018168e-05, 'epoch': 0.19} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 06:22:06,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.57 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.77 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-10 06:22:08,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.94 | bwd_microstep: 1251.50 | bwd_inner_microstep: 1251.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841 [2024-06-10 06:22:10,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.88 | bwd_microstep: 1658.46 | bwd_inner_microstep: 1658.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 06:22:12,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.23 | bwd_microstep: 1278.72 | bwd_inner_microstep: 1278.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-10 06:22:14,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.58 | bwd_microstep: 1540.09 | bwd_inner_microstep: 1540.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-10 06:22:15,420] [INFO] [logging.py:96:log_dist]
[Rank 0] time (ms) | fwd_microstep: 299.24 | bwd_microstep: 796.62 | bwd_inner_microstep: 796.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 06:22:17,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.67 | bwd_microstep: 1550.19 | bwd_inner_microstep: 1550.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-10 06:22:19,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.89 | bwd_microstep: 1552.59 | bwd_inner_microstep: 1552.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 06:22:21,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.66 | bwd_microstep: 1148.86 | bwd_inner_microstep: 1148.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 06:22:23,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1289.69 | bwd_inner_microstep: 1289.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2174 [2024-06-10 06:22:24,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.45 | bwd_microstep: 884.96 | bwd_inner_microstep: 884.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-10 06:22:26,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.29 | bwd_microstep: 1397.31 | bwd_inner_microstep: 1397.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 
06:22:27,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.75 | bwd_microstep: 788.30 | bwd_inner_microstep: 788.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988 [2024-06-10 06:22:28,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.17 | bwd_microstep: 895.15 | bwd_inner_microstep: 895.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 06:22:30,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.38 | bwd_microstep: 1476.34 | bwd_inner_microstep: 1476.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3658 [2024-06-10 06:22:32,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.50 | bwd_microstep: 1562.77 | bwd_inner_microstep: 1562.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682 [2024-06-10 06:22:34,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.57 | bwd_microstep: 1617.67 | bwd_inner_microstep: 1617.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642 [2024-06-10 06:22:37,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.10 | bwd_microstep: 1514.41 | bwd_inner_microstep: 1514.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2169 [2024-06-10 06:22:38,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.59 | bwd_microstep: 853.81 | bwd_inner_microstep: 853.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 1988 [2024-06-10 06:22:39,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.90 | bwd_microstep: 801.09 | bwd_inner_microstep: 801.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715 [2024-06-10 06:22:41,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1637.96 | bwd_inner_microstep: 1637.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3838 [2024-06-10 06:22:43,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1266.49 | bwd_inner_microstep: 1266.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465 [2024-06-10 06:22:45,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1216.26 | bwd_inner_microstep: 1216.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 06:22:47,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.62 | bwd_microstep: 1403.90 | bwd_inner_microstep: 1403.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 06:22:48,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.81 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2284 [2024-06-10 06:22:50,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.22 | bwd_microstep: 910.23 | bwd_inner_microstep: 910.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 06:22:52,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1398.66 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 06:22:53,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.96 | bwd_microstep: 1296.83 | bwd_inner_microstep: 1296.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 06:22:55,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1257.24 | bwd_inner_microstep: 1257.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2269 [2024-06-10 06:22:56,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.50 | bwd_microstep: 842.04 | bwd_inner_microstep: 842.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3582 [2024-06-10 06:22:59,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.68 | bwd_microstep: 1697.27 | bwd_inner_microstep: 1697.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-10 06:23:04,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.31 | optimizer_step: 6.62 [2024-06-10 06:23:04,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.62 | bwd_microstep: 5266.21 | bwd_inner_microstep: 1585.47 | bwd_allreduce_microstep: 3680.67 | step_microstep: 39.46 [2024-06-10 06:23:04,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15368.93 | bwd: 44721.96 | 
bwd_inner: 41040.29 | bwd_allreduce: 3680.96 | step: 41.16 {'loss': 1.3459, 'learning_rate': 3.7357919944596305e-05, 'epoch': 0.19} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 06:23:06,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.51 | bwd_microstep: 1391.79 | bwd_inner_microstep: 1391.71 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4321 [2024-06-10 06:23:09,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.61 | bwd_microstep: 1699.35 | bwd_inner_microstep: 1699.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2404 [2024-06-10 06:23:10,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.20 | bwd_microstep: 1002.55 | bwd_inner_microstep: 1002.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270 [2024-06-10 06:23:11,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.06 | bwd_microstep: 871.48 | bwd_inner_microstep: 871.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 06:23:13,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1245.23 | bwd_inner_microstep: 1245.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 06:23:15,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1247.16 | bwd_inner_microstep: 1247.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3426 [2024-06-10 06:23:16,901] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.20 | bwd_microstep: 1185.72 | bwd_inner_microstep: 1185.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3721 [2024-06-10 06:23:18,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1268.11 | bwd_inner_microstep: 1268.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 06:23:20,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1389.10 | bwd_inner_microstep: 1389.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 06:23:22,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.04 | bwd_microstep: 1484.47 | bwd_inner_microstep: 1484.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-10 06:23:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.54 | bwd_microstep: 1480.93 | bwd_inner_microstep: 1480.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 06:23:26,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1982 [2024-06-10 06:23:27,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.37 | bwd_microstep: 856.93 | bwd_inner_microstep: 856.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3528 [2024-06-10 06:23:29,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1488.75 | bwd_inner_microstep: 1488.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 06:23:31,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1374.74 | bwd_inner_microstep: 1374.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3890 [2024-06-10 06:23:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.01 | bwd_microstep: 1890.63 | bwd_inner_microstep: 1890.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 06:23:36,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.16 | bwd_microstep: 1276.84 | bwd_inner_microstep: 1276.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3393 [2024-06-10 06:23:38,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1437.46 | bwd_inner_microstep: 1437.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-10 06:23:40,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.21 | bwd_microstep: 1503.39 | bwd_inner_microstep: 1503.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3534 [2024-06-10 06:23:42,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.71 | bwd_microstep: 1589.31 | bwd_inner_microstep: 1589.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3442 [2024-06-10 06:23:44,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.47 | bwd_microstep: 1411.01 | bwd_inner_microstep: 1410.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-10 06:23:45,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.46 | bwd_microstep: 972.73 | bwd_inner_microstep: 972.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3444 [2024-06-10 06:23:47,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.56 | bwd_microstep: 1189.46 | bwd_inner_microstep: 1189.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 06:23:49,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1463.93 | bwd_inner_microstep: 1463.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 06:23:51,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1262.53 | bwd_inner_microstep: 1262.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-10 06:23:52,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.64 | bwd_microstep: 1183.65 | bwd_inner_microstep: 1183.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 06:23:54,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1255.50 | bwd_inner_microstep: 1255.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-10 06:23:56,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 1406.08 | bwd_inner_microstep: 1406.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585 [2024-06-10 06:23:58,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.50 | bwd_microstep: 1606.59 | bwd_inner_microstep: 1606.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 06:24:00,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.92 | bwd_microstep: 1499.76 | bwd_inner_microstep: 1499.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770 [2024-06-10 06:24:02,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1449.68 | bwd_inner_microstep: 1449.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588 [2024-06-10 06:24:05,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 06:24:05,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.89 | bwd_microstep: 2044.71 | bwd_inner_microstep: 1717.96 | bwd_allreduce_microstep: 326.70 | step_microstep: 38.43 [2024-06-10 06:24:05,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16247.56 | bwd: 43811.63 | bwd_inner: 43483.96 | bwd_allreduce: 326.96 | step: 40.05 {'loss': 1.2818, 'learning_rate': 3.733924454450711e-05, 'epoch': 0.19} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 06:24:07,383] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1476.72 | bwd_inner_microstep: 1476.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 06:24:09,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1313.11 | bwd_inner_microstep: 1313.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 06:24:10,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.41 | bwd_microstep: 1250.22 | bwd_inner_microstep: 1250.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 06:24:12,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.00 | bwd_microstep: 1356.75 | bwd_inner_microstep: 1356.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795 [2024-06-10 06:24:14,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.38 | bwd_microstep: 1448.08 | bwd_inner_microstep: 1448.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 06:24:16,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1245.07 | bwd_inner_microstep: 1245.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 06:24:18,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.57 | bwd_microstep: 1277.46 | bwd_inner_microstep: 1277.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 
06:24:20,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.87 | bwd_microstep: 1253.71 | bwd_inner_microstep: 1253.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096 [2024-06-10 06:24:21,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.13 | bwd_microstep: 822.80 | bwd_inner_microstep: 822.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469 [2024-06-10 06:24:22,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.20 | bwd_microstep: 1215.47 | bwd_inner_microstep: 1215.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 06:24:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.93 | bwd_microstep: 1292.63 | bwd_inner_microstep: 1292.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939 [2024-06-10 06:24:25,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.21 | bwd_microstep: 730.27 | bwd_inner_microstep: 730.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297 [2024-06-10 06:24:26,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.97 | bwd_microstep: 880.90 | bwd_inner_microstep: 880.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3496 [2024-06-10 06:24:29,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 1532.82 | bwd_inner_microstep: 1532.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3642 [2024-06-10 06:24:31,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1513.37 | bwd_inner_microstep: 1513.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3609 [2024-06-10 06:24:33,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.84 | bwd_microstep: 1672.80 | bwd_inner_microstep: 1672.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3435 [2024-06-10 06:24:35,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.50 | bwd_microstep: 1218.17 | bwd_inner_microstep: 1218.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490 [2024-06-10 06:24:36,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.38 | bwd_microstep: 1320.49 | bwd_inner_microstep: 1320.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 06:24:38,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 792.85 | bwd_inner_microstep: 792.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 06:24:40,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1561.20 | bwd_inner_microstep: 1561.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 06:24:42,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.30 | bwd_microstep: 1401.06 | bwd_inner_microstep: 1401.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 06:24:44,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1415.14 | bwd_inner_microstep: 1415.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 06:24:46,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.54 | bwd_microstep: 1491.81 | bwd_inner_microstep: 1491.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 06:24:47,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1279.75 | bwd_inner_microstep: 1279.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 06:24:49,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.64 | bwd_microstep: 1246.73 | bwd_inner_microstep: 1246.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473 [2024-06-10 06:24:51,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.69 | bwd_microstep: 1345.51 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3744 [2024-06-10 06:24:53,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.10 | bwd_microstep: 1598.12 | bwd_inner_microstep: 1598.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 06:24:55,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1388.21 | bwd_inner_microstep: 1388.19 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055 [2024-06-10 06:24:56,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.48 | bwd_microstep: 817.54 | bwd_inner_microstep: 817.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 06:24:58,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1395.73 | bwd_inner_microstep: 1395.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 06:25:00,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1348.79 | bwd_inner_microstep: 1348.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3731 [2024-06-10 06:25:05,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 06:25:05,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.86 | bwd_microstep: 3991.06 | bwd_inner_microstep: 1894.82 | bwd_allreduce_microstep: 2096.18 | step_microstep: 38.87 [2024-06-10 06:25:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15633.80 | bwd: 43894.36 | bwd_inner: 41797.23 | bwd_allreduce: 2096.41 | step: 40.47 {'loss': 1.2829, 'learning_rate': 3.732050807568878e-05, 'epoch': 0.19} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 06:25:06,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1245.57 | bwd_inner_microstep: 1245.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4002 [2024-06-10 06:25:08,901] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1410.19 | bwd_inner_microstep: 1410.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 06:25:10,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1380.58 | bwd_inner_microstep: 1380.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842 [2024-06-10 06:25:12,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.75 | bwd_microstep: 1463.50 | bwd_inner_microstep: 1463.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 06:25:14,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1281.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740 [2024-06-10 06:25:16,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.13 | bwd_microstep: 1534.44 | bwd_inner_microstep: 1534.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753 [2024-06-10 06:25:18,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.65 | bwd_microstep: 1469.73 | bwd_inner_microstep: 1469.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1868 [2024-06-10 06:25:19,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.17 | bwd_microstep: 744.10 | bwd_inner_microstep: 744.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
3432 [2024-06-10 06:25:21,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.35 | bwd_microstep: 1188.51 | bwd_inner_microstep: 1188.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 06:25:23,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.73 | bwd_microstep: 1294.82 | bwd_inner_microstep: 1294.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491 [2024-06-10 06:25:25,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.79 | bwd_microstep: 1322.13 | bwd_inner_microstep: 1322.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3689 [2024-06-10 06:25:27,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.38 | bwd_microstep: 1569.45 | bwd_inner_microstep: 1569.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 06:25:29,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.22 | bwd_microstep: 1490.24 | bwd_inner_microstep: 1490.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457 [2024-06-10 06:25:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.72 | bwd_microstep: 1437.94 | bwd_inner_microstep: 1437.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2146 [2024-06-10 06:25:32,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.03 | bwd_microstep: 757.92 | bwd_inner_microstep: 757.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3384 [2024-06-10 06:25:34,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.51 | bwd_microstep: 1340.00 | bwd_inner_microstep: 1339.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 06:25:36,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1516.09 | bwd_inner_microstep: 1516.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424 [2024-06-10 06:25:38,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.97 | bwd_microstep: 1409.97 | bwd_inner_microstep: 1409.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150 [2024-06-10 06:25:39,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.07 | bwd_microstep: 851.27 | bwd_inner_microstep: 851.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679 [2024-06-10 06:25:41,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.15 | bwd_microstep: 1435.32 | bwd_inner_microstep: 1435.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-10 06:25:43,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1514.23 | bwd_inner_microstep: 1514.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705 [2024-06-10 06:25:45,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.20 | bwd_microstep: 1437.00 | bwd_inner_microstep: 1436.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2084 [2024-06-10 06:25:46,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.31 | bwd_microstep: 921.81 | bwd_inner_microstep: 921.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 06:25:48,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.44 | bwd_microstep: 1191.26 | bwd_inner_microstep: 1191.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 06:25:50,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.43 | bwd_microstep: 1287.43 | bwd_inner_microstep: 1287.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 06:25:52,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.61 | bwd_microstep: 1391.94 | bwd_inner_microstep: 1391.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2039 [2024-06-10 06:25:53,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.34 | bwd_microstep: 969.58 | bwd_inner_microstep: 969.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560 [2024-06-10 06:25:55,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.22 | bwd_microstep: 1426.46 | bwd_inner_microstep: 1426.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 06:25:57,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.51 | bwd_microstep: 1257.49 | bwd_inner_microstep: 1257.46 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3600 [2024-06-10 06:25:59,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.54 | bwd_microstep: 1450.07 | bwd_inner_microstep: 1450.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3430 [2024-06-10 06:26:00,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1313.08 | bwd_inner_microstep: 1313.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-10 06:26:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.38 | optimizer_step: 6.59 [2024-06-10 06:26:08,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.52 | bwd_microstep: 6483.70 | bwd_inner_microstep: 1659.02 | bwd_allreduce_microstep: 4824.62 | step_microstep: 39.64 [2024-06-10 06:26:08,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15710.66 | bwd: 46787.86 | bwd_inner: 41962.31 | bwd_allreduce: 4824.86 | step: 41.37 {'loss': 1.285, 'learning_rate': 3.730171060413103e-05, 'epoch': 0.19} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922 [2024-06-10 06:26:09,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.58 | bwd_microstep: 778.32 | bwd_inner_microstep: 778.20 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 06:26:11,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.21 | bwd_microstep: 1374.86 | bwd_inner_microstep: 1374.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 
06:26:13,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.87 | bwd_microstep: 1476.63 | bwd_inner_microstep: 1476.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477 [2024-06-10 06:26:14,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.81 | bwd_microstep: 1309.09 | bwd_inner_microstep: 1309.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3531 [2024-06-10 06:26:16,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.94 | bwd_microstep: 1338.92 | bwd_inner_microstep: 1338.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 06:26:18,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 06:26:19,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.66 | bwd_microstep: 699.35 | bwd_inner_microstep: 699.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1959 [2024-06-10 06:26:20,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.51 | bwd_microstep: 733.15 | bwd_inner_microstep: 733.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692 [2024-06-10 06:26:22,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1523.93 | bwd_inner_microstep: 1523.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3497 [2024-06-10 06:26:24,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.80 | bwd_microstep: 1286.96 | bwd_inner_microstep: 1286.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-10 06:26:26,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1296.44 | bwd_inner_microstep: 1296.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 06:26:28,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 1287.99 | bwd_inner_microstep: 1287.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2099 [2024-06-10 06:26:29,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.00 | bwd_microstep: 1013.08 | bwd_inner_microstep: 1013.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-10 06:26:31,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.41 | bwd_microstep: 1603.02 | bwd_inner_microstep: 1602.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3427 [2024-06-10 06:26:33,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.47 | bwd_microstep: 1542.02 | bwd_inner_microstep: 1541.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 06:26:35,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.74 | bwd_microstep: 1297.47 | bwd_inner_microstep: 1297.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 06:26:37,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1415.65 | bwd_inner_microstep: 1415.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 06:26:39,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.29 | bwd_microstep: 1459.82 | bwd_inner_microstep: 1459.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 06:26:41,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833 [2024-06-10 06:26:43,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1461.33 | bwd_inner_microstep: 1461.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3614 [2024-06-10 06:26:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.27 | bwd_microstep: 1569.76 | bwd_inner_microstep: 1569.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 06:26:47,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1374.63 | bwd_inner_microstep: 1374.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2989 [2024-06-10 06:26:49,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.17 | bwd_microstep: 1142.69 | bwd_inner_microstep: 1142.66 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3535 [2024-06-10 06:26:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.82 | bwd_microstep: 1233.63 | bwd_inner_microstep: 1233.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726 [2024-06-10 06:26:52,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.86 | bwd_microstep: 1441.94 | bwd_inner_microstep: 1441.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 06:26:55,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1518.66 | bwd_inner_microstep: 1518.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819 [2024-06-10 06:26:57,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.82 | bwd_microstep: 1682.36 | bwd_inner_microstep: 1682.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3532 [2024-06-10 06:26:59,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1585.36 | bwd_inner_microstep: 1585.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585 [2024-06-10 06:27:01,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.78 | bwd_microstep: 1306.18 | bwd_inner_microstep: 1306.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3692 [2024-06-10 06:27:03,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.01 | bwd_microstep: 1791.77 | 
bwd_inner_microstep: 1791.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2397 [2024-06-10 06:27:05,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.47 | bwd_microstep: 1030.92 | bwd_inner_microstep: 1030.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 06:27:10,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.58 [2024-06-10 06:27:10,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.89 | bwd_microstep: 5082.04 | bwd_inner_microstep: 1638.96 | bwd_allreduce_microstep: 3443.03 | step_microstep: 38.72 [2024-06-10 06:27:10,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16061.41 | bwd: 46430.82 | bwd_inner: 42986.78 | bwd_allreduce: 3443.31 | step: 40.39 {'loss': 1.2688, 'learning_rate': 3.7282852196038495e-05, 'epoch': 0.19} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2951 [2024-06-10 06:27:12,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.63 | bwd_microstep: 1183.71 | bwd_inner_microstep: 1183.62 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-10 06:27:13,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.33 | bwd_microstep: 787.05 | bwd_inner_microstep: 787.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 06:27:14,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 790.39 | bwd_inner_microstep: 790.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic 
token length: 3819 [2024-06-10 06:27:16,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1552.65 | bwd_inner_microstep: 1552.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 06:27:18,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1341.09 | bwd_inner_microstep: 1341.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3762 [2024-06-10 06:27:20,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.97 | bwd_microstep: 1307.70 | bwd_inner_microstep: 1307.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 06:27:22,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1393.65 | bwd_inner_microstep: 1393.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1863 [2024-06-10 06:27:23,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.26 | bwd_microstep: 747.02 | bwd_inner_microstep: 746.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708 [2024-06-10 06:27:25,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.93 | bwd_microstep: 1532.16 | bwd_inner_microstep: 1532.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413 [2024-06-10 06:27:27,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1445.15 | bwd_inner_microstep: 1445.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 06:27:29,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.45 | bwd_microstep: 1476.27 | bwd_inner_microstep: 1476.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 06:27:31,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1496.37 | bwd_inner_microstep: 1496.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3492 [2024-06-10 06:27:33,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1535.18 | bwd_inner_microstep: 1535.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434 [2024-06-10 06:27:35,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 06:27:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1294.69 | bwd_inner_microstep: 1294.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983 [2024-06-10 06:27:38,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.27 | bwd_microstep: 897.37 | bwd_inner_microstep: 897.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939 [2024-06-10 06:27:39,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.10 | bwd_microstep: 731.14 | bwd_inner_microstep: 731.12 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 06:27:41,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.66 | bwd_microstep: 1311.79 | bwd_inner_microstep: 1311.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2424 [2024-06-10 06:27:42,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.51 | bwd_microstep: 844.50 | bwd_inner_microstep: 844.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630 [2024-06-10 06:27:44,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1316.59 | bwd_inner_microstep: 1316.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-10 06:27:46,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1412.20 | bwd_inner_microstep: 1412.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647 [2024-06-10 06:27:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1420.16 | bwd_inner_microstep: 1420.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 06:27:50,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.53 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 06:27:52,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1352.80 | bwd_inner_microstep: 
1352.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 06:27:54,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.41 | bwd_microstep: 1354.60 | bwd_inner_microstep: 1354.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3735 [2024-06-10 06:27:56,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1470.55 | bwd_inner_microstep: 1470.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564 [2024-06-10 06:27:58,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-10 06:27:59,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1966 [2024-06-10 06:28:01,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.59 | bwd_microstep: 734.23 | bwd_inner_microstep: 734.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3742 [2024-06-10 06:28:03,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1562.79 | bwd_inner_microstep: 1562.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 06:28:04,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.76 | 
bwd_microstep: 1253.97 | bwd_inner_microstep: 1253.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 06:28:13,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.36 | optimizer_step: 6.61 [2024-06-10 06:28:13,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 7607.11 | bwd_inner_microstep: 1544.37 | bwd_allreduce_microstep: 6062.67 | step_microstep: 39.35 [2024-06-10 06:28:13,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15187.01 | bwd: 46630.50 | bwd_inner: 40566.84 | bwd_allreduce: 6062.95 | step: 40.97 19%|█▉ | 328/1726 [5:44:41<23:49:52, 61.37s/it] 19%|█▉ | 329/1726 [5:45:41<23:42:27, 61.09s/it] 19%|█▉ | 330/1726 [5:46:42<23:36:41, 60.89s/it] 19%|█▉ | 331/1726 [5:47:41<23:28:36, 60.59s/it] 19%|█▉ | 332/1726 [5:48:44<23:43:24, 61.27s/it] 19%|█▉ | 333/1726 [5:49:47<23:53:22, 61.74s/it] {'loss': 1.2531, 'learning_rate': 3.726393291783036e-05, 'epoch': 0.19} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3554 [2024-06-10 06:28:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.53 | bwd_microstep: 1590.68 | bwd_inner_microstep: 1590.56 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 06:28:16,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.71 | bwd_microstep: 1241.96 | bwd_inner_microstep: 1241.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token
length: 3827 [2024-06-10 06:28:19,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1550.78 | bwd_inner_microstep: 1550.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 06:28:20,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753 [2024-06-10 06:28:23,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.83 | bwd_microstep: 1639.79 | bwd_inner_microstep: 1639.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 06:28:24,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1343.55 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 06:28:26,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.13 | bwd_microstep: 1151.52 | bwd_inner_microstep: 1151.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 06:28:28,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1350.12 | bwd_inner_microstep: 1350.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 06:28:30,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.30 | bwd_microstep: 1391.62 | bwd_inner_microstep: 1391.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3508 [2024-06-10 06:28:32,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.95 | bwd_microstep: 1391.09 | bwd_inner_microstep: 1391.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2183 [2024-06-10 06:28:33,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.39 | bwd_microstep: 980.73 | bwd_inner_microstep: 980.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2186 [2024-06-10 06:28:35,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.67 | bwd_microstep: 1050.78 | bwd_inner_microstep: 1050.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 06:28:37,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1489.15 | bwd_inner_microstep: 1489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2121 [2024-06-10 06:28:38,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.14 | bwd_microstep: 828.88 | bwd_inner_microstep: 828.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707 [2024-06-10 06:28:40,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.35 | bwd_microstep: 1628.56 | bwd_inner_microstep: 1628.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 06:28:42,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1390.56 | bwd_inner_microstep: 1390.54 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521 [2024-06-10 06:28:44,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.49 | bwd_microstep: 1583.28 | bwd_inner_microstep: 1583.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989 [2024-06-10 06:28:45,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 897.85 | bwd_inner_microstep: 897.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-10 06:28:47,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.02 | bwd_microstep: 1350.62 | bwd_inner_microstep: 1350.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 06:28:49,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.41 | bwd_microstep: 1393.39 | bwd_inner_microstep: 1393.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3677 [2024-06-10 06:28:51,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.43 | bwd_microstep: 1690.75 | bwd_inner_microstep: 1690.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 06:28:54,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1557.78 | bwd_inner_microstep: 1557.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471 [2024-06-10 06:28:55,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.52 | bwd_microstep: 1188.02 | bwd_inner_microstep: 
1187.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 06:28:56,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.67 | bwd_microstep: 698.38 | bwd_inner_microstep: 698.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3664 [2024-06-10 06:28:58,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1457.64 | bwd_inner_microstep: 1457.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558 [2024-06-10 06:29:00,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1560.18 | bwd_inner_microstep: 1560.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787 [2024-06-10 06:29:03,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.19 | bwd_microstep: 1653.19 | bwd_inner_microstep: 1653.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 06:29:05,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.41 | bwd_microstep: 1555.03 | bwd_inner_microstep: 1555.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2068 [2024-06-10 06:29:06,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.84 | bwd_microstep: 854.21 | bwd_inner_microstep: 854.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2879 [2024-06-10 06:29:08,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.40 | 
bwd_microstep: 1181.44 | bwd_inner_microstep: 1181.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3591 [2024-06-10 06:29:10,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.09 | bwd_microstep: 1433.88 | bwd_inner_microstep: 1433.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2295 [2024-06-10 06:29:14,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 06:29:14,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.93 | bwd_microstep: 3620.13 | bwd_inner_microstep: 1147.16 | bwd_allreduce_microstep: 2472.91 | step_microstep: 38.94 [2024-06-10 06:29:14,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15799.84 | bwd: 44940.22 | bwd_inner: 42466.29 | bwd_allreduce: 2473.20 | step: 40.56 {'loss': 1.3125, 'learning_rate': 3.724495283614024e-05, 'epoch': 0.19} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-10 06:29:15,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.07 | bwd_microstep: 797.27 | bwd_inner_microstep: 797.13 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 06:29:17,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1273.00 | bwd_inner_microstep: 1272.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3798 [2024-06-10 06:29:19,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.25 | bwd_microstep: 1743.34 | bwd_inner_microstep: 1743.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3857 [2024-06-10 06:29:21,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.01 | bwd_microstep: 1660.55 | bwd_inner_microstep: 1660.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-10 06:29:23,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.63 | bwd_microstep: 1291.65 | bwd_inner_microstep: 1291.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4134 [2024-06-10 06:29:25,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.99 | bwd_microstep: 1738.14 | bwd_inner_microstep: 1738.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971 [2024-06-10 06:29:26,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.03 | bwd_microstep: 705.77 | bwd_inner_microstep: 705.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 06:29:27,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 791.60 | bwd_inner_microstep: 791.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1895 [2024-06-10 06:29:28,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.57 | bwd_microstep: 715.16 | bwd_inner_microstep: 715.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023 [2024-06-10 06:29:30,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.22 | bwd_microstep: 808.10 | bwd_inner_microstep: 808.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3577 [2024-06-10 06:29:31,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1364.36 | bwd_inner_microstep: 1364.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3567 [2024-06-10 06:29:34,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 1526.95 | bwd_inner_microstep: 1526.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499 [2024-06-10 06:29:36,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1584.26 | bwd_inner_microstep: 1584.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3495 [2024-06-10 06:29:38,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.42 | bwd_microstep: 1646.61 | bwd_inner_microstep: 1646.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3549 [2024-06-10 06:29:40,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1588.94 | bwd_inner_microstep: 1588.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4039 [2024-06-10 06:29:42,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.72 | bwd_microstep: 1651.21 | bwd_inner_microstep: 1651.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947 [2024-06-10 06:29:43,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.80 | bwd_microstep: 700.94 | bwd_inner_microstep: 700.92 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-10 06:29:46,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.85 | bwd_microstep: 1484.02 | bwd_inner_microstep: 1484.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 06:29:47,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.54 | bwd_microstep: 1284.61 | bwd_inner_microstep: 1284.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648 [2024-06-10 06:29:50,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.41 | bwd_microstep: 1618.32 | bwd_inner_microstep: 1618.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947 [2024-06-10 06:29:51,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.51 | bwd_microstep: 730.63 | bwd_inner_microstep: 730.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3524 [2024-06-10 06:29:52,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.88 | bwd_microstep: 1198.98 | bwd_inner_microstep: 1198.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 06:29:54,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.60 | bwd_microstep: 1290.62 | bwd_inner_microstep: 1290.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 06:29:56,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 
1397.98 | bwd_inner_microstep: 1397.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 06:29:58,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1382.29 | bwd_inner_microstep: 1382.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080 [2024-06-10 06:29:59,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.90 | bwd_microstep: 915.85 | bwd_inner_microstep: 915.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 06:30:01,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.56 | bwd_microstep: 1377.70 | bwd_inner_microstep: 1377.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3531 [2024-06-10 06:30:03,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.52 | bwd_microstep: 1524.19 | bwd_inner_microstep: 1524.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3617 [2024-06-10 06:30:05,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1644.62 | bwd_inner_microstep: 1644.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593 [2024-06-10 06:30:07,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1506.41 | bwd_inner_microstep: 1506.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819 [2024-06-10 06:30:10,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 640.19 | bwd_microstep: 1751.34 | bwd_inner_microstep: 1751.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2956 [2024-06-10 06:30:17,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 06:30:17,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.03 | bwd_microstep: 6963.57 | bwd_inner_microstep: 1354.03 | bwd_allreduce_microstep: 5609.49 | step_microstep: 38.85 [2024-06-10 06:30:17,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15659.75 | bwd: 47659.01 | bwd_inner: 42048.50 | bwd_allreduce: 5609.77 | step: 40.52 {'loss': 1.2894, 'learning_rate': 3.722591201781588e-05, 'epoch': 0.19} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 06:30:19,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1431.36 | bwd_inner_microstep: 1431.25 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-10 06:30:21,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1249.05 | bwd_inner_microstep: 1249.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-10 06:30:23,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1537.99 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 06:30:25,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1341.81 | bwd_inner_microstep: 1341.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 06:30:27,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.35 | bwd_microstep: 1480.55 | bwd_inner_microstep: 1480.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 06:30:28,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.17 | bwd_microstep: 792.30 | bwd_inner_microstep: 792.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-10 06:30:29,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 799.17 | bwd_inner_microstep: 799.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-10 06:30:30,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 801.14 | bwd_inner_microstep: 801.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-10 06:30:32,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.47 | bwd_microstep: 1163.38 | bwd_inner_microstep: 1163.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3686 [2024-06-10 06:30:34,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1328.71 | bwd_inner_microstep: 1328.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681 [2024-06-10 06:30:36,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1427.18 | bwd_inner_microstep: 1427.15 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674 [2024-06-10 06:30:38,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.30 | bwd_microstep: 1423.33 | bwd_inner_microstep: 1423.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1942 [2024-06-10 06:30:39,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.31 | bwd_microstep: 850.13 | bwd_inner_microstep: 850.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3655 [2024-06-10 06:30:41,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.98 | bwd_microstep: 1719.07 | bwd_inner_microstep: 1719.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3504 [2024-06-10 06:30:43,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.36 | bwd_microstep: 1367.19 | bwd_inner_microstep: 1367.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 06:30:45,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 06:30:47,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.84 | bwd_microstep: 1501.81 | bwd_inner_microstep: 1501.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-10 06:30:49,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.53 | bwd_microstep: 1650.38 | bwd_inner_microstep: 
1650.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523 [2024-06-10 06:30:51,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.46 | bwd_microstep: 1423.83 | bwd_inner_microstep: 1423.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 06:30:53,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.39 | bwd_microstep: 1299.90 | bwd_inner_microstep: 1299.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 06:30:55,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 1494.12 | bwd_inner_microstep: 1494.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667 [2024-06-10 06:30:57,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1327.30 | bwd_inner_microstep: 1327.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 06:30:59,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1349.29 | bwd_inner_microstep: 1349.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3554 [2024-06-10 06:31:01,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.46 | bwd_microstep: 1548.57 | bwd_inner_microstep: 1548.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3630 [2024-06-10 06:31:03,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | 
bwd_microstep: 1474.33 | bwd_inner_microstep: 1474.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 06:31:05,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824 [2024-06-10 06:31:07,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.73 | bwd_microstep: 1754.17 | bwd_inner_microstep: 1754.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2975 [2024-06-10 06:31:09,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.00 | bwd_microstep: 1230.88 | bwd_inner_microstep: 1230.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3450 [2024-06-10 06:31:11,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1520.41 | bwd_inner_microstep: 1520.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 06:31:13,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.90 | bwd_microstep: 1595.21 | bwd_inner_microstep: 1595.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571 [2024-06-10 06:31:15,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.23 | bwd_microstep: 1433.06 | bwd_inner_microstep: 1433.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581 [2024-06-10 06:31:18,979] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.17 | optimizer_step: 6.59 [2024-06-10 06:31:18,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.24 | bwd_microstep: 2603.66 | bwd_inner_microstep: 1461.24 | bwd_allreduce_microstep: 1142.37 | step_microstep: 38.34 [2024-06-10 06:31:18,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16206.79 | bwd: 44584.90 | bwd_inner: 43441.53 | bwd_allreduce: 1142.65 | step: 40.04 {'loss': 1.3149, 'learning_rate': 3.7206810529918935e-05, 'epoch': 0.2} dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942 [2024-06-10 06:31:19,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.88 | bwd_microstep: 721.50 | bwd_inner_microstep: 721.35 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3885 [2024-06-10 06:31:22,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.46 | bwd_microstep: 1535.28 | bwd_inner_microstep: 1535.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3852 [2024-06-10 06:31:24,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.33 | bwd_microstep: 1562.31 | bwd_inner_microstep: 1562.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 06:31:26,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.63 | bwd_microstep: 1344.72 | bwd_inner_microstep: 1344.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590 [2024-06-10 06:31:27,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1308.74 | bwd_inner_microstep: 1308.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423 [2024-06-10 06:31:29,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.75 | bwd_microstep: 1155.10 | bwd_inner_microstep: 1155.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 06:31:30,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.53 | bwd_microstep: 796.66 | bwd_inner_microstep: 796.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 06:31:32,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.12 | bwd_microstep: 1249.35 | bwd_inner_microstep: 1249.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 06:31:34,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1400.00 | bwd_inner_microstep: 1399.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3700 [2024-06-10 06:31:36,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.31 | bwd_microstep: 1452.47 | bwd_inner_microstep: 1452.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 06:31:38,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.20 | bwd_microstep: 1482.61 | bwd_inner_microstep: 1482.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502 [2024-06-10 06:31:40,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1437.48 | bwd_inner_microstep: 1437.45 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960 [2024-06-10 06:31:41,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.73 | bwd_microstep: 892.70 | bwd_inner_microstep: 892.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2129 [2024-06-10 06:31:42,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.30 | bwd_microstep: 926.27 | bwd_inner_microstep: 926.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606 [2024-06-10 06:31:44,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.16 | bwd_microstep: 1310.63 | bwd_inner_microstep: 1310.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 06:31:46,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1349.42 | bwd_inner_microstep: 1349.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-10 06:31:48,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.84 | bwd_microstep: 1529.98 | bwd_inner_microstep: 1529.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 06:31:50,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1405.56 | bwd_inner_microstep: 1405.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 06:31:52,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.22 | bwd_microstep: 1290.19 
| bwd_inner_microstep: 1290.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298 [2024-06-10 06:31:53,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.83 | bwd_microstep: 976.55 | bwd_inner_microstep: 976.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3641 [2024-06-10 06:31:55,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.44 | bwd_microstep: 1583.40 | bwd_inner_microstep: 1583.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 06:31:57,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.96 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 06:31:59,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1289.53 | bwd_inner_microstep: 1289.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628 [2024-06-10 06:32:01,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.20 | bwd_microstep: 1315.91 | bwd_inner_microstep: 1315.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-10 06:32:03,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1316.65 | bwd_inner_microstep: 1316.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 06:32:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 578.56 | bwd_microstep: 1559.22 | bwd_inner_microstep: 1559.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 06:32:07,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.62 | bwd_microstep: 1384.09 | bwd_inner_microstep: 1384.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3381 [2024-06-10 06:32:09,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1242.69 | bwd_inner_microstep: 1242.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 06:32:11,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.40 | bwd_microstep: 1650.86 | bwd_inner_microstep: 1650.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439 [2024-06-10 06:32:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.35 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 06:32:14,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1290.53 | bwd_inner_microstep: 1290.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3812 [2024-06-10 06:32:20,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 06:32:20,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.53 | bwd_microstep: 4927.94 | bwd_inner_microstep: 1907.08 | 
bwd_allreduce_microstep: 3020.82 | step_microstep: 38.79 [2024-06-10 06:32:20,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15818.94 | bwd: 45380.67 | bwd_inner: 42358.82 | bwd_allreduce: 3021.10 | step: 40.40 {'loss': 1.2819, 'learning_rate': 3.7187648439724755e-05, 'epoch': 0.2} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017 [2024-06-10 06:32:21,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.85 | bwd_microstep: 833.80 | bwd_inner_microstep: 833.69 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 06:32:23,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1386.07 | bwd_inner_microstep: 1386.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 06:32:25,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.95 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 06:32:27,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.37 | bwd_microstep: 1458.69 | bwd_inner_microstep: 1458.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471 [2024-06-10 06:32:29,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.58 | bwd_microstep: 1215.66 | bwd_inner_microstep: 1215.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3745 [2024-06-10 06:32:31,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1486.41 | bwd_inner_microstep: 1486.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 06:32:33,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1379.75 | bwd_inner_microstep: 1379.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 06:32:34,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.01 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2370 [2024-06-10 06:32:36,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.51 | bwd_microstep: 838.72 | bwd_inner_microstep: 838.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400 [2024-06-10 06:32:37,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.65 | bwd_microstep: 1311.39 | bwd_inner_microstep: 1311.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688 [2024-06-10 06:32:40,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.88 | bwd_microstep: 1627.92 | bwd_inner_microstep: 1627.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3412 [2024-06-10 06:32:42,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.91 | bwd_microstep: 1295.60 | bwd_inner_microstep: 1295.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3672 [2024-06-10 06:32:44,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.72 | bwd_microstep: 
1585.02 | bwd_inner_microstep: 1584.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 06:32:46,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1386.35 | bwd_inner_microstep: 1386.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546 [2024-06-10 06:32:48,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1587.27 | bwd_inner_microstep: 1587.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587 [2024-06-10 06:32:50,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1507.47 | bwd_inner_microstep: 1507.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645 [2024-06-10 06:32:52,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1410.59 | bwd_inner_microstep: 1410.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3638 [2024-06-10 06:32:54,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.00 | bwd_microstep: 1221.28 | bwd_inner_microstep: 1221.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 06:32:55,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.44 | bwd_microstep: 1293.87 | bwd_inner_microstep: 1293.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299 [2024-06-10 06:32:57,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 366.44 | bwd_microstep: 978.07 | bwd_inner_microstep: 978.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 06:32:59,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1485.49 | bwd_inner_microstep: 1485.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 06:33:00,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1281.46 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 06:33:03,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1556.27 | bwd_inner_microstep: 1556.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 06:33:05,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1512.82 | bwd_inner_microstep: 1512.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059 [2024-06-10 06:33:06,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.23 | bwd_microstep: 817.99 | bwd_inner_microstep: 817.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2107 [2024-06-10 06:33:07,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.94 | bwd_microstep: 886.57 | bwd_inner_microstep: 886.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604 [2024-06-10 06:33:09,559] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1437.71 | bwd_inner_microstep: 1437.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2043 [2024-06-10 06:33:10,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.36 | bwd_microstep: 845.03 | bwd_inner_microstep: 845.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1893 [2024-06-10 06:33:11,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.58 | bwd_microstep: 777.40 | bwd_inner_microstep: 777.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 06:33:13,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1251.10 | bwd_inner_microstep: 1251.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 06:33:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2012 [2024-06-10 06:33:20,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 06:33:20,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.72 | bwd_microstep: 5294.04 | bwd_inner_microstep: 1022.30 | bwd_allreduce_microstep: 4271.69 | step_microstep: 38.65 [2024-06-10 06:33:20,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15189.69 | bwd: 44886.72 | bwd_inner: 40614.01 | bwd_allreduce: 4271.97 | step: 40.30 {'loss': 1.3032, 'learning_rate': 
3.7168425814722127e-05, 'epoch': 0.2} 19%|█▉ | 334/1726 [5:50:49<23:55:19, 61.87s/it] 19%|█▉ | 335/1726 [5:51:50<23:48:53, 61.63s/it] 19%|█▉ | 336/1726 [5:52:54<24:02:00, 62.24s/it] 20%|█▉ | 337/1726 [5:53:55<23:53:21, 61.92s/it] 20%|█▉ | 338/1726 [5:54:57<23:49:44, 61.80s/it] 20%|█▉ | 339/1726 [5:55:57<23:39:08, 61.39s/it] dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3394 [2024-06-10 06:33:22,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1390.70 | bwd_inner_microstep: 1390.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 06:33:24,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.86 | bwd_microstep: 1276.69 | bwd_inner_microstep: 1276.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874 [2024-06-10 06:33:26,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.44 | bwd_microstep: 1579.67 | bwd_inner_microstep: 1579.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 06:33:28,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1343.35 | bwd_inner_microstep: 1343.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899 [2024-06-10 06:33:29,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.80 | bwd_microstep: 778.34 | bwd_inner_microstep: 778.31 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748 [2024-06-10 06:33:32,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.77 | bwd_microstep: 1638.89 | bwd_inner_microstep: 1638.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 06:33:33,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1380.83 | bwd_inner_microstep: 1380.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1913 [2024-06-10 06:33:34,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.26 | bwd_microstep: 715.51 | bwd_inner_microstep: 715.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3489 [2024-06-10 06:33:36,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.32 | bwd_microstep: 1218.96 | bwd_inner_microstep: 1218.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 06:33:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1389.10 | bwd_inner_microstep: 1389.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-10 06:33:39,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 793.72 | bwd_inner_microstep: 793.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 892 [2024-06-10 06:33:40,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.78 | bwd_microstep: 372.69 | 
bwd_inner_microstep: 372.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 06:33:42,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.68 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419 [2024-06-10 06:33:43,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.79 | bwd_microstep: 1185.06 | bwd_inner_microstep: 1185.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3663 [2024-06-10 06:33:45,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1330.60 | bwd_inner_microstep: 1330.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3387 [2024-06-10 06:33:47,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.78 | bwd_microstep: 1273.32 | bwd_inner_microstep: 1273.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509 [2024-06-10 06:33:49,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.17 | bwd_microstep: 1585.78 | bwd_inner_microstep: 1585.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3519 [2024-06-10 06:33:51,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.02 | bwd_microstep: 1581.99 | bwd_inner_microstep: 1581.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3617 [2024-06-10 06:33:54,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 619.89 | bwd_microstep: 1704.25 | bwd_inner_microstep: 1704.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3817 [2024-06-10 06:33:56,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.91 | bwd_microstep: 1514.87 | bwd_inner_microstep: 1514.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3824 [2024-06-10 06:33:58,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.09 | bwd_microstep: 1601.35 | bwd_inner_microstep: 1601.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435 [2024-06-10 06:34:00,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.21 | bwd_microstep: 1446.01 | bwd_inner_microstep: 1445.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 06:34:02,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.62 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678 [2024-06-10 06:34:04,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.01 | bwd_microstep: 1529.66 | bwd_inner_microstep: 1529.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 06:34:06,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | bwd_microstep: 1494.50 | bwd_inner_microstep: 1494.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3682 [2024-06-10 06:34:08,119] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1232.86 | bwd_inner_microstep: 1232.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2240 [2024-06-10 06:34:09,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.04 | bwd_microstep: 1060.42 | bwd_inner_microstep: 1060.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445 [2024-06-10 06:34:11,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.08 | bwd_microstep: 1157.41 | bwd_inner_microstep: 1157.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3776 [2024-06-10 06:34:13,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.35 | bwd_microstep: 1681.73 | bwd_inner_microstep: 1681.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 06:34:15,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.46 | bwd_microstep: 1502.36 | bwd_inner_microstep: 1502.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581 [2024-06-10 06:34:17,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.03 | bwd_microstep: 1603.41 | bwd_inner_microstep: 1603.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228 [2024-06-10 06:34:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 06:34:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.39 | bwd_microstep: 5214.74 | 
bwd_inner_microstep: 1089.57 | bwd_allreduce_microstep: 4125.11 | step_microstep: 38.83 [2024-06-10 06:34:23,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15729.54 | bwd: 46370.48 | bwd_inner: 42244.43 | bwd_allreduce: 4125.35 | step: 40.43 {'loss': 1.3781, 'learning_rate': 3.714914272261302e-05, 'epoch': 0.2} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 06:34:25,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1366.97 | bwd_inner_microstep: 1366.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4022 [2024-06-10 06:34:27,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.18 | bwd_microstep: 1709.42 | bwd_inner_microstep: 1709.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2422 [2024-06-10 06:34:29,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.72 | bwd_microstep: 1002.38 | bwd_inner_microstep: 1002.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2298 [2024-06-10 06:34:30,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.82 | bwd_microstep: 877.68 | bwd_inner_microstep: 877.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 06:34:32,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.85 | bwd_microstep: 1341.93 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 06:34:33,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1250.26 | 
bwd_inner_microstep: 1250.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 06:34:35,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 06:34:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 06:34:39,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.05 | bwd_microstep: 1247.73 | bwd_inner_microstep: 1247.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 06:34:41,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1249.76 | bwd_inner_microstep: 1249.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 06:34:42,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1418.04 | bwd_inner_microstep: 1418.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-10 06:34:43,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 698.65 | bwd_inner_microstep: 698.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3449 [2024-06-10 06:34:45,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 482.33 | bwd_microstep: 1285.01 | bwd_inner_microstep: 1284.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3963 [2024-06-10 06:34:48,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.67 | bwd_microstep: 1800.48 | bwd_inner_microstep: 1800.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086 [2024-06-10 06:34:49,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.87 | bwd_microstep: 824.43 | bwd_inner_microstep: 824.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 06:34:51,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1390.92 | bwd_inner_microstep: 1390.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-10 06:34:52,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.08 | bwd_microstep: 795.00 | bwd_inner_microstep: 794.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 06:34:54,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 06:34:56,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1475.92 | bwd_inner_microstep: 1475.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629 [2024-06-10 06:34:58,514] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1616.19 | bwd_inner_microstep: 1616.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 747 [2024-06-10 06:34:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.57 | bwd_microstep: 302.85 | bwd_inner_microstep: 302.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3559 [2024-06-10 06:35:01,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.01 | bwd_microstep: 1550.00 | bwd_inner_microstep: 1549.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921 [2024-06-10 06:35:02,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.94 | bwd_microstep: 699.02 | bwd_inner_microstep: 698.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3712 [2024-06-10 06:35:03,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1236.33 | bwd_inner_microstep: 1236.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469 [2024-06-10 06:35:05,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.20 | bwd_microstep: 1184.29 | bwd_inner_microstep: 1184.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3558 [2024-06-10 06:35:07,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1361.01 | bwd_inner_microstep: 1360.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 
[2024-06-10 06:35:09,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.90 | bwd_microstep: 1260.32 | bwd_inner_microstep: 1260.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 06:35:10,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3767 [2024-06-10 06:35:13,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.44 | bwd_microstep: 1739.41 | bwd_inner_microstep: 1739.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3580 [2024-06-10 06:35:15,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.14 | bwd_microstep: 1563.03 | bwd_inner_microstep: 1563.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 06:35:17,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.70 | bwd_microstep: 1291.80 | bwd_inner_microstep: 1291.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1020 [2024-06-10 06:35:24,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.28 | optimizer_step: 6.56 [2024-06-10 06:35:24,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.78 | bwd_microstep: 6920.40 | bwd_inner_microstep: 535.16 | bwd_allreduce_microstep: 6385.19 | step_microstep: 39.38 [2024-06-10 06:35:24,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14804.22 | bwd: 45893.27 | bwd_inner: 39507.17 | bwd_allreduce: 6385.42 | step: 41.01 
{'loss': 1.264, 'learning_rate': 3.71297992313124e-05, 'epoch': 0.2} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 06:35:26,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.58 | bwd_microstep: 1478.40 | bwd_inner_microstep: 1478.33 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878 [2024-06-10 06:35:27,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.41 | bwd_microstep: 747.83 | bwd_inner_microstep: 747.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477 [2024-06-10 06:35:29,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1340.70 | bwd_inner_microstep: 1340.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3839 [2024-06-10 06:35:31,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1483.96 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 06:35:33,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.18 | bwd_microstep: 1403.64 | bwd_inner_microstep: 1403.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3754 [2024-06-10 06:35:35,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1370.30 | bwd_inner_microstep: 1370.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2476 [2024-06-10 06:35:36,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.11 | 
bwd_microstep: 953.11 | bwd_inner_microstep: 953.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617 [2024-06-10 06:35:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1313.40 | bwd_inner_microstep: 1313.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734 [2024-06-10 06:35:40,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.12 | bwd_microstep: 1632.24 | bwd_inner_microstep: 1632.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3736 [2024-06-10 06:35:42,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.39 | bwd_microstep: 1594.66 | bwd_inner_microstep: 1594.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3792 [2024-06-10 06:35:44,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1469.64 | bwd_inner_microstep: 1469.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 06:35:46,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1350.07 | bwd_inner_microstep: 1350.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-10 06:35:48,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1319.67 | bwd_inner_microstep: 1319.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668 [2024-06-10 06:35:50,789] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 595.09 | bwd_microstep: 1618.88 | bwd_inner_microstep: 1618.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3510 [2024-06-10 06:35:52,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1446.19 | bwd_inner_microstep: 1446.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1984 [2024-06-10 06:35:53,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.38 | bwd_microstep: 830.07 | bwd_inner_microstep: 830.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1980 [2024-06-10 06:35:55,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.39 | bwd_microstep: 925.12 | bwd_inner_microstep: 925.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-10 06:35:57,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.32 | bwd_microstep: 1599.67 | bwd_inner_microstep: 1599.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3445 [2024-06-10 06:35:59,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.60 | bwd_microstep: 1187.82 | bwd_inner_microstep: 1187.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 06:36:01,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.09 | bwd_microstep: 1653.66 | bwd_inner_microstep: 1653.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-10 06:36:03,153] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.90 | bwd_microstep: 1313.06 | bwd_inner_microstep: 1313.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-10 06:36:04,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.15 | bwd_microstep: 1160.29 | bwd_inner_microstep: 1160.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103 [2024-06-10 06:36:05,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.16 | bwd_microstep: 824.73 | bwd_inner_microstep: 824.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 06:36:07,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.67 | bwd_microstep: 1450.45 | bwd_inner_microstep: 1450.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2187 [2024-06-10 06:36:08,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.53 | bwd_microstep: 767.23 | bwd_inner_microstep: 767.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 06:36:10,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1256.78 | bwd_inner_microstep: 1256.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-10 06:36:12,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.41 | bwd_microstep: 1645.89 | bwd_inner_microstep: 1645.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3826 [2024-06-10 06:36:15,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.22 | bwd_microstep: 1655.75 | bwd_inner_microstep: 1655.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 06:36:17,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1554.78 | bwd_inner_microstep: 1554.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 06:36:19,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.03 | bwd_microstep: 1545.00 | bwd_inner_microstep: 1544.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 06:36:21,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 | bwd_microstep: 1456.94 | bwd_inner_microstep: 1456.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474 [2024-06-10 06:36:25,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.36 | optimizer_step: 6.60 [2024-06-10 06:36:25,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 3675.34 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 2196.86 | step_microstep: 39.53 [2024-06-10 06:36:25,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15955.47 | bwd: 45025.30 | bwd_inner: 42827.45 | bwd_allreduce: 2197.14 | step: 41.25 {'loss': 1.3162, 'learning_rate': 3.7110395408947937e-05, 'epoch': 0.2} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2036 [2024-06-10 06:36:27,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.79 | bwd_microstep: 900.87 | 
bwd_inner_microstep: 900.71 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4078 [2024-06-10 06:36:29,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.00 | bwd_microstep: 1620.62 | bwd_inner_microstep: 1620.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-10 06:36:31,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.70 | bwd_microstep: 1650.38 | bwd_inner_microstep: 1650.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 06:36:33,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1283.49 | bwd_inner_microstep: 1283.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 06:36:35,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 06:36:36,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1249.54 | bwd_inner_microstep: 1249.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 06:36:38,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1389.51 | bwd_inner_microstep: 1389.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753 [2024-06-10 06:36:40,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 540.20 | bwd_microstep: 1445.28 | bwd_inner_microstep: 1445.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 06:36:42,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1384.11 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436 [2024-06-10 06:36:44,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1253.23 | bwd_inner_microstep: 1253.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-10 06:36:46,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1417.57 | bwd_inner_microstep: 1417.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954 [2024-06-10 06:36:47,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.40 | bwd_microstep: 858.10 | bwd_inner_microstep: 858.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 06:36:49,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1388.23 | bwd_inner_microstep: 1388.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3510 [2024-06-10 06:36:51,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.23 | bwd_microstep: 1553.97 | bwd_inner_microstep: 1553.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 06:36:53,858] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.60 | bwd_microstep: 1553.77 | bwd_inner_microstep: 1553.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3696 [2024-06-10 06:36:56,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.06 | bwd_microstep: 1560.73 | bwd_inner_microstep: 1560.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 06:36:57,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.20 | bwd_microstep: 1386.85 | bwd_inner_microstep: 1386.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-10 06:36:59,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1314.58 | bwd_inner_microstep: 1314.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687 [2024-06-10 06:37:01,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.46 | bwd_microstep: 1435.48 | bwd_inner_microstep: 1435.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-10 06:37:03,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1506.25 | bwd_inner_microstep: 1506.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 06:37:05,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1553.23 | bwd_inner_microstep: 1553.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3612 [2024-06-10 06:37:07,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.92 | bwd_microstep: 1316.37 | bwd_inner_microstep: 1316.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520 [2024-06-10 06:37:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.44 | bwd_microstep: 1452.84 | bwd_inner_microstep: 1452.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-10 06:37:11,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.74 | bwd_microstep: 1497.05 | bwd_inner_microstep: 1497.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-10 06:37:13,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.41 | bwd_microstep: 1550.30 | bwd_inner_microstep: 1550.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603 [2024-06-10 06:37:16,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.86 | bwd_microstep: 1707.49 | bwd_inner_microstep: 1707.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3816 [2024-06-10 06:37:18,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.58 | bwd_microstep: 1604.07 | bwd_inner_microstep: 1604.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581 [2024-06-10 06:37:20,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.21 | bwd_microstep: 1697.09 | bwd_inner_microstep: 1697.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, 
images per sample: 13.0, dynamic token length: 3726 [2024-06-10 06:37:23,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.41 | bwd_microstep: 1837.33 | bwd_inner_microstep: 1837.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2232 [2024-06-10 06:37:24,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.96 | bwd_microstep: 993.96 | bwd_inner_microstep: 993.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 06:37:26,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 06:37:28,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.21 | optimizer_step: 6.63 [2024-06-10 06:37:28,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1440.56 | bwd_inner_microstep: 1432.80 | bwd_allreduce_microstep: 7.71 | step_microstep: 38.38 [2024-06-10 06:37:28,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16988.25 | bwd: 45593.95 | bwd_inner: 45585.22 | bwd_allreduce: 7.99 | step: 40.06 {'loss': 1.2781, 'learning_rate': 3.7090931323859794e-05, 'epoch': 0.2} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402 [2024-06-10 06:37:30,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.99 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2620 [2024-06-10 06:37:32,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
395.87 | bwd_microstep: 1049.33 | bwd_inner_microstep: 1049.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873 [2024-06-10 06:37:34,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.66 | bwd_microstep: 1685.85 | bwd_inner_microstep: 1685.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759 [2024-06-10 06:37:36,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.19 | bwd_microstep: 1539.66 | bwd_inner_microstep: 1539.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 06:37:38,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.18 | bwd_microstep: 1283.81 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1863 [2024-06-10 06:37:39,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 709.19 | bwd_inner_microstep: 709.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 06:37:41,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.58 | bwd_microstep: 1541.56 | bwd_inner_microstep: 1541.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 06:37:43,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1251.36 | bwd_inner_microstep: 1251.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 06:37:45,022] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 1287.28 | bwd_inner_microstep: 1287.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490 [2024-06-10 06:37:46,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.68 | bwd_microstep: 1223.55 | bwd_inner_microstep: 1223.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 06:37:48,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1248.24 | bwd_inner_microstep: 1248.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1943 [2024-06-10 06:37:49,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.91 | bwd_microstep: 812.71 | bwd_inner_microstep: 812.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3403 [2024-06-10 06:37:51,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1294.70 | bwd_inner_microstep: 1294.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 06:37:53,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1490.61 | bwd_inner_microstep: 1490.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2183 [2024-06-10 06:37:54,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.22 | bwd_microstep: 985.76 | bwd_inner_microstep: 985.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441 [2024-06-10 
06:37:56,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.86 | bwd_microstep: 1452.36 | bwd_inner_microstep: 1452.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-10 06:37:58,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1256.68 | bwd_inner_microstep: 1256.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2480 [2024-06-10 06:37:59,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.95 | bwd_microstep: 956.03 | bwd_inner_microstep: 956.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2147 [2024-06-10 06:38:01,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.71 | bwd_microstep: 883.30 | bwd_inner_microstep: 883.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2940 [2024-06-10 06:38:02,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.76 | bwd_microstep: 1245.41 | bwd_inner_microstep: 1245.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 06:38:04,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1387.34 | bwd_inner_microstep: 1387.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 06:38:06,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1403.39 | bwd_inner_microstep: 1403.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3430 [2024-06-10 06:38:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.74 | bwd_microstep: 1352.93 | bwd_inner_microstep: 1352.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2286 [2024-06-10 06:38:09,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.82 | bwd_microstep: 914.95 | bwd_inner_microstep: 914.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815 [2024-06-10 06:38:11,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1388.87 | bwd_inner_microstep: 1388.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 06:38:14,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.61 | bwd_microstep: 2021.26 | bwd_inner_microstep: 2021.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-10 06:38:16,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1411.73 | bwd_inner_microstep: 1411.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 06:38:17,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1255.20 | bwd_inner_microstep: 1255.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2411 [2024-06-10 06:38:19,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.51 | bwd_microstep: 1003.02 | bwd_inner_microstep: 1002.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 06:38:21,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.66 | bwd_microstep: 1539.88 | bwd_inner_microstep: 1539.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710 [2024-06-10 06:38:23,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1531.55 | bwd_inner_microstep: 1531.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 06:38:27,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-10 06:38:27,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.50 | bwd_microstep: 3742.34 | bwd_inner_microstep: 1742.28 | bwd_allreduce_microstep: 2000.01 | step_microstep: 38.70 [2024-06-10 06:38:27,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15287.29 | bwd: 43592.14 | bwd_inner: 41591.18 | bwd_allreduce: 2000.25 | step: 40.45 {'loss': 1.3095, 'learning_rate': 3.707140704460037e-05, 'epoch': 0.2} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-10 06:38:29,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.25 | bwd_microstep: 781.65 | bwd_inner_microstep: 781.57 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 06:38:30,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1378.45 | bwd_inner_microstep: 1378.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 06:38:32,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 489.62 | bwd_microstep: 1282.02 | bwd_inner_microstep: 1282.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 06:38:34,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.77 | bwd_microstep: 1342.44 | bwd_inner_microstep: 1342.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777 [2024-06-10 06:38:36,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.55 | bwd_microstep: 1546.35 | bwd_inner_microstep: 1546.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754 [2024-06-10 06:38:38,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.60 | bwd_microstep: 1640.00 | bwd_inner_microstep: 1639.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719 [2024-06-10 06:38:40,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 1335.01 | bwd_inner_microstep: 1334.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 06:38:42,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 06:38:44,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696 [2024-06-10 06:38:46,499] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.29 | bwd_microstep: 1430.29 | bwd_inner_microstep: 1430.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3585 [2024-06-10 06:38:48,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.08 | bwd_microstep: 1212.93 | bwd_inner_microstep: 1212.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937 [2024-06-10 06:38:49,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.35 | bwd_microstep: 757.92 | bwd_inner_microstep: 757.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2174 [2024-06-10 06:38:50,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.81 | bwd_microstep: 948.19 | bwd_inner_microstep: 948.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485 [2024-06-10 06:38:52,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1317.16 | bwd_inner_microstep: 1317.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 06:38:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1478.76 | bwd_inner_microstep: 1478.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 06:38:56,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.22 | bwd_microstep: 1475.11 | bwd_inner_microstep: 1475.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 
[2024-06-10 06:38:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1391.53 | bwd_inner_microstep: 1391.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 06:39:00,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1515.42 | bwd_inner_microstep: 1515.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 06:39:02,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 06:39:04,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.16 | bwd_microstep: 1414.99 | bwd_inner_microstep: 1414.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-10 06:39:06,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1496.82 | bwd_inner_microstep: 1496.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4123 [2024-06-10 06:39:08,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.75 | bwd_microstep: 1567.31 | bwd_inner_microstep: 1567.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3815 [2024-06-10 06:39:10,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.32 | bwd_microstep: 1624.28 | bwd_inner_microstep: 1624.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3473 [2024-06-10 06:39:12,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1381.79 | bwd_inner_microstep: 1381.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437 [2024-06-10 06:39:14,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.05 | bwd_microstep: 1381.88 | bwd_inner_microstep: 1381.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2188 [2024-06-10 06:39:15,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.27 | bwd_microstep: 795.29 | bwd_inner_microstep: 795.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 06:39:17,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.46 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2552 [2024-06-10 06:39:19,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 424.50 | bwd_microstep: 1154.03 | bwd_inner_microstep: 1154.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762 [2024-06-10 06:39:21,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1465.29 | bwd_inner_microstep: 1465.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 06:39:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.30 | bwd_microstep: 1598.21 | bwd_inner_microstep: 1598.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 06:39:25,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.86 | bwd_microstep: 1643.14 | bwd_inner_microstep: 1643.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3398 [2024-06-10 06:39:27,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.64 [2024-06-10 06:39:27,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1432.35 | bwd_inner_microstep: 1424.65 | bwd_allreduce_microstep: 7.66 | step_microstep: 38.39 [2024-06-10 06:39:27,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16192.46 | bwd: 43231.01 | bwd_inner: 43222.39 | bwd_allreduce: 7.92 | step: 40.04 :55:57<23:39:08, 61.39s/it] 20%|█▉ | 340/1726 [5:57:00<23:45:26, 61.71s/it] 20%|█▉ | 341/1726 [5:58:01<23:39:48, 61.51s/it] 20%|█▉ | 342/1726 [5:59:02<23:37:32, 61.45s/it] 20%|█▉ | 343/1726 [6:00:05<23:46:48, 61.90s/it] 20%|█▉ | 344/1726 [6:01:04<23:27:22, 61.10s/it] 20%|█▉ | 345/1726 [6:02:04<23:17:10, 60.70s/it] {'loss': 1.281, 'learning_rate': 3.7051822639934086e-05, 'epoch': 0.2} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-10 06:39:28,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.59 | bwd_microstep: 783.20 | bwd_inner_microstep: 783.06 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3985 [2024-06-10 06:39:30,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
567.32 | bwd_microstep: 1504.95 | bwd_inner_microstep: 1504.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845 [2024-06-10 06:39:33,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.47 | bwd_microstep: 1662.43 | bwd_inner_microstep: 1662.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 06:39:35,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1382.66 | bwd_inner_microstep: 1382.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4217 [2024-06-10 06:39:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.72 | bwd_microstep: 1658.54 | bwd_inner_microstep: 1658.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-10 06:39:39,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.21 | bwd_microstep: 1457.81 | bwd_inner_microstep: 1457.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3576 [2024-06-10 06:39:41,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.70 | bwd_microstep: 1208.91 | bwd_inner_microstep: 1208.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3480 [2024-06-10 06:39:42,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1333.48 | bwd_inner_microstep: 1333.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3742 [2024-06-10 06:39:44,791] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.04 | bwd_microstep: 1337.55 | bwd_inner_microstep: 1337.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3425 [2024-06-10 06:39:46,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.77 | bwd_microstep: 1286.09 | bwd_inner_microstep: 1286.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450 [2024-06-10 06:39:48,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.59 | bwd_microstep: 1156.94 | bwd_inner_microstep: 1156.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 06:39:49,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1255.14 | bwd_inner_microstep: 1255.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941 [2024-06-10 06:39:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.12 | bwd_microstep: 729.07 | bwd_inner_microstep: 729.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515 [2024-06-10 06:39:52,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.92 | bwd_microstep: 1451.35 | bwd_inner_microstep: 1451.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885 [2024-06-10 06:39:53,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.55 | bwd_microstep: 684.93 | bwd_inner_microstep: 684.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 
[2024-06-10 06:39:55,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1395.37 | bwd_inner_microstep: 1395.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893 [2024-06-10 06:39:58,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.53 | bwd_microstep: 1689.91 | bwd_inner_microstep: 1689.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435 [2024-06-10 06:39:59,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.87 | bwd_microstep: 1282.55 | bwd_inner_microstep: 1282.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-10 06:40:01,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.26 | bwd_microstep: 1314.49 | bwd_inner_microstep: 1314.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 06:40:03,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535 [2024-06-10 06:40:05,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.13 | bwd_microstep: 1457.37 | bwd_inner_microstep: 1457.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593 [2024-06-10 06:40:07,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1343.54 | bwd_inner_microstep: 1343.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 3517 [2024-06-10 06:40:09,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.02 | bwd_microstep: 1224.68 | bwd_inner_microstep: 1224.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 06:40:10,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.54 | bwd_microstep: 803.73 | bwd_inner_microstep: 803.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 06:40:12,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.46 | bwd_microstep: 1658.26 | bwd_inner_microstep: 1658.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 06:40:14,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1288.43 | bwd_inner_microstep: 1288.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 06:40:16,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1473.61 | bwd_inner_microstep: 1473.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-10 06:40:18,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1652.92 | bwd_inner_microstep: 1652.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765 [2024-06-10 06:40:20,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.08 | bwd_microstep: 1636.77 | bwd_inner_microstep: 1636.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588 [2024-06-10 06:40:23,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1568.61 | bwd_inner_microstep: 1568.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899 [2024-06-10 06:40:24,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.35 | bwd_microstep: 680.17 | bwd_inner_microstep: 680.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2247 [2024-06-10 06:40:27,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.35 | optimizer_step: 6.59 [2024-06-10 06:40:27,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.33 | bwd_microstep: 3205.12 | bwd_inner_microstep: 1019.69 | bwd_allreduce_microstep: 2185.38 | step_microstep: 38.61 [2024-06-10 06:40:27,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15641.05 | bwd: 43925.59 | bwd_inner: 41739.20 | bwd_allreduce: 2185.66 | step: 40.47 {'loss': 1.2802, 'learning_rate': 3.70321781788371e-05, 'epoch': 0.2} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 06:40:29,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.78 | bwd_microstep: 1271.71 | bwd_inner_microstep: 1271.57 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3887 [2024-06-10 06:40:31,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.66 | bwd_microstep: 1683.59 | bwd_inner_microstep: 1683.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 06:40:33,884] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1554.30 | bwd_inner_microstep: 1554.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-10 06:40:35,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.56 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3523 [2024-06-10 06:40:37,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.49 | bwd_microstep: 1229.60 | bwd_inner_microstep: 1229.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793 [2024-06-10 06:40:39,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.10 | bwd_microstep: 1653.20 | bwd_inner_microstep: 1653.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3518 [2024-06-10 06:40:41,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.20 | bwd_microstep: 1256.04 | bwd_inner_microstep: 1256.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902 [2024-06-10 06:40:42,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.72 | bwd_microstep: 686.38 | bwd_inner_microstep: 686.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3737 [2024-06-10 06:40:44,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.99 | bwd_microstep: 1485.70 | bwd_inner_microstep: 1485.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686 [2024-06-10 06:40:46,508] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.36 | bwd_microstep: 1531.02 | bwd_inner_microstep: 1530.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434 [2024-06-10 06:40:48,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.66 | bwd_microstep: 1414.29 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2086 [2024-06-10 06:40:49,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.50 | bwd_microstep: 760.77 | bwd_inner_microstep: 760.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3657 [2024-06-10 06:40:51,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1455.39 | bwd_inner_microstep: 1455.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 06:40:53,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1377.47 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 06:40:55,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1343.23 | bwd_inner_microstep: 1343.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637 [2024-06-10 06:40:57,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.72 | bwd_microstep: 1606.02 | bwd_inner_microstep: 1606.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token 
length: 3521 [2024-06-10 06:40:59,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.09 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 06:41:01,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1246.16 | bwd_inner_microstep: 1246.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3516 [2024-06-10 06:41:03,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1244.93 | bwd_inner_microstep: 1244.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 06:41:04,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1296.20 | bwd_inner_microstep: 1296.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782 [2024-06-10 06:41:06,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.61 | bwd_microstep: 1479.97 | bwd_inner_microstep: 1479.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3541 [2024-06-10 06:41:08,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1375.44 | bwd_inner_microstep: 1375.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721 [2024-06-10 06:41:10,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.50 | bwd_microstep: 1339.94 | bwd_inner_microstep: 1339.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
36, images per sample: 9.0, dynamic token length: 3603 [2024-06-10 06:41:12,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.17 | bwd_microstep: 1543.75 | bwd_inner_microstep: 1543.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-10 06:41:14,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1473.88 | bwd_inner_microstep: 1473.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2267 [2024-06-10 06:41:16,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.99 | bwd_microstep: 971.17 | bwd_inner_microstep: 971.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2415 [2024-06-10 06:41:17,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.77 | bwd_microstep: 1044.75 | bwd_inner_microstep: 1044.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 06:41:19,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.69 | bwd_microstep: 1392.70 | bwd_inner_microstep: 1392.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 06:41:21,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1296.10 | bwd_inner_microstep: 1296.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479 [2024-06-10 06:41:23,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1315.76 | bwd_inner_microstep: 1315.73 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279 [2024-06-10 06:41:24,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.96 | bwd_microstep: 880.26 | bwd_inner_microstep: 880.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729 [2024-06-10 06:41:28,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 06:41:28,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 3569.52 | bwd_inner_microstep: 1624.10 | bwd_allreduce_microstep: 1945.37 | step_microstep: 38.81 [2024-06-10 06:41:28,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15977.02 | bwd: 44602.00 | bwd_inner: 42655.60 | bwd_allreduce: 1945.67 | step: 40.60 {'loss': 1.3171, 'learning_rate': 3.7012473730497115e-05, 'epoch': 0.2} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-10 06:41:30,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.40 | bwd_microstep: 1607.39 | bwd_inner_microstep: 1607.23 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3937 [2024-06-10 06:41:32,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.33 | bwd_microstep: 1491.98 | bwd_inner_microstep: 1491.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 06:41:34,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1482.86 | bwd_inner_microstep: 1482.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3407 [2024-06-10 06:41:36,717] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.18 | bwd_microstep: 1309.06 | bwd_inner_microstep: 1309.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 [2024-06-10 06:41:38,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.72 | bwd_microstep: 1505.48 | bwd_inner_microstep: 1505.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473 [2024-06-10 06:41:40,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.34 | bwd_microstep: 1332.27 | bwd_inner_microstep: 1332.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851 [2024-06-10 06:41:42,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.52 | bwd_microstep: 1662.48 | bwd_inner_microstep: 1662.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 06:41:44,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1246.78 | bwd_inner_microstep: 1246.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2124 [2024-06-10 06:41:45,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.41 | bwd_microstep: 831.15 | bwd_inner_microstep: 831.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643 [2024-06-10 06:41:47,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.93 | bwd_microstep: 1317.31 | bwd_inner_microstep: 1317.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3695 [2024-06-10 06:41:49,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.58 | bwd_microstep: 1629.58 | bwd_inner_microstep: 1629.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514 [2024-06-10 06:41:51,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1415.61 | bwd_inner_microstep: 1415.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 06:41:53,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1384.57 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466 [2024-06-10 06:41:55,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.60 | bwd_microstep: 1435.90 | bwd_inner_microstep: 1435.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2906 [2024-06-10 06:41:57,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.09 | bwd_microstep: 1124.15 | bwd_inner_microstep: 1124.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 06:41:59,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 1379.11 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-10 06:42:01,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.52 | bwd_microstep: 1501.08 | bwd_inner_microstep: 1501.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3653 [2024-06-10 06:42:03,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.29 | bwd_microstep: 1612.28 | bwd_inner_microstep: 1612.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1899 [2024-06-10 06:42:04,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 810.55 | bwd_inner_microstep: 810.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3570 [2024-06-10 06:42:06,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.98 | bwd_microstep: 1207.42 | bwd_inner_microstep: 1207.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419 [2024-06-10 06:42:07,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.21 | bwd_microstep: 1186.96 | bwd_inner_microstep: 1186.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572 [2024-06-10 06:42:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.92 | bwd_microstep: 1564.92 | bwd_inner_microstep: 1564.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959 [2024-06-10 06:42:11,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.85 | bwd_microstep: 705.60 | bwd_inner_microstep: 705.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 06:42:12,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1247.03 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 06:42:14,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1508.67 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-10 06:42:16,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1397.40 | bwd_inner_microstep: 1397.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 06:42:18,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.43 | bwd_microstep: 1544.04 | bwd_inner_microstep: 1544.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 06:42:20,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1394.56 | bwd_inner_microstep: 1394.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3456 [2024-06-10 06:42:22,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.41 | bwd_microstep: 1221.31 | bwd_inner_microstep: 1221.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 06:42:24,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.10 | bwd_microstep: 1654.04 | bwd_inner_microstep: 1654.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555 [2024-06-10 06:42:26,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.32 | bwd_microstep: 1301.37 | bwd_inner_microstep: 1301.35 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587 [2024-06-10 06:42:31,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-10 06:42:31,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.41 | bwd_microstep: 4427.19 | bwd_inner_microstep: 2105.93 | bwd_allreduce_microstep: 2321.20 | step_microstep: 38.79 [2024-06-10 06:42:31,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16368.98 | bwd: 46440.14 | bwd_inner: 44117.90 | bwd_allreduce: 2321.49 | step: 40.52 {'loss': 1.3499, 'learning_rate': 3.699270936431309e-05, 'epoch': 0.2} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1955 [2024-06-10 06:42:32,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.62 | bwd_microstep: 889.33 | bwd_inner_microstep: 889.06 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.19 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3989 [2024-06-10 06:42:34,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.55 | bwd_microstep: 1409.60 | bwd_inner_microstep: 1409.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3898 [2024-06-10 06:42:37,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.54 | bwd_microstep: 1683.35 | bwd_inner_microstep: 1683.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 06:42:39,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1484.52 | bwd_inner_microstep: 1484.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233 
[2024-06-10 06:42:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.43 | bwd_microstep: 959.51 | bwd_inner_microstep: 959.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 06:42:42,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1386.19 | bwd_inner_microstep: 1386.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785 [2024-06-10 06:42:44,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1547.54 | bwd_inner_microstep: 1547.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728 [2024-06-10 06:42:46,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.44 | bwd_microstep: 1632.87 | bwd_inner_microstep: 1632.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 06:42:48,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.49 | bwd_microstep: 1301.91 | bwd_inner_microstep: 1301.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 06:42:50,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1285.84 | bwd_inner_microstep: 1285.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-10 06:42:52,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.71 | bwd_microstep: 1456.60 | bwd_inner_microstep: 1456.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3469 [2024-06-10 06:42:54,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.40 | bwd_microstep: 1481.40 | bwd_inner_microstep: 1481.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1964 [2024-06-10 06:42:55,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.11 | bwd_microstep: 829.99 | bwd_inner_microstep: 829.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 06:42:57,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1345.93 | bwd_inner_microstep: 1345.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2913 [2024-06-10 06:42:59,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.32 | bwd_microstep: 1189.72 | bwd_inner_microstep: 1189.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398 [2024-06-10 06:43:01,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.65 | bwd_microstep: 1440.22 | bwd_inner_microstep: 1440.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522 [2024-06-10 06:43:03,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.66 | bwd_microstep: 1586.59 | bwd_inner_microstep: 1586.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972 [2024-06-10 06:43:04,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.31 | bwd_microstep: 802.31 | bwd_inner_microstep: 802.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 06:43:06,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1391.73 | bwd_inner_microstep: 1391.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 06:43:08,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.69 | bwd_microstep: 1353.79 | bwd_inner_microstep: 1353.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2433 [2024-06-10 06:43:09,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.87 | bwd_microstep: 1042.26 | bwd_inner_microstep: 1042.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810 [2024-06-10 06:43:11,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.79 | bwd_microstep: 1358.98 | bwd_inner_microstep: 1358.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 06:43:13,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.74 | bwd_microstep: 1257.24 | bwd_inner_microstep: 1257.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605 [2024-06-10 06:43:15,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.09 | bwd_microstep: 1540.64 | bwd_inner_microstep: 1540.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3812 [2024-06-10 06:43:17,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1420.98 | bwd_inner_microstep: 1420.95 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 06:43:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1253.18 | bwd_inner_microstep: 1253.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3549 [2024-06-10 06:43:21,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.18 | bwd_microstep: 1440.73 | bwd_inner_microstep: 1440.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-10 06:43:23,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.47 | bwd_microstep: 1503.61 | bwd_inner_microstep: 1503.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3770 [2024-06-10 06:43:25,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.92 | bwd_microstep: 1606.98 | bwd_inner_microstep: 1606.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435 [2024-06-10 06:43:27,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1442.49 | bwd_inner_microstep: 1442.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 06:43:29,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1649.90 | bwd_inner_microstep: 1649.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3738 [2024-06-10 06:43:32,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | 
optimizer_gradients: 4.21 | optimizer_step: 6.64 [2024-06-10 06:43:32,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.20 | bwd_microstep: 1769.41 | bwd_inner_microstep: 1761.70 | bwd_allreduce_microstep: 7.66 | step_microstep: 38.32 [2024-06-10 06:43:32,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16324.05 | bwd: 43745.36 | bwd_inner: 43736.59 | bwd_allreduce: 7.98 | step: 40.15 {'loss': 1.3545, 'learning_rate': 3.697288514989502e-05, 'epoch': 0.2} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1929 [2024-06-10 06:43:33,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.24 | bwd_microstep: 882.22 | bwd_inner_microstep: 882.06 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3985 [2024-06-10 06:43:35,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.20 | bwd_microstep: 1610.84 | bwd_inner_microstep: 1610.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 06:43:37,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 1553.80 | bwd_inner_microstep: 1553.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840 [2024-06-10 06:43:40,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.03 | bwd_microstep: 1658.49 | bwd_inner_microstep: 1658.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 06:43:41,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1289.58 | bwd_inner_microstep: 1289.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3763 [2024-06-10 06:43:44,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 1643.16 | bwd_inner_microstep: 1643.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 06:43:45,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.72 | bwd_microstep: 1248.57 | bwd_inner_microstep: 1248.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2440 [2024-06-10 06:43:47,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.65 | bwd_microstep: 948.31 | bwd_inner_microstep: 948.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3408 [2024-06-10 06:43:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.18 | bwd_microstep: 1213.02 | bwd_inner_microstep: 1212.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619 [2024-06-10 06:43:50,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.80 | bwd_microstep: 1316.46 | bwd_inner_microstep: 1316.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 06:43:52,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.35 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396 [2024-06-10 06:43:54,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1247.52 | bwd_inner_microstep: 1247.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 06:43:56,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.78 | bwd_microstep: 1384.35 | bwd_inner_microstep: 1384.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723 [2024-06-10 06:43:58,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.42 | bwd_microstep: 1626.73 | bwd_inner_microstep: 1626.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-10 06:43:59,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.26 | bwd_microstep: 892.06 | bwd_inner_microstep: 892.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3535 [2024-06-10 06:44:01,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.31 | bwd_microstep: 1577.32 | bwd_inner_microstep: 1577.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2106 [2024-06-10 06:44:02,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.02 | bwd_microstep: 825.28 | bwd_inner_microstep: 825.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698 [2024-06-10 06:44:04,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 1534.97 | bwd_inner_microstep: 1534.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 06:44:06,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.00 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.08 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 06:44:08,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.46 | bwd_microstep: 1463.23 | bwd_inner_microstep: 1463.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3858 [2024-06-10 06:44:10,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1371.18 | bwd_inner_microstep: 1371.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3641 [2024-06-10 06:44:12,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1318.45 | bwd_inner_microstep: 1318.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 06:44:14,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.85 | bwd_microstep: 1453.47 | bwd_inner_microstep: 1453.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 06:44:16,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 06:44:18,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1509.84 | bwd_inner_microstep: 1509.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2393 [2024-06-10 06:44:19,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.65 | bwd_microstep: 
946.29 | bwd_inner_microstep: 946.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3627 [2024-06-10 06:44:22,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.91 | bwd_microstep: 1575.52 | bwd_inner_microstep: 1575.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276 [2024-06-10 06:44:23,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.17 | bwd_microstep: 1070.99 | bwd_inner_microstep: 1070.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2236 [2024-06-10 06:44:24,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.32 | bwd_microstep: 898.91 | bwd_inner_microstep: 898.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3774 [2024-06-10 06:44:26,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.16 | bwd_microstep: 1477.23 | bwd_inner_microstep: 1477.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 06:44:28,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1304.42 | bwd_inner_microstep: 1304.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 06:44:34,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 06:44:34,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.43 | bwd_microstep: 4668.07 | bwd_inner_microstep: 1955.57 | bwd_allreduce_microstep: 2712.45 | step_microstep: 38.78 
[2024-06-10 06:44:34,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15912.37 | bwd: 45568.45 | bwd_inner: 42854.98 | bwd_allreduce: 2712.73 | step: 40.44 {'loss': 1.2517, 'learning_rate': 3.6953001157063686e-05, 'epoch': 0.2} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482 [2024-06-10 06:44:36,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.63 | bwd_microstep: 1571.90 | bwd_inner_microstep: 1571.70 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3920 [2024-06-10 06:44:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1484.57 | bwd_inner_microstep: 1484.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 06:44:40,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1343.80 | bwd_inner_microstep: 1343.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 06:44:42,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1476.83 | bwd_inner_microstep: 1476.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472 [2024-06-10 06:44:43,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.71 | bwd_microstep: 1312.95 | bwd_inner_microstep: 1312.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454 [2024-06-10 06:44:45,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.72 | bwd_microstep: 1288.50 | bwd_inner_microstep: 1288.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-10 06:44:47,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.81 | bwd_microstep: 1152.57 | bwd_inner_microstep: 1152.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 06:44:48,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 796.74 | bwd_inner_microstep: 796.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-10 06:44:50,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.11 | bwd_microstep: 1279.24 | bwd_inner_microstep: 1279.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 06:44:52,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.82 | bwd_microstep: 1483.34 | bwd_inner_microstep: 1483.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2873 [2024-06-10 06:44:53,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.21 | bwd_microstep: 1175.35 | bwd_inner_microstep: 1175.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501 [2024-06-10 06:44:56,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1583.98 | bwd_inner_microstep: 1583.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553 [2024-06-10 06:44:58,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1596.20 | bwd_inner_microstep: 1596.17 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3658 [2024-06-10 06:45:00,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1383.75 | bwd_inner_microstep: 1383.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3682 [2024-06-10 06:45:02,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.89 | bwd_microstep: 1717.47 | bwd_inner_microstep: 1717.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-10 06:45:03,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.82 | bwd_microstep: 791.21 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 06:45:05,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.32 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 06:45:07,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1255.60 | bwd_inner_microstep: 1255.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 06:45:08,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.11 | bwd_microstep: 1290.29 | bwd_inner_microstep: 1290.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3505 [2024-06-10 06:45:10,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.53 | bwd_microstep: 
1225.82 | bwd_inner_microstep: 1225.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440 [2024-06-10 06:45:12,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1412.67 | bwd_inner_microstep: 1412.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 06:45:14,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.77 | bwd_microstep: 1296.19 | bwd_inner_microstep: 1296.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571 [2024-06-10 06:45:16,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.30 | bwd_microstep: 1307.90 | bwd_inner_microstep: 1307.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592 [2024-06-10 06:45:18,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.73 | bwd_microstep: 1310.64 | bwd_inner_microstep: 1310.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451 [2024-06-10 06:45:19,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.69 | bwd_microstep: 1382.44 | bwd_inner_microstep: 1382.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-10 06:45:21,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.56 | bwd_microstep: 977.18 | bwd_inner_microstep: 977.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3605 [2024-06-10 06:45:23,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 515.10 | bwd_microstep: 1373.79 | bwd_inner_microstep: 1373.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755 [2024-06-10 06:45:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 1344.78 | bwd_inner_microstep: 1344.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-10 06:45:27,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.70 | bwd_microstep: 1478.45 | bwd_inner_microstep: 1478.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 06:45:29,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.08 | bwd_microstep: 1444.49 | bwd_inner_microstep: 1444.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-10 06:45:31,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.44 | bwd_microstep: 1459.01 | bwd_inner_microstep: 1458.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-10 06:45:35,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.63 [2024-06-10 06:45:35,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.02 | bwd_microstep: 4279.75 | bwd_inner_microstep: 1691.65 | bwd_allreduce_microstep: 2588.06 | step_microstep: 38.60 [2024-06-10 06:45:35,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16065.73 | bwd: 45562.82 | bwd_inner: 42973.70 | bwd_allreduce: 2588.36 | step: 40.27 20%|█▉ | 345/1726 [6:02:04<23:17:10, 60.70s/it] 20%|██ | 346/1726 
[6:03:04<23:10:47, 60.47s/it] 20%|██ | 346/1726 [6:03:04<23:10:47, 60.47s/it] 20%|██ | 347/1726 [6:04:05<23:13:00, 60.61s/it] 20%|██ | 347/1726 [6:04:05<23:13:00, 60.61s/it] 20%|██ | 348/1726 [6:05:08<23:29:36, 61.38s/it] 20%|██ | 348/1726 [6:05:08<23:29:36, 61.38s/it] 20%|██ | 349/1726 [6:06:08<23:22:05, 61.09s/it] 20%|██ | 349/1726 [6:06:08<23:22:05, 61.09s/it] 20%|██ | 350/1726 [6:07:10<23:26:09, 61.32s/it] 20%|██ | 350/1726 [6:07:10<23:26:09, 61.32s/it] 20%|██ | 351/1726 [6:08:12<23:29:41, 61.51s/it] {'loss': 1.2848, 'learning_rate': 3.693305745585041e-05, 'epoch': 0.2} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 06:45:37,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.42 | bwd_microstep: 1392.61 | bwd_inner_microstep: 1392.53 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 06:45:39,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 06:45:41,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 06:45:43,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1342.36 | bwd_inner_microstep: 1342.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402 [2024-06-10 06:45:45,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.61 | bwd_microstep: 1209.79 | bwd_inner_microstep: 1209.77 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2962 [2024-06-10 06:45:46,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.47 | bwd_microstep: 1103.54 | bwd_inner_microstep: 1103.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 06:45:48,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1383.85 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775 [2024-06-10 06:45:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.70 | bwd_microstep: 1475.26 | bwd_inner_microstep: 1475.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 06:45:52,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.44 | bwd_microstep: 1251.54 | bwd_inner_microstep: 1251.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2221 [2024-06-10 06:45:53,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.31 | bwd_microstep: 832.87 | bwd_inner_microstep: 832.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958 [2024-06-10 06:45:54,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.21 | bwd_microstep: 890.05 | bwd_inner_microstep: 890.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3675 [2024-06-10 06:45:56,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.00 | bwd_microstep: 1485.20 
| bwd_inner_microstep: 1485.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3706 [2024-06-10 06:45:59,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.62 | bwd_microstep: 1723.11 | bwd_inner_microstep: 1723.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430 [2024-06-10 06:46:00,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1281.01 | bwd_inner_microstep: 1280.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3508 [2024-06-10 06:46:02,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.91 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 06:46:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1278.31 | bwd_inner_microstep: 1278.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-10 06:46:06,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.95 | bwd_microstep: 1525.49 | bwd_inner_microstep: 1525.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 06:46:08,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1357.24 | bwd_inner_microstep: 1357.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 06:46:10,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 578.97 | bwd_microstep: 1558.33 | bwd_inner_microstep: 1558.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-10 06:46:12,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.78 | bwd_microstep: 1614.54 | bwd_inner_microstep: 1614.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613 [2024-06-10 06:46:14,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.16 | bwd_microstep: 1613.43 | bwd_inner_microstep: 1613.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604 [2024-06-10 06:46:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.51 | bwd_microstep: 1445.33 | bwd_inner_microstep: 1445.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2565 [2024-06-10 06:46:18,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.92 | bwd_microstep: 1071.09 | bwd_inner_microstep: 1071.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 06:46:20,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.37 | bwd_microstep: 1558.26 | bwd_inner_microstep: 1558.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 06:46:22,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.61 | bwd_microstep: 1466.54 | bwd_inner_microstep: 1466.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931 [2024-06-10 06:46:23,503] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.02 | bwd_microstep: 705.00 | bwd_inner_microstep: 704.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3726 [2024-06-10 06:46:25,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1615.11 | bwd_inner_microstep: 1615.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2241 [2024-06-10 06:46:27,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.43 | bwd_microstep: 930.25 | bwd_inner_microstep: 930.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592 [2024-06-10 06:46:29,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.17 | bwd_microstep: 1703.09 | bwd_inner_microstep: 1703.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-10 06:46:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.44 | bwd_microstep: 973.36 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 06:46:32,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1353.42 | bwd_inner_microstep: 1353.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 06:46:38,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 06:46:38,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.74 | bwd_microstep: 5464.95 | 
bwd_inner_microstep: 2167.85 | bwd_allreduce_microstep: 3297.05 | step_microstep: 38.75 [2024-06-10 06:46:38,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15925.00 | bwd: 46423.34 | bwd_inner: 43125.31 | bwd_allreduce: 3297.32 | step: 40.49 {'loss': 1.3025, 'learning_rate': 3.6913054116496797e-05, 'epoch': 0.2} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 06:46:40,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1365.97 | bwd_inner_microstep: 1365.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3898 [2024-06-10 06:46:42,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1681.65 | bwd_inner_microstep: 1681.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440 [2024-06-10 06:46:44,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 1449.57 | bwd_inner_microstep: 1449.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3873 [2024-06-10 06:46:46,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.29 | bwd_microstep: 1509.14 | bwd_inner_microstep: 1509.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 06:46:48,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1280.09 | bwd_inner_microstep: 1280.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3810 [2024-06-10 06:46:50,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.74 | bwd_microstep: 1512.59 | 
bwd_inner_microstep: 1512.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3482 [2024-06-10 06:46:52,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.60 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601 [2024-06-10 06:46:54,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.14 | bwd_microstep: 1309.75 | bwd_inner_microstep: 1309.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 06:46:55,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 790.09 | bwd_inner_microstep: 790.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502 [2024-06-10 06:46:57,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.10 | bwd_microstep: 1192.31 | bwd_inner_microstep: 1192.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 06:46:59,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1391.28 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-10 06:47:01,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1497.19 | bwd_inner_microstep: 1497.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2007 [2024-06-10 06:47:02,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 345.88 | bwd_microstep: 930.32 | bwd_inner_microstep: 930.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 06:47:04,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.19 | bwd_microstep: 1658.15 | bwd_inner_microstep: 1658.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 06:47:06,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1508.81 | bwd_inner_microstep: 1508.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-10 06:47:08,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.23 | bwd_microstep: 1403.04 | bwd_inner_microstep: 1403.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3468 [2024-06-10 06:47:10,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1500.35 | bwd_inner_microstep: 1500.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 06:47:11,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.70 | bwd_microstep: 790.14 | bwd_inner_microstep: 790.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3632 [2024-06-10 06:47:13,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.84 | bwd_microstep: 1538.05 | bwd_inner_microstep: 1538.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2190 [2024-06-10 06:47:15,073] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.49 | bwd_microstep: 770.11 | bwd_inner_microstep: 770.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3823 [2024-06-10 06:47:17,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.13 | bwd_microstep: 1638.79 | bwd_inner_microstep: 1638.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 06:47:18,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.94 | bwd_microstep: 980.17 | bwd_inner_microstep: 980.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 06:47:20,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.62 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299 [2024-06-10 06:47:22,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.72 | bwd_microstep: 979.41 | bwd_inner_microstep: 979.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 06:47:24,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1409.19 | bwd_inner_microstep: 1409.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 06:47:25,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.79 | bwd_microstep: 1376.81 | bwd_inner_microstep: 1376.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 
[2024-06-10 06:47:27,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.00 | bwd_microstep: 1403.67 | bwd_inner_microstep: 1403.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282 [2024-06-10 06:47:29,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.13 | bwd_microstep: 878.64 | bwd_inner_microstep: 878.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-10 06:47:31,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1411.44 | bwd_inner_microstep: 1411.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398 [2024-06-10 06:47:32,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.57 | bwd_microstep: 1372.05 | bwd_inner_microstep: 1372.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 06:47:34,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-10 06:47:41,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.38 | optimizer_step: 6.58 [2024-06-10 06:47:41,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.62 | bwd_microstep: 6253.91 | bwd_inner_microstep: 2153.74 | bwd_allreduce_microstep: 4100.11 | step_microstep: 39.31 [2024-06-10 06:47:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15826.42 | bwd: 46904.69 | bwd_inner: 42803.62 | bwd_allreduce: 4100.35 | step: 40.91 
{'loss': 1.2939, 'learning_rate': 3.689299120945451e-05, 'epoch': 0.2} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 06:47:43,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.44 | bwd_microstep: 1510.74 | bwd_inner_microstep: 1510.64 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 06:47:45,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2347 [2024-06-10 06:47:47,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.36 | bwd_microstep: 987.44 | bwd_inner_microstep: 987.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 913 [2024-06-10 06:47:47,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.37 | bwd_microstep: 373.23 | bwd_inner_microstep: 373.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 06:47:49,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1247.33 | bwd_inner_microstep: 1247.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3793 [2024-06-10 06:47:51,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3392 [2024-06-10 06:47:53,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | 
bwd_microstep: 1296.45 | bwd_inner_microstep: 1296.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3784 [2024-06-10 06:47:55,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.82 | bwd_microstep: 1576.00 | bwd_inner_microstep: 1575.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 06:47:57,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1247.24 | bwd_inner_microstep: 1247.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3735 [2024-06-10 06:47:59,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.63 | bwd_microstep: 1731.04 | bwd_inner_microstep: 1731.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 06:48:01,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1349.20 | bwd_inner_microstep: 1349.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3534 [2024-06-10 06:48:03,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.96 | bwd_microstep: 1355.66 | bwd_inner_microstep: 1355.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070 [2024-06-10 06:48:04,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.31 | bwd_microstep: 916.69 | bwd_inner_microstep: 916.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 06:48:06,379] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 504.34 | bwd_microstep: 1348.40 | bwd_inner_microstep: 1348.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2455 [2024-06-10 06:48:07,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.52 | bwd_microstep: 1113.84 | bwd_inner_microstep: 1113.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3521 [2024-06-10 06:48:10,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.71 | bwd_microstep: 1691.85 | bwd_inner_microstep: 1691.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3889 [2024-06-10 06:48:12,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.05 | bwd_microstep: 1682.87 | bwd_inner_microstep: 1682.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3675 [2024-06-10 06:48:15,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.58 | bwd_microstep: 1825.82 | bwd_inner_microstep: 1825.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1149 [2024-06-10 06:48:15,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 182.96 | bwd_microstep: 478.43 | bwd_inner_microstep: 478.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3505 [2024-06-10 06:48:17,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.45 | bwd_microstep: 1224.99 | bwd_inner_microstep: 1224.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 06:48:19,168] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1256.28 | bwd_inner_microstep: 1256.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100 [2024-06-10 06:48:20,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.26 | bwd_microstep: 923.98 | bwd_inner_microstep: 923.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545 [2024-06-10 06:48:22,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.13 | bwd_microstep: 1356.48 | bwd_inner_microstep: 1356.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3602 [2024-06-10 06:48:24,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.95 | bwd_microstep: 1472.94 | bwd_inner_microstep: 1472.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733 [2024-06-10 06:48:26,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.92 | bwd_microstep: 1630.87 | bwd_inner_microstep: 1630.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2040 [2024-06-10 06:48:27,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.23 | bwd_microstep: 907.29 | bwd_inner_microstep: 907.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3461 [2024-06-10 06:48:29,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.48 | bwd_microstep: 1244.16 | bwd_inner_microstep: 1244.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3816 [2024-06-10 06:48:31,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1558.51 | bwd_inner_microstep: 1558.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928 [2024-06-10 06:48:32,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.93 | bwd_microstep: 698.41 | bwd_inner_microstep: 698.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 06:48:34,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.82 | bwd_microstep: 1512.85 | bwd_inner_microstep: 1512.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 06:48:36,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1483.16 | bwd_inner_microstep: 1483.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-10 06:48:43,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-10 06:48:43,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 5818.28 | bwd_inner_microstep: 1863.42 | bwd_allreduce_microstep: 3954.81 | step_microstep: 38.64 [2024-06-10 06:48:43,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15435.36 | bwd: 45681.80 | bwd_inner: 41725.98 | bwd_allreduce: 3955.10 | step: 40.34 {'loss': 1.3562, 'learning_rate': 3.6872868805385004e-05, 'epoch': 0.21} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389 [2024-06-10 06:48:44,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1235.43 | 
bwd_inner_microstep: 1235.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2450 [2024-06-10 06:48:46,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.46 | bwd_microstep: 949.98 | bwd_inner_microstep: 949.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939 [2024-06-10 06:48:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.82 | bwd_microstep: 1691.74 | bwd_inner_microstep: 1691.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3894 [2024-06-10 06:48:50,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.99 | bwd_microstep: 1584.93 | bwd_inner_microstep: 1584.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3783 [2024-06-10 06:48:52,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.34 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 06:48:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.37 | bwd_microstep: 804.75 | bwd_inner_microstep: 804.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 760 [2024-06-10 06:48:54,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.72 | bwd_microstep: 303.79 | bwd_inner_microstep: 303.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-10 06:48:55,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
301.45 | bwd_microstep: 797.85 | bwd_inner_microstep: 797.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 06:48:57,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734 [2024-06-10 06:48:59,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1535.59 | bwd_inner_microstep: 1535.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516 [2024-06-10 06:49:01,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1421.19 | bwd_inner_microstep: 1421.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3715 [2024-06-10 06:49:03,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.23 | bwd_microstep: 1666.86 | bwd_inner_microstep: 1666.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 06:49:05,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1374.30 | bwd_inner_microstep: 1374.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3378 [2024-06-10 06:49:07,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1487.20 | bwd_inner_microstep: 1487.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3698 [2024-06-10 06:49:09,744] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.13 | bwd_microstep: 1725.80 | bwd_inner_microstep: 1725.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698 [2024-06-10 06:49:11,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1527.01 | bwd_inner_microstep: 1526.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669 [2024-06-10 06:49:14,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.60 | bwd_microstep: 1614.15 | bwd_inner_microstep: 1614.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 06:49:15,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1387.00 | bwd_inner_microstep: 1386.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 06:49:18,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1513.38 | bwd_inner_microstep: 1513.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1269 [2024-06-10 06:49:18,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 174.80 | bwd_microstep: 430.28 | bwd_inner_microstep: 430.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3630 [2024-06-10 06:49:20,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.84 | bwd_microstep: 1479.90 | bwd_inner_microstep: 1479.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3474 [2024-06-10 06:49:22,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.72 | bwd_microstep: 1343.67 | bwd_inner_microstep: 1343.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725 [2024-06-10 06:49:24,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.98 | bwd_microstep: 1637.42 | bwd_inner_microstep: 1637.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3609 [2024-06-10 06:49:26,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1462.88 | bwd_inner_microstep: 1462.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729 [2024-06-10 06:49:28,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1528.43 | bwd_inner_microstep: 1528.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933 [2024-06-10 06:49:31,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1602.94 | bwd_inner_microstep: 1602.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467 [2024-06-10 06:49:32,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.92 | bwd_microstep: 1185.08 | bwd_inner_microstep: 1185.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083 [2024-06-10 06:49:34,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.26 | bwd_microstep: 918.22 | bwd_inner_microstep: 918.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images 
per sample: 6.5, dynamic token length: 3817 [2024-06-10 06:49:36,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1419.37 | bwd_inner_microstep: 1419.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-10 06:49:37,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.41 | bwd_microstep: 974.77 | bwd_inner_microstep: 974.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 06:49:39,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1509.97 | bwd_inner_microstep: 1509.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424 [2024-06-10 06:49:43,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-10 06:49:43,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 3335.73 | bwd_inner_microstep: 1754.65 | bwd_allreduce_microstep: 1581.03 | step_microstep: 38.61 [2024-06-10 06:49:43,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15788.01 | bwd: 44047.12 | bwd_inner: 42465.13 | bwd_allreduce: 1581.27 | step: 40.31 {'loss': 1.2431, 'learning_rate': 3.685268697515928e-05, 'epoch': 0.21} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3379 [2024-06-10 06:49:45,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.89 | bwd_microstep: 1263.76 | bwd_inner_microstep: 1263.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4000 [2024-06-10 06:49:47,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
600.07 | bwd_microstep: 1609.73 | bwd_inner_microstep: 1609.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4342 [2024-06-10 06:49:49,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.64 | bwd_microstep: 1602.82 | bwd_inner_microstep: 1602.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 06:49:50,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.04 | bwd_microstep: 795.75 | bwd_inner_microstep: 795.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 06:49:52,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.08 | bwd_microstep: 1515.79 | bwd_inner_microstep: 1515.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 06:49:54,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.61 | bwd_microstep: 1387.41 | bwd_inner_microstep: 1387.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 06:49:56,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.35 | bwd_microstep: 1252.33 | bwd_inner_microstep: 1252.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729 [2024-06-10 06:49:58,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.25 | bwd_microstep: 1436.51 | bwd_inner_microstep: 1436.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 06:50:00,221] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1251.14 | bwd_inner_microstep: 1251.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 946 [2024-06-10 06:50:00,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 149.12 | bwd_microstep: 378.91 | bwd_inner_microstep: 378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 873 [2024-06-10 06:50:01,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.51 | bwd_microstep: 399.25 | bwd_inner_microstep: 399.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 06:50:03,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.91 | bwd_microstep: 1341.76 | bwd_inner_microstep: 1341.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 06:50:05,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1342.51 | bwd_inner_microstep: 1342.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 06:50:06,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660 [2024-06-10 06:50:09,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.68 | bwd_microstep: 1724.28 | bwd_inner_microstep: 1724.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 06:50:11,385] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1646.11 | bwd_inner_microstep: 1646.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 06:50:13,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.53 | bwd_microstep: 1490.19 | bwd_inner_microstep: 1490.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102 [2024-06-10 06:50:14,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.26 | bwd_microstep: 823.13 | bwd_inner_microstep: 823.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 06:50:16,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1296.11 | bwd_inner_microstep: 1296.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 06:50:18,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1285.94 | bwd_inner_microstep: 1285.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-10 06:50:19,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.48 | bwd_microstep: 803.48 | bwd_inner_microstep: 803.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3614 [2024-06-10 06:50:21,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1314.01 | bwd_inner_microstep: 1313.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3451 [2024-06-10 06:50:22,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.48 | bwd_microstep: 1163.78 | bwd_inner_microstep: 1163.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441 [2024-06-10 06:50:24,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.25 | bwd_microstep: 1157.47 | bwd_inner_microstep: 1157.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488 [2024-06-10 06:50:26,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.91 | bwd_microstep: 1220.80 | bwd_inner_microstep: 1220.52 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.15 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3556 [2024-06-10 06:50:27,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.89 | bwd_microstep: 1345.78 | bwd_inner_microstep: 1345.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3743 [2024-06-10 06:50:30,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.54 | bwd_microstep: 1738.50 | bwd_inner_microstep: 1738.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-10 06:50:32,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.65 | bwd_microstep: 1606.70 | bwd_inner_microstep: 1606.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3797 [2024-06-10 06:50:34,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.88 | bwd_microstep: 1684.91 | bwd_inner_microstep: 1684.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 06:50:36,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1403.88 | bwd_inner_microstep: 1403.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3597 [2024-06-10 06:50:39,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.94 | bwd_microstep: 1701.06 | bwd_inner_microstep: 1701.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240 [2024-06-10 06:50:43,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 06:50:43,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.26 | bwd_microstep: 4065.07 | bwd_inner_microstep: 1096.55 | bwd_allreduce_microstep: 2968.46 | step_microstep: 38.76 [2024-06-10 06:50:43,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15446.46 | bwd: 44295.16 | bwd_inner: 41325.60 | bwd_allreduce: 2968.79 | step: 40.55 {'loss': 1.2607, 'learning_rate': 3.683244578985763e-05, 'epoch': 0.21} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3470 [2024-06-10 06:50:45,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.28 | bwd_microstep: 1493.83 | bwd_inner_microstep: 1493.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 06:50:47,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.84 | bwd_microstep: 1282.25 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4268 [2024-06-10 06:50:49,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 623.85 | bwd_microstep: 1667.64 | bwd_inner_microstep: 1667.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 06:50:51,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1343.64 | bwd_inner_microstep: 1343.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 06:50:53,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.69 | bwd_microstep: 1283.70 | bwd_inner_microstep: 1283.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 06:50:55,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.22 | bwd_microstep: 1352.70 | bwd_inner_microstep: 1352.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442 [2024-06-10 06:50:56,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.03 | bwd_microstep: 1158.74 | bwd_inner_microstep: 1158.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 06:50:58,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.87 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 06:51:00,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.23 | bwd_microstep: 1388.76 | bwd_inner_microstep: 1388.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1900 [2024-06-10 06:51:01,703] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.67 | bwd_microstep: 748.81 | bwd_inner_microstep: 748.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960 [2024-06-10 06:51:02,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.54 | bwd_microstep: 832.15 | bwd_inner_microstep: 832.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2318 [2024-06-10 06:51:04,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.06 | bwd_microstep: 890.78 | bwd_inner_microstep: 890.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3681 [2024-06-10 06:51:06,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.59 | bwd_microstep: 1723.30 | bwd_inner_microstep: 1723.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3881 [2024-06-10 06:51:08,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.68 | bwd_microstep: 1749.96 | bwd_inner_microstep: 1749.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685 [2024-06-10 06:51:11,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.59 | bwd_microstep: 1620.48 | bwd_inner_microstep: 1620.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 06:51:12,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1383.37 | bwd_inner_microstep: 1383.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2028 [2024-06-10 06:51:14,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.33 | bwd_microstep: 904.35 | bwd_inner_microstep: 904.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 06:51:16,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1409.67 | bwd_inner_microstep: 1409.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 06:51:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.74 | bwd_microstep: 1401.98 | bwd_inner_microstep: 1401.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 06:51:20,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1555.55 | bwd_inner_microstep: 1555.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3517 [2024-06-10 06:51:22,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.45 | bwd_microstep: 1555.97 | bwd_inner_microstep: 1555.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830 [2024-06-10 06:51:24,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1366.56 | bwd_inner_microstep: 1366.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3049 [2024-06-10 06:51:25,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.73 | bwd_microstep: 1139.11 | bwd_inner_microstep: 1139.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per 
sample: 8.0, dynamic token length: 3822 [2024-06-10 06:51:27,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1511.90 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2110 [2024-06-10 06:51:29,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.69 | bwd_microstep: 924.22 | bwd_inner_microstep: 924.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 06:51:31,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.42 | bwd_microstep: 1660.99 | bwd_inner_microstep: 1660.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2399 [2024-06-10 06:51:32,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.76 | bwd_microstep: 1001.43 | bwd_inner_microstep: 1001.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3529 [2024-06-10 06:51:34,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1354.42 | bwd_inner_microstep: 1354.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546 [2024-06-10 06:51:36,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.99 | bwd_microstep: 1330.15 | bwd_inner_microstep: 1330.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3672 [2024-06-10 06:51:38,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.47 | bwd_microstep: 1482.42 | bwd_inner_microstep: 1482.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3532 [2024-06-10 06:51:41,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.92 | bwd_microstep: 1688.39 | bwd_inner_microstep: 1688.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 06:51:47,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.33 | optimizer_step: 6.58 [2024-06-10 06:51:47,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 5780.62 | bwd_inner_microstep: 1518.75 | bwd_allreduce_microstep: 4261.80 | step_microstep: 38.96 [2024-06-10 06:51:47,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16071.53 | bwd: 47373.67 | bwd_inner: 43110.94 | bwd_allreduce: 4262.04 | step: 40.57 20%|██ | 351/1726 [6:08:12<23:29:41, 61.51s/it] 20%|██ | 352/1726 [6:09:15<23:36:48, 61.87s/it] 20%|██ | 353/1726 [6:10:18<23:44:07, 62.23s/it] 21%|██ | 354/1726 [6:11:19<23:37:53, 62.01s/it] 21%|██ | 354/1726 [6:11:20<23:37:53, 62.01s/it] 21%|██ | 355/1726 [6:12:20<23:24:30, 61.47s/it] 21%|██ | 356/1726 [6:13:20<23:14:03, 61.05s/it] 21%|██ | 357/1726 [6:14 {'loss': 1.2993, 'learning_rate': 3.6812145320769415e-05, 'epoch': 0.21} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 06:51:49,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.31 | bwd_microstep: 1471.17 | bwd_inner_microstep: 1471.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925 [2024-06-10 06:51:50,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 270.16 | bwd_microstep: 697.40 | bwd_inner_microstep: 697.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3866 [2024-06-10 06:51:52,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.00 | bwd_microstep: 1301.33 | bwd_inner_microstep: 1301.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2343 [2024-06-10 06:51:53,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.74 | bwd_microstep: 923.76 | bwd_inner_microstep: 923.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 06:51:55,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1379.79 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775 [2024-06-10 06:51:57,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.92 | bwd_microstep: 1543.40 | bwd_inner_microstep: 1543.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2475 [2024-06-10 06:51:58,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.99 | bwd_microstep: 1051.48 | bwd_inner_microstep: 1051.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 06:52:00,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1249.37 | bwd_inner_microstep: 1249.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 06:52:02,606] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.81 | bwd_microstep: 1399.12 | bwd_inner_microstep: 1399.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 06:52:04,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1249.32 | bwd_inner_microstep: 1249.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-10 06:52:05,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.37 | bwd_microstep: 806.01 | bwd_inner_microstep: 805.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515 [2024-06-10 06:52:07,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.64 | bwd_microstep: 1448.92 | bwd_inner_microstep: 1448.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3514 [2024-06-10 06:52:09,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.21 | bwd_microstep: 1337.60 | bwd_inner_microstep: 1337.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 06:52:11,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.61 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-10 06:52:13,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.83 | bwd_microstep: 1312.36 | bwd_inner_microstep: 1312.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 
3686 [2024-06-10 06:52:15,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.19 | bwd_microstep: 1785.26 | bwd_inner_microstep: 1785.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3446 [2024-06-10 06:52:17,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1305.27 | bwd_inner_microstep: 1305.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1967 [2024-06-10 06:52:18,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.97 | bwd_microstep: 704.72 | bwd_inner_microstep: 704.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952 [2024-06-10 06:52:20,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.64 | bwd_microstep: 1700.70 | bwd_inner_microstep: 1700.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-10 06:52:22,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.31 | bwd_microstep: 1616.39 | bwd_inner_microstep: 1616.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290 [2024-06-10 06:52:24,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.07 | bwd_microstep: 915.33 | bwd_inner_microstep: 915.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076 [2024-06-10 06:52:25,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.44 | bwd_microstep: 916.73 | bwd_inner_microstep: 916.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3827 [2024-06-10 06:52:27,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.84 | bwd_microstep: 1661.25 | bwd_inner_microstep: 1661.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 06:52:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1399.78 | bwd_inner_microstep: 1399.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 06:52:31,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1292.08 | bwd_inner_microstep: 1292.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3552 [2024-06-10 06:52:33,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.24 | bwd_microstep: 1204.69 | bwd_inner_microstep: 1204.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566 [2024-06-10 06:52:34,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.97 | bwd_microstep: 1301.78 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3568 [2024-06-10 06:52:37,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.67 | bwd_microstep: 1631.75 | bwd_inner_microstep: 1631.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596 [2024-06-10 06:52:39,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.79 | bwd_microstep: 1466.89 | bwd_inner_microstep: 1466.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-10 06:52:41,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.07 | bwd_microstep: 1506.64 | bwd_inner_microstep: 1506.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-10 06:52:43,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.89 | bwd_microstep: 1637.34 | bwd_inner_microstep: 1637.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2378 [2024-06-10 06:52:46,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 06:52:46,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.45 | bwd_microstep: 2048.73 | bwd_inner_microstep: 1056.14 | bwd_allreduce_microstep: 992.53 | step_microstep: 38.60 [2024-06-10 06:52:46,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15600.75 | bwd: 42751.17 | bwd_inner: 41757.73 | bwd_allreduce: 992.76 | step: 40.18 {'loss': 1.2862, 'learning_rate': 3.679178563939278e-05, 'epoch': 0.21} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 06:52:47,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.76 | bwd_microstep: 1277.38 | bwd_inner_microstep: 1277.20 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 06:52:49,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1349.88 | bwd_inner_microstep: 1349.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2321 [2024-06-10 06:52:51,043] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 366.48 | bwd_microstep: 984.58 | bwd_inner_microstep: 984.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 06:52:52,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1394.00 | bwd_inner_microstep: 1393.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4103 [2024-06-10 06:52:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.25 | bwd_microstep: 1734.29 | bwd_inner_microstep: 1734.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403 [2024-06-10 06:52:57,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.44 | bwd_microstep: 1179.74 | bwd_inner_microstep: 1179.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 06:52:58,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1390.06 | bwd_inner_microstep: 1390.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423 [2024-06-10 06:53:00,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.42 | bwd_microstep: 1155.47 | bwd_inner_microstep: 1155.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 06:53:02,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.29 | bwd_microstep: 1406.86 | bwd_inner_microstep: 1406.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674 [2024-06-10 06:53:04,437] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.20 | bwd_microstep: 1420.21 | bwd_inner_microstep: 1420.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728 [2024-06-10 06:53:06,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.87 | bwd_microstep: 1625.40 | bwd_inner_microstep: 1625.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 06:53:08,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.74 | bwd_microstep: 1317.08 | bwd_inner_microstep: 1317.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561 [2024-06-10 06:53:10,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1592.88 | bwd_inner_microstep: 1592.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3665 [2024-06-10 06:53:12,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.00 | bwd_microstep: 1653.24 | bwd_inner_microstep: 1653.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 06:53:14,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.21 | bwd_microstep: 1389.18 | bwd_inner_microstep: 1389.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1952 [2024-06-10 06:53:15,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.70 | bwd_microstep: 699.20 | bwd_inner_microstep: 699.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3502 [2024-06-10 06:53:17,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.00 | bwd_microstep: 1488.38 | bwd_inner_microstep: 1488.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628 [2024-06-10 06:53:20,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1539.61 | bwd_inner_microstep: 1539.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 06:53:22,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1497.77 | bwd_inner_microstep: 1497.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 06:53:23,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.71 | bwd_microstep: 1281.88 | bwd_inner_microstep: 1281.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 06:53:25,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.47 | bwd_microstep: 1261.15 | bwd_inner_microstep: 1261.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-10 06:53:27,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.76 | bwd_microstep: 1188.63 | bwd_inner_microstep: 1188.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 06:53:29,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1403.10 | bwd_inner_microstep: 1403.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3810 [2024-06-10 06:53:31,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 06:53:33,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.87 | bwd_microstep: 1553.27 | bwd_inner_microstep: 1553.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 06:53:35,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.31 | bwd_microstep: 1503.23 | bwd_inner_microstep: 1503.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2878 [2024-06-10 06:53:37,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.83 | bwd_microstep: 1185.14 | bwd_inner_microstep: 1185.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939 [2024-06-10 06:53:38,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 730.05 | bwd_inner_microstep: 730.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818 [2024-06-10 06:53:40,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.81 | bwd_microstep: 1623.05 | bwd_inner_microstep: 1623.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 06:53:42,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1347.66 | bwd_inner_microstep: 1347.63 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774 [2024-06-10 06:53:44,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.10 | bwd_microstep: 1741.94 | bwd_inner_microstep: 1741.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683 [2024-06-10 06:53:47,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.63 [2024-06-10 06:53:47,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.93 | bwd_microstep: 1658.80 | bwd_inner_microstep: 1650.75 | bwd_allreduce_microstep: 8.00 | step_microstep: 38.08 [2024-06-10 06:53:47,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16495.37 | bwd: 44129.46 | bwd_inner: 44120.41 | bwd_allreduce: 8.30 | step: 39.74 {'loss': 1.2611, 'learning_rate': 3.6771366817434416e-05, 'epoch': 0.21} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3471 [2024-06-10 06:53:49,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 1576.14 | bwd_inner_microstep: 1576.06 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 06:53:51,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1407.34 | bwd_inner_microstep: 1407.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 06:53:53,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1351.18 | bwd_inner_microstep: 1351.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3863 [2024-06-10 06:53:55,163] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1564.31 | bwd_inner_microstep: 1564.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497 [2024-06-10 06:53:56,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.20 | bwd_microstep: 1191.42 | bwd_inner_microstep: 1191.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-10 06:53:58,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1544.16 | bwd_inner_microstep: 1544.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 06:54:00,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 1359.39 | bwd_inner_microstep: 1359.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713 [2024-06-10 06:54:02,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1530.35 | bwd_inner_microstep: 1530.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436 [2024-06-10 06:54:04,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.65 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1967 [2024-06-10 06:54:06,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.76 | bwd_microstep: 861.45 | bwd_inner_microstep: 861.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 
3489 [2024-06-10 06:54:08,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.74 | bwd_microstep: 1615.55 | bwd_inner_microstep: 1615.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-10 06:54:10,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1514.79 | bwd_inner_microstep: 1514.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500 [2024-06-10 06:54:12,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.00 | bwd_microstep: 1585.81 | bwd_inner_microstep: 1585.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3985 [2024-06-10 06:54:14,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1540.83 | bwd_inner_microstep: 1540.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399 [2024-06-10 06:54:16,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.94 | bwd_microstep: 1369.73 | bwd_inner_microstep: 1369.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3531 [2024-06-10 06:54:18,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.90 | bwd_microstep: 1559.82 | bwd_inner_microstep: 1559.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 06:54:20,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.11 | bwd_microstep: 1349.37 | bwd_inner_microstep: 1349.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images 
per sample: 9.0, dynamic token length: 3649 [2024-06-10 06:54:22,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.94 | bwd_microstep: 1550.15 | bwd_inner_microstep: 1550.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-10 06:54:23,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 801.26 | bwd_inner_microstep: 801.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 06:54:25,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.72 | bwd_microstep: 1404.75 | bwd_inner_microstep: 1404.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582 [2024-06-10 06:54:27,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.42 | bwd_microstep: 1408.12 | bwd_inner_microstep: 1408.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3553 [2024-06-10 06:54:29,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.08 | bwd_microstep: 1233.97 | bwd_inner_microstep: 1233.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294 [2024-06-10 06:54:30,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.76 | bwd_microstep: 980.75 | bwd_inner_microstep: 980.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779 [2024-06-10 06:54:32,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.76 | bwd_microstep: 1453.46 | bwd_inner_microstep: 1453.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 06:54:34,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.78 | bwd_microstep: 1502.90 | bwd_inner_microstep: 1502.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604 [2024-06-10 06:54:36,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.48 | bwd_microstep: 1443.86 | bwd_inner_microstep: 1443.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3307 [2024-06-10 06:54:38,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1325.05 | bwd_inner_microstep: 1325.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2260 [2024-06-10 06:54:40,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.03 | bwd_microstep: 1067.81 | bwd_inner_microstep: 1067.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 06:54:42,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1394.53 | bwd_inner_microstep: 1394.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 06:54:44,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.96 | bwd_microstep: 1598.14 | bwd_inner_microstep: 1598.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 06:54:46,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1404.62 | bwd_inner_microstep: 1404.59 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-10 06:54:48,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.18 | optimizer_step: 6.64 [2024-06-10 06:54:48,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.89 | bwd_microstep: 1502.33 | bwd_inner_microstep: 1494.65 | bwd_allreduce_microstep: 7.64 | step_microstep: 38.40 [2024-06-10 06:54:48,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16559.26 | bwd: 44373.85 | bwd_inner: 44365.23 | bwd_allreduce: 7.91 | step: 40.15 {'loss': 1.2731, 'learning_rate': 3.67508889268093e-05, 'epoch': 0.21} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400 [2024-06-10 06:54:50,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.96 | bwd_microstep: 1369.90 | bwd_inner_microstep: 1369.79 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3537 [2024-06-10 06:54:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.46 | bwd_microstep: 1345.43 | bwd_inner_microstep: 1345.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3906 [2024-06-10 06:54:54,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.79 | bwd_microstep: 1452.94 | bwd_inner_microstep: 1452.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481 [2024-06-10 06:54:55,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1315.49 | bwd_inner_microstep: 1315.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798 [2024-06-10 
06:54:58,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1550.30 | bwd_inner_microstep: 1550.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796 [2024-06-10 06:55:00,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.07 | bwd_microstep: 1455.53 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 06:55:01,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.93 | bwd_microstep: 1251.65 | bwd_inner_microstep: 1251.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1979 [2024-06-10 06:55:02,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.95 | bwd_microstep: 707.21 | bwd_inner_microstep: 707.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726 [2024-06-10 06:55:04,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.55 | bwd_microstep: 1536.48 | bwd_inner_microstep: 1536.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944 [2024-06-10 06:55:05,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.42 | bwd_microstep: 699.76 | bwd_inner_microstep: 699.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-10 06:55:07,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3514 [2024-06-10 06:55:09,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.09 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 06:55:11,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1375.70 | bwd_inner_microstep: 1375.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3674 [2024-06-10 06:55:13,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.63 | bwd_microstep: 1552.90 | bwd_inner_microstep: 1552.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 06:55:15,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.84 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2145 [2024-06-10 06:55:17,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.43 | bwd_microstep: 1042.12 | bwd_inner_microstep: 1042.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3449 [2024-06-10 06:55:19,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.61 | bwd_microstep: 1548.17 | bwd_inner_microstep: 1548.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 06:55:21,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1390.37 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 06:55:23,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1438.24 | bwd_inner_microstep: 1438.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822 [2024-06-10 06:55:25,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.95 | bwd_microstep: 1483.36 | bwd_inner_microstep: 1483.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-10 06:55:27,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.39 | bwd_microstep: 1404.86 | bwd_inner_microstep: 1404.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461 [2024-06-10 06:55:29,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.08 | bwd_microstep: 1436.31 | bwd_inner_microstep: 1436.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3617 [2024-06-10 06:55:31,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1472.36 | bwd_inner_microstep: 1472.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 06:55:33,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1559.77 | bwd_inner_microstep: 1559.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 06:55:35,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.86 | bwd_microstep: 1392.93 | bwd_inner_microstep: 1392.90 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3552 [2024-06-10 06:55:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1443.33 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442 [2024-06-10 06:55:39,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1452.75 | bwd_inner_microstep: 1452.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 06:55:41,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1648.48 | bwd_inner_microstep: 1648.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818 [2024-06-10 06:55:43,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1391.98 | bwd_inner_microstep: 1391.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964 [2024-06-10 06:55:44,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.15 | bwd_microstep: 704.42 | bwd_inner_microstep: 704.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3694 [2024-06-10 06:55:46,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1334.88 | bwd_inner_microstep: 1334.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 06:55:50,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | 
optimizer_step: 6.59 [2024-06-10 06:55:50,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 3520.31 | bwd_inner_microstep: 1439.67 | bwd_allreduce_microstep: 2080.59 | step_microstep: 38.72 [2024-06-10 06:55:50,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16193.20 | bwd: 45395.58 | bwd_inner: 43313.97 | bwd_allreduce: 2080.88 | step: 40.52 {'loss': 1.2967, 'learning_rate': 3.6730352039640476e-05, 'epoch': 0.21} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 06:55:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.39 | bwd_microstep: 1240.72 | bwd_inner_microstep: 1240.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 06:55:53,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.50 | bwd_microstep: 1244.63 | bwd_inner_microstep: 1244.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 06:55:54,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.99 | bwd_microstep: 786.44 | bwd_inner_microstep: 786.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 06:55:56,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.16 | bwd_microstep: 1275.44 | bwd_inner_microstep: 1275.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 06:55:58,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1273.49 | bwd_inner_microstep: 1273.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3774 [2024-06-10 06:56:00,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.02 | bwd_microstep: 1639.35 | bwd_inner_microstep: 1639.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716 [2024-06-10 06:56:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.26 | bwd_microstep: 1628.93 | bwd_inner_microstep: 1628.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 06:56:04,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421 [2024-06-10 06:56:06,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.91 | bwd_microstep: 1152.80 | bwd_inner_microstep: 1152.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-10 06:56:08,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1412.56 | bwd_inner_microstep: 1412.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1899 [2024-06-10 06:56:09,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.66 | bwd_microstep: 686.91 | bwd_inner_microstep: 686.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 06:56:10,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.47 | bwd_microstep: 792.53 | bwd_inner_microstep: 792.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3656 [2024-06-10 06:56:12,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1510.68 | bwd_inner_microstep: 1510.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506 [2024-06-10 06:56:14,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1415.73 | bwd_inner_microstep: 1415.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3508 [2024-06-10 06:56:16,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.85 | bwd_microstep: 1686.01 | bwd_inner_microstep: 1685.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3511 [2024-06-10 06:56:18,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1513.51 | bwd_inner_microstep: 1513.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3498 [2024-06-10 06:56:20,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.47 | bwd_microstep: 1457.14 | bwd_inner_microstep: 1457.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 06:56:22,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1281.48 | bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1969 [2024-06-10 06:56:23,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.99 | bwd_microstep: 857.97 | bwd_inner_microstep: 857.95 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669 [2024-06-10 06:56:26,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.82 | bwd_microstep: 1623.66 | bwd_inner_microstep: 1623.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503 [2024-06-10 06:56:27,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 1424.33 | bwd_inner_microstep: 1424.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 06:56:30,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.89 | bwd_microstep: 1660.61 | bwd_inner_microstep: 1660.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688 [2024-06-10 06:56:32,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.83 | bwd_microstep: 1431.00 | bwd_inner_microstep: 1430.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1359 [2024-06-10 06:56:32,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.59 | bwd_microstep: 519.21 | bwd_inner_microstep: 519.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3700 [2024-06-10 06:56:35,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.19 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 06:56:36,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1254.66 | bwd_inner_microstep: 
1254.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3675 [2024-06-10 06:56:39,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.62 | bwd_microstep: 1571.67 | bwd_inner_microstep: 1571.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3535 [2024-06-10 06:56:41,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1591.23 | bwd_inner_microstep: 1591.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-10 06:56:43,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1546.35 | bwd_inner_microstep: 1546.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2964 [2024-06-10 06:56:45,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.23 | bwd_microstep: 1202.70 | bwd_inner_microstep: 1202.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 06:56:46,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.38 | bwd_microstep: 789.94 | bwd_inner_microstep: 789.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3887 [2024-06-10 06:56:51,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.22 | optimizer_step: 6.58 [2024-06-10 06:56:51,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.73 | bwd_microstep: 4607.93 | bwd_inner_microstep: 1785.31 | bwd_allreduce_microstep: 2822.57 | step_microstep: 38.68 [2024-06-10 06:56:51,367] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15713.54 | bwd: 45045.43 | bwd_inner: 42221.95 | bwd_allreduce: 2822.79 | step: 40.45 {'loss': 1.2592, 'learning_rate': 3.6709756228258735e-05, 'epoch': 0.21} 21%|██ | 357/1726 [6:14:24<23:31:48, 61.88s/it] 21%|██ | 358/1726 [6:15:22<23:09:00, 60.92s/it] 21%|██ | 359/1726 [6:16:23<23:08:21, 60.94s/it] 21%|██ | 360/1726 [6:17:25<23:09:49, 61.05s/it] 21%|██ | 361/1726 [6:18:26<23:14:55, 61.32s/it] 21%|██ | 362/1726 [6:19:28<23:12:30, 61.25s/it] dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466 [2024-06-10 06:56:53,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.41 | bwd_microstep: 1425.01 | bwd_inner_microstep: 1424.90 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 06:56:55,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1347.33 | bwd_inner_microstep: 1347.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-10 06:56:57,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.79 | bwd_microstep: 1546.70 | bwd_inner_microstep: 1546.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 06:56:59,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.89 | bwd_microstep: 1247.40 | bwd_inner_microstep: 1247.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token
length: 1930 [2024-06-10 06:57:00,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.44 | bwd_microstep: 760.58 | bwd_inner_microstep: 760.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 06:57:02,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.47 | bwd_microstep: 1633.67 | bwd_inner_microstep: 1633.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3415 [2024-06-10 06:57:04,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1185.74 | bwd_inner_microstep: 1185.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 06:57:05,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1386.73 | bwd_inner_microstep: 1386.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682 [2024-06-10 06:57:08,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.29 | bwd_microstep: 1524.81 | bwd_inner_microstep: 1524.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039 [2024-06-10 06:57:09,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.77 | bwd_microstep: 810.61 | bwd_inner_microstep: 810.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3487 [2024-06-10 06:57:11,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1395.06 | bwd_inner_microstep: 1395.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, 
images per sample: 4.5, dynamic token length: 2610 [2024-06-10 06:57:12,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.80 | bwd_microstep: 1002.73 | bwd_inner_microstep: 1002.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 06:57:14,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1418.96 | bwd_inner_microstep: 1418.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-10 06:57:16,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.20 | bwd_microstep: 1626.86 | bwd_inner_microstep: 1626.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3451 [2024-06-10 06:57:18,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1620.54 | bwd_inner_microstep: 1620.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547 [2024-06-10 06:57:21,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.33 | bwd_microstep: 1593.35 | bwd_inner_microstep: 1593.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 06:57:23,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1385.01 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 06:57:24,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.79 | bwd_microstep: 1254.97 | bwd_inner_microstep: 1254.95 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436 [2024-06-10 06:57:26,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.39 | bwd_microstep: 1252.53 | bwd_inner_microstep: 1252.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1972 [2024-06-10 06:57:27,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.87 | bwd_microstep: 861.40 | bwd_inner_microstep: 861.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 06:57:29,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1251.35 | bwd_inner_microstep: 1251.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-10 06:57:31,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.09 | bwd_microstep: 1183.61 | bwd_inner_microstep: 1183.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3673 [2024-06-10 06:57:32,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1230.82 | bwd_inner_microstep: 1230.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 06:57:34,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1505.47 | bwd_inner_microstep: 1505.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 06:57:37,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.12 | bwd_microstep: 1558.30 | bwd_inner_microstep: 
1558.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3550 [2024-06-10 06:57:38,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.13 | bwd_microstep: 1328.11 | bwd_inner_microstep: 1328.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581 [2024-06-10 06:57:40,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1365.67 | bwd_inner_microstep: 1365.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 06:57:42,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.09 | bwd_microstep: 1256.37 | bwd_inner_microstep: 1256.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466 [2024-06-10 06:57:44,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1312.84 | bwd_inner_microstep: 1312.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2981 [2024-06-10 06:57:45,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.59 | bwd_microstep: 1141.81 | bwd_inner_microstep: 1141.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3762 [2024-06-10 06:57:48,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.88 | bwd_microstep: 1601.05 | bwd_inner_microstep: 1601.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441 [2024-06-10 06:57:52,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | 
optimizer_gradients: 4.25 | optimizer_step: 6.57 [2024-06-10 06:57:52,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.20 | bwd_microstep: 3950.91 | bwd_inner_microstep: 1568.55 | bwd_allreduce_microstep: 2382.31 | step_microstep: 38.80 [2024-06-10 06:57:52,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15924.10 | bwd: 44966.33 | bwd_inner: 42583.00 | bwd_allreduce: 2382.61 | step: 40.47 {'loss': 1.3242, 'learning_rate': 3.6689101565202416e-05, 'epoch': 0.21} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039 [2024-06-10 06:57:53,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.41 | bwd_microstep: 900.26 | bwd_inner_microstep: 900.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-10 06:57:55,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.81 | bwd_microstep: 1414.32 | bwd_inner_microstep: 1414.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 06:57:57,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1354.13 | bwd_inner_microstep: 1354.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481 [2024-06-10 06:57:59,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.15 | bwd_microstep: 1348.65 | bwd_inner_microstep: 1348.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-10 06:58:01,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.94 | bwd_microstep: 1516.60 | bwd_inner_microstep: 1516.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3488 [2024-06-10 06:58:03,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.67 | bwd_microstep: 1188.67 | bwd_inner_microstep: 1188.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 06:58:05,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1286.77 | bwd_inner_microstep: 1286.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870 [2024-06-10 06:58:06,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.08 | bwd_microstep: 680.42 | bwd_inner_microstep: 680.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 06:58:08,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.95 | bwd_microstep: 1536.72 | bwd_inner_microstep: 1536.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 06:58:09,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.74 | bwd_microstep: 684.78 | bwd_inner_microstep: 684.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 06:58:11,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.95 | bwd_microstep: 1392.84 | bwd_inner_microstep: 1392.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3703 [2024-06-10 06:58:13,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.34 | bwd_microstep: 1553.04 | bwd_inner_microstep: 1553.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692 [2024-06-10 06:58:15,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.38 | bwd_microstep: 1460.51 | bwd_inner_microstep: 1460.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3467 [2024-06-10 06:58:17,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1572.51 | bwd_inner_microstep: 1572.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686 [2024-06-10 06:58:19,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.39 | bwd_microstep: 1726.61 | bwd_inner_microstep: 1726.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 06:58:21,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1378.47 | bwd_inner_microstep: 1378.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-10 06:58:23,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.08 | bwd_microstep: 1497.23 | bwd_inner_microstep: 1497.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2755 [2024-06-10 06:58:25,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.56 | bwd_microstep: 1173.00 | bwd_inner_microstep: 1172.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 06:58:26,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.50 | bwd_microstep: 881.47 | bwd_inner_microstep: 881.44 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 06:58:28,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1284.23 | bwd_inner_microstep: 1284.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-10 06:58:29,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.19 | bwd_microstep: 975.38 | bwd_inner_microstep: 975.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2701 [2024-06-10 06:58:31,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.60 | bwd_microstep: 1129.74 | bwd_inner_microstep: 1129.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3153 [2024-06-10 06:58:32,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1255.92 | bwd_inner_microstep: 1255.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3805 [2024-06-10 06:58:35,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1516.51 | bwd_inner_microstep: 1516.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2075 [2024-06-10 06:58:36,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.45 | bwd_microstep: 1012.00 | bwd_inner_microstep: 1011.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3535 [2024-06-10 06:58:38,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.19 | bwd_microstep: 
1688.68 | bwd_inner_microstep: 1688.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 06:58:40,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.44 | bwd_microstep: 1379.51 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 06:58:42,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.31 | bwd_microstep: 1284.84 | bwd_inner_microstep: 1284.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-10 06:58:43,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.30 | bwd_microstep: 976.19 | bwd_inner_microstep: 976.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 06:58:45,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1283.15 | bwd_inner_microstep: 1283.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076 [2024-06-10 06:58:46,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.00 | bwd_microstep: 822.43 | bwd_inner_microstep: 822.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2242 [2024-06-10 06:58:53,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 06:58:53,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.85 | bwd_microstep: 6635.06 | bwd_inner_microstep: 876.33 | bwd_allreduce_microstep: 5758.68 | step_microstep: 38.61 
[2024-06-10 06:58:53,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14976.92 | bwd: 45790.66 | bwd_inner: 40030.96 | bwd_allreduce: 5758.96 | step: 40.30 {'loss': 1.2691, 'learning_rate': 3.6668388123217154e-05, 'epoch': 0.21} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484 [2024-06-10 06:58:55,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1333.27 | bwd_inner_microstep: 1333.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3949 [2024-06-10 06:58:57,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.15 | bwd_microstep: 1699.32 | bwd_inner_microstep: 1699.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 06:58:59,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.49 | bwd_microstep: 1277.48 | bwd_inner_microstep: 1277.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482 [2024-06-10 06:59:01,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.11 | bwd_microstep: 1411.64 | bwd_inner_microstep: 1411.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761 [2024-06-10 06:59:03,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.25 | bwd_microstep: 1438.37 | bwd_inner_microstep: 1438.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 06:59:05,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1385.21 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 06:59:07,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1395.30 | bwd_inner_microstep: 1395.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898 [2024-06-10 06:59:08,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.96 | bwd_microstep: 684.71 | bwd_inner_microstep: 684.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2197 [2024-06-10 06:59:09,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.46 | bwd_microstep: 829.26 | bwd_inner_microstep: 829.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948 [2024-06-10 06:59:10,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.67 | bwd_microstep: 827.35 | bwd_inner_microstep: 827.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907 [2024-06-10 06:59:11,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.46 | bwd_microstep: 687.20 | bwd_inner_microstep: 687.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2934 [2024-06-10 06:59:13,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.81 | bwd_microstep: 1165.21 | bwd_inner_microstep: 1165.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1949 [2024-06-10 06:59:14,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.63 | bwd_microstep: 851.27 | bwd_inner_microstep: 851.24 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3637 [2024-06-10 06:59:16,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1461.98 | bwd_inner_microstep: 1461.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434 [2024-06-10 06:59:18,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1376.86 | bwd_inner_microstep: 1376.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3641 [2024-06-10 06:59:20,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.22 | bwd_microstep: 1707.97 | bwd_inner_microstep: 1707.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3513 [2024-06-10 06:59:23,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.97 | bwd_microstep: 1681.84 | bwd_inner_microstep: 1681.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3661 [2024-06-10 06:59:25,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.80 | bwd_microstep: 1589.84 | bwd_inner_microstep: 1589.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 06:59:27,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.18 | bwd_microstep: 1297.93 | bwd_inner_microstep: 1297.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 06:59:29,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1492.91 | bwd_inner_microstep: 
1492.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 06:59:31,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1557.11 | bwd_inner_microstep: 1557.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3586 [2024-06-10 06:59:33,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1309.18 | bwd_inner_microstep: 1309.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3689 [2024-06-10 06:59:35,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1433.16 | bwd_inner_microstep: 1433.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 06:59:36,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1375.14 | bwd_inner_microstep: 1375.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 06:59:38,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.51 | bwd_microstep: 1253.54 | bwd_inner_microstep: 1253.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-10 06:59:40,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.41 | bwd_microstep: 1301.78 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426 [2024-06-10 06:59:42,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.15 | 
bwd_microstep: 1313.88 | bwd_inner_microstep: 1313.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820 [2024-06-10 06:59:44,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.08 | bwd_microstep: 1759.96 | bwd_inner_microstep: 1759.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2054 [2024-06-10 06:59:46,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.27 | bwd_microstep: 946.37 | bwd_inner_microstep: 946.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-10 06:59:48,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.62 | bwd_microstep: 1544.96 | bwd_inner_microstep: 1544.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-10 06:59:50,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.07 | bwd_microstep: 1497.35 | bwd_inner_microstep: 1497.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 06:59:53,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 06:59:53,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 2879.88 | bwd_inner_microstep: 1567.99 | bwd_allreduce_microstep: 1311.83 | step_microstep: 38.55 [2024-06-10 06:59:53,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15841.34 | bwd: 43767.22 | bwd_inner: 42454.38 | bwd_allreduce: 1312.11 | step: 40.15 {'loss': 1.2989, 'learning_rate': 3.664761597525557e-05, 'epoch': 0.21} dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3462 [2024-06-10 06:59:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.58 | bwd_microstep: 1467.46 | bwd_inner_microstep: 1467.39 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 06:59:57,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.02 | bwd_microstep: 1384.57 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 06:59:59,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.80 | bwd_microstep: 1480.65 | bwd_inner_microstep: 1480.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3585 [2024-06-10 07:00:01,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.03 | bwd_microstep: 1238.62 | bwd_inner_microstep: 1238.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 07:00:03,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1380.21 | bwd_inner_microstep: 1380.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 07:00:04,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.62 | bwd_microstep: 798.92 | bwd_inner_microstep: 798.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 07:00:05,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 791.63 | bwd_inner_microstep: 791.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3761 [2024-06-10 07:00:07,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.17 | bwd_microstep: 1341.79 | bwd_inner_microstep: 1341.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 07:00:09,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1343.85 | bwd_inner_microstep: 1343.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 07:00:11,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1524.40 | bwd_inner_microstep: 1524.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 07:00:13,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.06 | bwd_microstep: 1402.40 | bwd_inner_microstep: 1402.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 07:00:15,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.50 | bwd_microstep: 1495.03 | bwd_inner_microstep: 1495.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 07:00:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1279.38 | bwd_inner_microstep: 1279.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492 [2024-06-10 07:00:19,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.24 | bwd_microstep: 1584.30 | bwd_inner_microstep: 1584.27 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681 [2024-06-10 07:00:21,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.30 | bwd_microstep: 1427.98 | bwd_inner_microstep: 1427.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-10 07:00:22,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.05 | bwd_microstep: 804.59 | bwd_inner_microstep: 804.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3682 [2024-06-10 07:00:24,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.52 | bwd_microstep: 1553.07 | bwd_inner_microstep: 1553.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 07:00:26,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1396.41 | bwd_inner_microstep: 1396.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3640 [2024-06-10 07:00:28,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.83 | bwd_microstep: 1709.94 | bwd_inner_microstep: 1709.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 07:00:30,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1485.04 | bwd_inner_microstep: 1485.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 07:00:32,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 
1398.05 | bwd_inner_microstep: 1398.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 07:00:34,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.57 | bwd_microstep: 1278.15 | bwd_inner_microstep: 1278.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 07:00:36,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.77 | bwd_microstep: 1401.45 | bwd_inner_microstep: 1401.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-10 07:00:38,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.76 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628 [2024-06-10 07:00:40,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1314.45 | bwd_inner_microstep: 1314.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2447 [2024-06-10 07:00:41,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.93 | bwd_microstep: 948.54 | bwd_inner_microstep: 948.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2248 [2024-06-10 07:00:43,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.77 | bwd_microstep: 969.58 | bwd_inner_microstep: 969.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601 [2024-06-10 07:00:45,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 558.14 | bwd_microstep: 1508.33 | bwd_inner_microstep: 1508.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2282 [2024-06-10 07:00:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.88 | bwd_microstep: 911.35 | bwd_inner_microstep: 911.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941 [2024-06-10 07:00:47,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.38 | bwd_microstep: 729.64 | bwd_inner_microstep: 729.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2064 [2024-06-10 07:00:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.65 | bwd_microstep: 846.38 | bwd_inner_microstep: 846.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3746 [2024-06-10 07:00:55,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 07:00:55,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.13 | bwd_microstep: 6186.55 | bwd_inner_microstep: 1600.69 | bwd_allreduce_microstep: 4585.81 | step_microstep: 38.56 [2024-06-10 07:00:55,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15459.64 | bwd: 45883.72 | bwd_inner: 41296.94 | bwd_allreduce: 4586.07 | step: 40.23 {'loss': 1.3355, 'learning_rate': 3.662678519447706e-05, 'epoch': 0.21} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 07:00:57,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1374.39 | bwd_inner_microstep: 1374.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 4039 [2024-06-10 07:00:59,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.08 | bwd_microstep: 1713.58 | bwd_inner_microstep: 1713.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880 [2024-06-10 07:01:01,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.50 | bwd_microstep: 1683.98 | bwd_inner_microstep: 1683.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3863 [2024-06-10 07:01:03,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1368.18 | bwd_inner_microstep: 1368.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 07:01:06,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1555.61 | bwd_inner_microstep: 1555.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 07:01:07,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.58 | bwd_microstep: 793.72 | bwd_inner_microstep: 793.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-10 07:01:09,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.11 | bwd_microstep: 1651.39 | bwd_inner_microstep: 1651.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3234 [2024-06-10 07:01:11,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.23 | bwd_microstep: 1180.29 | bwd_inner_microstep: 1180.26 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3511 [2024-06-10 07:01:12,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.01 | bwd_microstep: 1225.85 | bwd_inner_microstep: 1225.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-10 07:01:13,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.64 | bwd_microstep: 800.78 | bwd_inner_microstep: 800.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 07:01:15,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.00 | bwd_microstep: 1289.51 | bwd_inner_microstep: 1289.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669 [2024-06-10 07:01:17,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.96 | bwd_microstep: 1523.35 | bwd_inner_microstep: 1523.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3415 [2024-06-10 07:01:19,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.20 | bwd_microstep: 1370.05 | bwd_inner_microstep: 1370.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3454 [2024-06-10 07:01:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.64 | bwd_microstep: 1336.57 | bwd_inner_microstep: 1336.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3983 [2024-06-10 07:01:23,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.26 | bwd_microstep: 1375.35 | 
bwd_inner_microstep: 1375.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 07:01:25,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.67 | bwd_microstep: 1480.56 | bwd_inner_microstep: 1480.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3645 [2024-06-10 07:01:27,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.34 | bwd_microstep: 1644.06 | bwd_inner_microstep: 1644.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701 [2024-06-10 07:01:29,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1528.93 | bwd_inner_microstep: 1528.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2020 [2024-06-10 07:01:30,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.58 | bwd_microstep: 719.40 | bwd_inner_microstep: 719.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3599 [2024-06-10 07:01:32,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.60 | bwd_microstep: 1474.49 | bwd_inner_microstep: 1474.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 07:01:34,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1357.35 | bwd_inner_microstep: 1357.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999 [2024-06-10 07:01:35,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 304.30 | bwd_microstep: 805.76 | bwd_inner_microstep: 805.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3686 [2024-06-10 07:01:37,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1391.59 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3711 [2024-06-10 07:01:39,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1239.41 | bwd_inner_microstep: 1239.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031 [2024-06-10 07:01:40,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.52 | bwd_microstep: 809.59 | bwd_inner_microstep: 809.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2192 [2024-06-10 07:01:41,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.25 | bwd_microstep: 798.28 | bwd_inner_microstep: 798.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 07:01:43,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.06 | bwd_microstep: 1304.59 | bwd_inner_microstep: 1304.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3532 [2024-06-10 07:01:45,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.16 | bwd_microstep: 1559.96 | bwd_inner_microstep: 1559.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2929 [2024-06-10 07:01:47,294] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.66 | bwd_microstep: 1192.32 | bwd_inner_microstep: 1192.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 07:01:49,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.88 | bwd_microstep: 1474.77 | bwd_inner_microstep: 1474.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 07:01:51,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.23 | bwd_microstep: 1557.16 | bwd_inner_microstep: 1557.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3437 [2024-06-10 07:01:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 07:01:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.12 | bwd_microstep: 3306.65 | bwd_inner_microstep: 1617.45 | bwd_allreduce_microstep: 1689.15 | step_microstep: 38.69 [2024-06-10 07:01:55,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15729.05 | bwd: 43887.47 | bwd_inner: 42197.38 | bwd_allreduce: 1689.38 | step: 40.37 {'loss': 1.3645, 'learning_rate': 3.6605895854247534e-05, 'epoch': 0.21} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1846 [2024-06-10 07:01:56,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.21 | bwd_microstep: 667.78 | bwd_inner_microstep: 667.64 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3947 [2024-06-10 07:01:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.08 | bwd_microstep: 1591.74 | bwd_inner_microstep: 1591.72 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847 [2024-06-10 07:02:00,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.05 | bwd_microstep: 1661.69 | bwd_inner_microstep: 1661.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2302 [2024-06-10 07:02:02,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.83 | bwd_microstep: 881.19 | bwd_inner_microstep: 881.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773 [2024-06-10 07:02:04,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.19 | bwd_microstep: 1445.02 | bwd_inner_microstep: 1444.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733 [2024-06-10 07:02:06,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.16 | bwd_microstep: 1637.82 | bwd_inner_microstep: 1637.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 07:02:08,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1287.98 | bwd_inner_microstep: 1287.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3404 [2024-06-10 07:02:09,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.11 | bwd_microstep: 1212.40 | bwd_inner_microstep: 1212.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 07:02:11,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 
1250.97 | bwd_inner_microstep: 1250.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870 [2024-06-10 07:02:12,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.66 | bwd_microstep: 683.47 | bwd_inner_microstep: 683.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405 [2024-06-10 07:02:14,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.37 | bwd_microstep: 1370.64 | bwd_inner_microstep: 1370.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-10 07:02:16,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1280.56 | bwd_inner_microstep: 1280.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669 [2024-06-10 07:02:18,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.47 | bwd_microstep: 1615.13 | bwd_inner_microstep: 1615.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3510 [2024-06-10 07:02:20,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.42 | bwd_microstep: 1553.76 | bwd_inner_microstep: 1553.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639 [2024-06-10 07:02:22,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.14 | bwd_microstep: 1575.71 | bwd_inner_microstep: 1575.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3648 [2024-06-10 07:02:24,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 585.15 | bwd_microstep: 1592.15 | bwd_inner_microstep: 1592.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 07:02:26,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.36 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593 [2024-06-10 07:02:28,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.58 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630 [2024-06-10 07:02:30,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.26 | bwd_microstep: 1614.16 | bwd_inner_microstep: 1614.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-10 07:02:32,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.41 | bwd_microstep: 974.78 | bwd_inner_microstep: 974.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 07:02:34,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2311 [2024-06-10 07:02:35,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.30 | bwd_microstep: 984.12 | bwd_inner_microstep: 984.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 07:02:37,176] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.24 | bwd_microstep: 1254.67 | bwd_inner_microstep: 1254.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465 [2024-06-10 07:02:39,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1331.51 | bwd_inner_microstep: 1331.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653 [2024-06-10 07:02:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1427.47 | bwd_inner_microstep: 1427.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3602 [2024-06-10 07:02:42,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.74 | bwd_microstep: 1215.75 | bwd_inner_microstep: 1215.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3561 [2024-06-10 07:02:44,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.47 | bwd_microstep: 1523.64 | bwd_inner_microstep: 1523.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 07:02:46,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1554.28 | bwd_inner_microstep: 1554.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2268 [2024-06-10 07:02:48,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.82 | bwd_microstep: 937.16 | bwd_inner_microstep: 937.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3453 [2024-06-10 07:02:50,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1356.47 | bwd_inner_microstep: 1356.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-10 07:02:52,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1402.80 | bwd_inner_microstep: 1402.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-10 07:02:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 07:02:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 4313.17 | bwd_inner_microstep: 1697.97 | bwd_allreduce_microstep: 2615.15 | step_microstep: 38.72 [2024-06-10 07:02:56,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15914.85 | bwd: 45332.66 | bwd_inner: 42716.50 | bwd_allreduce: 2615.43 | step: 40.43 {'loss': 1.3196, 'learning_rate': 3.6584948028139126e-05, 'epoch': 0.21} 21%|██ | 363/1726 [6:20:29<23:11:26, 61.25s/it] 21%|██ | 364/1726 [6:21:30<23:09:33, 61.21s/it] 21%|██ | 365/1726 [6:22:30<22:59:59, 60.84s/it] 21%|██ | 366/1726 [6:23:32<23:04:49, 61.10s/it] 21%|██▏ | 367/1726 [6:24:32<22:56:10, 60.76s/it] 21%|██▏ | 368/1726 [6:25:33<23:00:51, 61.01s/it] dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478 [2024-06-10 07:02:58,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1305.10 | bwd_inner_microstep: 1305.04 |
bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3897 [2024-06-10 07:03:00,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.51 | bwd_microstep: 1481.15 | bwd_inner_microstep: 1481.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3930 [2024-06-10 07:03:03,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.25 | bwd_microstep: 1593.19 | bwd_inner_microstep: 1593.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 07:03:05,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.39 | bwd_microstep: 1491.29 | bwd_inner_microstep: 1491.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3770 [2024-06-10 07:03:07,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.69 | bwd_microstep: 1471.49 | bwd_inner_microstep: 1471.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 07:03:08,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.61 | bwd_microstep: 793.36 | bwd_inner_microstep: 793.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 07:03:09,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.99 | bwd_microstep: 794.25 | bwd_inner_microstep: 794.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 07:03:11,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1252.20 
| bwd_inner_microstep: 1252.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 07:03:12,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1288.32 | bwd_inner_microstep: 1288.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3504 [2024-06-10 07:03:14,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.60 | bwd_microstep: 1224.07 | bwd_inner_microstep: 1224.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 07:03:16,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1249.45 | bwd_inner_microstep: 1249.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 07:03:18,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509 [2024-06-10 07:03:20,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.58 | bwd_microstep: 1318.93 | bwd_inner_microstep: 1318.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 07:03:22,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.00 | bwd_microstep: 1468.90 | bwd_inner_microstep: 1468.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 07:03:23,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 521.59 | bwd_microstep: 1383.88 | bwd_inner_microstep: 1383.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 07:03:25,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1391.00 | bwd_inner_microstep: 1390.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2184 [2024-06-10 07:03:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.35 | bwd_microstep: 867.53 | bwd_inner_microstep: 867.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-10 07:03:29,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1515.92 | bwd_inner_microstep: 1515.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 07:03:30,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.79 | bwd_microstep: 1284.83 | bwd_inner_microstep: 1284.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 07:03:32,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.53 | bwd_microstep: 1281.11 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 07:03:34,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1399.27 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3451 [2024-06-10 07:03:36,497] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.44 | bwd_microstep: 1320.08 | bwd_inner_microstep: 1320.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549 [2024-06-10 07:03:38,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.88 | bwd_microstep: 1331.29 | bwd_inner_microstep: 1331.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067 [2024-06-10 07:03:39,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.21 | bwd_microstep: 819.19 | bwd_inner_microstep: 819.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 07:03:41,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.91 | bwd_microstep: 1499.36 | bwd_inner_microstep: 1499.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-10 07:03:43,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.70 | bwd_microstep: 1413.02 | bwd_inner_microstep: 1412.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052 [2024-06-10 07:03:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.55 | bwd_microstep: 814.94 | bwd_inner_microstep: 814.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 07:03:46,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.25 | bwd_microstep: 1449.87 | bwd_inner_microstep: 1449.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471 
[2024-06-10 07:03:48,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1403.85 | bwd_inner_microstep: 1403.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3603 [2024-06-10 07:03:51,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.90 | bwd_microstep: 1775.78 | bwd_inner_microstep: 1775.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2040 [2024-06-10 07:03:52,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.86 | bwd_microstep: 903.38 | bwd_inner_microstep: 903.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 07:03:58,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 07:03:58,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 5516.94 | bwd_inner_microstep: 1753.21 | bwd_allreduce_microstep: 3763.67 | step_microstep: 38.68 [2024-06-10 07:03:58,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15609.04 | bwd: 45485.97 | bwd_inner: 41721.33 | bwd_allreduce: 3763.93 | step: 40.31 {'loss': 1.3082, 'learning_rate': 3.6563941789929994e-05, 'epoch': 0.21} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 07:04:00,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1372.68 | bwd_inner_microstep: 1372.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 07:04:02,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1382.98 | 
bwd_inner_microstep: 1382.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-10 07:04:04,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.96 | bwd_microstep: 1543.55 | bwd_inner_microstep: 1543.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4121 [2024-06-10 07:04:06,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1437.76 | bwd_inner_microstep: 1437.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413 [2024-06-10 07:04:08,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1279.75 | bwd_inner_microstep: 1279.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 07:04:09,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.42 | bwd_microstep: 1250.46 | bwd_inner_microstep: 1250.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 07:04:11,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.42 | bwd_microstep: 1242.76 | bwd_inner_microstep: 1242.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 07:04:13,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.87 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882 [2024-06-10 07:04:14,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 271.44 | bwd_microstep: 710.96 | bwd_inner_microstep: 710.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3730 [2024-06-10 07:04:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.04 | bwd_microstep: 1732.89 | bwd_inner_microstep: 1732.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 705 [2024-06-10 07:04:17,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 117.08 | bwd_microstep: 289.81 | bwd_inner_microstep: 289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455 [2024-06-10 07:04:19,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.49 | bwd_microstep: 1317.16 | bwd_inner_microstep: 1317.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2004 [2024-06-10 07:04:20,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.88 | bwd_microstep: 831.17 | bwd_inner_microstep: 831.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 07:04:22,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1342.53 | bwd_inner_microstep: 1342.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714 [2024-06-10 07:04:24,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.50 | bwd_microstep: 1627.84 | bwd_inner_microstep: 1627.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3551 [2024-06-10 07:04:26,233] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1363.37 | bwd_inner_microstep: 1363.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3704 [2024-06-10 07:04:28,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.93 | bwd_microstep: 1724.66 | bwd_inner_microstep: 1724.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3518 [2024-06-10 07:04:30,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.93 | bwd_microstep: 1513.41 | bwd_inner_microstep: 1513.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587 [2024-06-10 07:04:32,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1506.45 | bwd_inner_microstep: 1506.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2101 [2024-06-10 07:04:34,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.96 | bwd_microstep: 921.11 | bwd_inner_microstep: 921.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3525 [2024-06-10 07:04:36,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.78 | bwd_microstep: 1454.55 | bwd_inner_microstep: 1454.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 07:04:37,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1383.10 | bwd_inner_microstep: 1383.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3828 [2024-06-10 07:04:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1463.96 | bwd_inner_microstep: 1463.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 3033 [2024-06-10 07:04:41,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.10 | bwd_microstep: 1091.44 | bwd_inner_microstep: 1091.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2059 [2024-06-10 07:04:42,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.51 | bwd_microstep: 944.24 | bwd_inner_microstep: 944.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 07:04:44,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.89 | bwd_microstep: 1254.18 | bwd_inner_microstep: 1254.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-10 07:04:46,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.96 | bwd_microstep: 1498.35 | bwd_inner_microstep: 1498.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451 [2024-06-10 07:04:48,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.32 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 07:04:50,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.97 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3614 [2024-06-10 07:04:52,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.33 | bwd_microstep: 1506.35 | bwd_inner_microstep: 1506.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-10 07:04:54,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.92 | bwd_microstep: 1415.19 | bwd_inner_microstep: 1415.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2182 [2024-06-10 07:05:00,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-10 07:05:00,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.52 | bwd_microstep: 5697.32 | bwd_inner_microstep: 865.13 | bwd_allreduce_microstep: 4832.13 | step_microstep: 38.94 [2024-06-10 07:05:00,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15460.27 | bwd: 46157.75 | bwd_inner: 41324.71 | bwd_allreduce: 4832.37 | step: 40.58 {'loss': 1.305, 'learning_rate': 3.654287721360398e-05, 'epoch': 0.21} dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4489 [2024-06-10 07:05:03,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 709.69 | bwd_microstep: 1915.79 | bwd_inner_microstep: 1915.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 07:05:04,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1373.17 | bwd_inner_microstep: 1373.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 07:05:06,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
502.71 | bwd_microstep: 1343.52 | bwd_inner_microstep: 1343.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867 [2024-06-10 07:05:07,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.35 | bwd_microstep: 708.00 | bwd_inner_microstep: 707.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 07:05:09,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1346.61 | bwd_inner_microstep: 1346.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 07:05:11,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.27 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 07:05:13,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 1550.69 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 07:05:15,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.99 | bwd_microstep: 1400.67 | bwd_inner_microstep: 1400.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 07:05:16,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.45 | bwd_microstep: 793.69 | bwd_inner_microstep: 793.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 07:05:18,266] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 471.76 | bwd_microstep: 1246.82 | bwd_inner_microstep: 1246.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1900 [2024-06-10 07:05:19,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.56 | bwd_microstep: 717.61 | bwd_inner_microstep: 717.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947 [2024-06-10 07:05:20,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.25 | bwd_microstep: 823.78 | bwd_inner_microstep: 823.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3422 [2024-06-10 07:05:22,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1396.14 | bwd_inner_microstep: 1396.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3500 [2024-06-10 07:05:24,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.51 | bwd_microstep: 1680.83 | bwd_inner_microstep: 1680.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3645 [2024-06-10 07:05:26,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.20 | bwd_microstep: 1538.42 | bwd_inner_microstep: 1538.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513 [2024-06-10 07:05:28,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.49 | bwd_microstep: 1200.64 | bwd_inner_microstep: 1200.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2142 [2024-06-10 
07:05:29,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.63 | bwd_microstep: 866.64 | bwd_inner_microstep: 866.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972 [2024-06-10 07:05:30,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.13 | bwd_microstep: 705.61 | bwd_inner_microstep: 705.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2095 [2024-06-10 07:05:31,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.57 | bwd_microstep: 918.47 | bwd_inner_microstep: 918.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638 [2024-06-10 07:05:33,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1419.19 | bwd_inner_microstep: 1419.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 07:05:35,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.67 | bwd_microstep: 1287.06 | bwd_inner_microstep: 1287.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2293 [2024-06-10 07:05:36,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.49 | bwd_microstep: 819.20 | bwd_inner_microstep: 819.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 07:05:38,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1488.34 | bwd_inner_microstep: 1488.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic 
token length: 2063 [2024-06-10 07:05:40,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.63 | bwd_microstep: 915.26 | bwd_inner_microstep: 915.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3627 [2024-06-10 07:05:41,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.33 | bwd_microstep: 1373.79 | bwd_inner_microstep: 1373.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3875 [2024-06-10 07:05:44,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.69 | bwd_microstep: 1678.11 | bwd_inner_microstep: 1678.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2241 [2024-06-10 07:05:45,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.29 | bwd_microstep: 1062.56 | bwd_inner_microstep: 1062.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-10 07:05:47,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.03 | bwd_microstep: 970.26 | bwd_inner_microstep: 970.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 07:05:49,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1449.97 | bwd_inner_microstep: 1449.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 07:05:51,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1498.83 | bwd_inner_microstep: 1498.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
26, images per sample: 6.5, dynamic token length: 2892 [2024-06-10 07:05:52,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.25 | bwd_microstep: 1187.29 | bwd_inner_microstep: 1187.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 07:06:00,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.32 | optimizer_step: 6.61 [2024-06-10 07:06:00,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 7404.00 | bwd_inner_microstep: 1696.82 | bwd_allreduce_microstep: 5707.13 | step_microstep: 38.95 [2024-06-10 07:06:00,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14758.57 | bwd: 45328.02 | bwd_inner: 39619.98 | bwd_allreduce: 5707.37 | step: 40.66 {'loss': 1.2988, 'learning_rate': 3.652175437335041e-05, 'epoch': 0.21} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3471 [2024-06-10 07:06:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.86 | bwd_microstep: 1569.45 | bwd_inner_microstep: 1569.38 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 07:06:04,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 1393.38 | bwd_inner_microstep: 1393.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 07:06:06,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1374.23 | bwd_inner_microstep: 1374.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 07:06:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.74 | bwd_microstep: 1383.67 | bwd_inner_microstep: 1383.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 07:06:10,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1281.61 | bwd_inner_microstep: 1281.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3770 [2024-06-10 07:06:12,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1494.25 | bwd_inner_microstep: 1494.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3404 [2024-06-10 07:06:14,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.28 | bwd_microstep: 1213.18 | bwd_inner_microstep: 1213.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4041 [2024-06-10 07:06:16,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.06 | bwd_microstep: 1718.85 | bwd_inner_microstep: 1718.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3499 [2024-06-10 07:06:18,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.83 | bwd_microstep: 1319.04 | bwd_inner_microstep: 1319.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970 [2024-06-10 07:06:19,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.16 | bwd_microstep: 796.75 | bwd_inner_microstep: 796.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2199 [2024-06-10 07:06:20,654] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.20 | bwd_microstep: 797.59 | bwd_inner_microstep: 797.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 07:06:22,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1390.44 | bwd_inner_microstep: 1390.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 07:06:24,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3457 [2024-06-10 07:06:26,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.06 | bwd_microstep: 1330.29 | bwd_inner_microstep: 1330.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 07:06:28,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.69 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1856 [2024-06-10 07:06:29,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.45 | bwd_microstep: 675.50 | bwd_inner_microstep: 675.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 07:06:31,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1486.10 | bwd_inner_microstep: 1486.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 
3473 [2024-06-10 07:06:33,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1426.35 | bwd_inner_microstep: 1426.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-10 07:06:35,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.14 | bwd_microstep: 1622.48 | bwd_inner_microstep: 1622.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 07:06:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.40 | bwd_microstep: 1657.94 | bwd_inner_microstep: 1657.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-10 07:06:39,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1506.69 | bwd_inner_microstep: 1506.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 07:06:41,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.40 | bwd_microstep: 1258.83 | bwd_inner_microstep: 1258.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-10 07:06:42,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.10 | bwd_microstep: 813.99 | bwd_inner_microstep: 813.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280 [2024-06-10 07:06:43,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.87 | bwd_microstep: 883.43 | bwd_inner_microstep: 883.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3816 [2024-06-10 07:06:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.37 | bwd_microstep: 1459.99 | bwd_inner_microstep: 1459.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 07:06:47,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.72 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1283.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772 [2024-06-10 07:06:49,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.31 | bwd_microstep: 1346.55 | bwd_inner_microstep: 1346.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 07:06:51,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.63 | bwd_microstep: 1656.73 | bwd_inner_microstep: 1656.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3600 [2024-06-10 07:06:53,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.79 | bwd_microstep: 1466.24 | bwd_inner_microstep: 1466.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3434 [2024-06-10 07:06:55,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.63 | bwd_microstep: 1464.84 | bwd_inner_microstep: 1464.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 07:06:58,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.95 | bwd_microstep: 1650.94 | bwd_inner_microstep: 1650.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2414 [2024-06-10 07:06:59,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.18 | optimizer_step: 6.62 [2024-06-10 07:06:59,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.01 | bwd_microstep: 1168.89 | bwd_inner_microstep: 1161.23 | bwd_allreduce_microstep: 7.62 | step_microstep: 38.37 [2024-06-10 07:06:59,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15958.37 | bwd: 42664.78 | bwd_inner: 42654.89 | bwd_allreduce: 9.16 | step: 40.11 {'loss': 1.2854, 'learning_rate': 3.6500573343563835e-05, 'epoch': 0.22} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 07:07:01,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.16 | bwd_microstep: 1450.53 | bwd_inner_microstep: 1450.39 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 07:07:03,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1344.93 | bwd_inner_microstep: 1344.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3875 [2024-06-10 07:07:05,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.74 | bwd_microstep: 1683.93 | bwd_inner_microstep: 1683.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-10 07:07:08,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.51 | bwd_microstep: 1642.00 | bwd_inner_microstep: 1641.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3474 [2024-06-10 07:07:10,076] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1331.44 | bwd_inner_microstep: 1331.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 07:07:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 07:07:14,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.20 | bwd_microstep: 1378.02 | bwd_inner_microstep: 1377.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 07:07:16,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.17 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 07:07:17,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1390.36 | bwd_inner_microstep: 1390.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 07:07:19,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 794.47 | bwd_inner_microstep: 794.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-10 07:07:20,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.50 | bwd_microstep: 1153.14 | bwd_inner_microstep: 1153.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1974 [2024-06-10 
07:07:21,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.92 | bwd_microstep: 832.54 | bwd_inner_microstep: 832.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469 [2024-06-10 07:07:23,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.32 | bwd_microstep: 1217.07 | bwd_inner_microstep: 1217.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3895 [2024-06-10 07:07:25,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.17 | bwd_microstep: 1646.90 | bwd_inner_microstep: 1646.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2152 [2024-06-10 07:07:27,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.02 | bwd_microstep: 947.99 | bwd_inner_microstep: 947.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3669 [2024-06-10 07:07:29,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1479.64 | bwd_inner_microstep: 1479.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3675 [2024-06-10 07:07:31,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.02 | bwd_microstep: 1479.43 | bwd_inner_microstep: 1479.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938 [2024-06-10 07:07:32,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.69 | bwd_microstep: 698.70 | bwd_inner_microstep: 698.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3439 [2024-06-10 07:07:34,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1350.85 | bwd_inner_microstep: 1350.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 07:07:35,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 07:07:38,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1557.29 | bwd_inner_microstep: 1557.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 07:07:39,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.66 | bwd_microstep: 1257.53 | bwd_inner_microstep: 1257.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1967 [2024-06-10 07:07:40,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.56 | bwd_microstep: 767.27 | bwd_inner_microstep: 767.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 07:07:43,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.69 | bwd_microstep: 1658.71 | bwd_inner_microstep: 1658.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444 [2024-06-10 07:07:44,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.30 | bwd_microstep: 1158.75 | bwd_inner_microstep: 1158.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
20, images per sample: 5.0, dynamic token length: 3464 [2024-06-10 07:07:46,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.99 | bwd_microstep: 1246.15 | bwd_inner_microstep: 1246.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3777 [2024-06-10 07:07:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.07 | bwd_microstep: 1474.62 | bwd_inner_microstep: 1474.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819 [2024-06-10 07:07:50,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.21 | bwd_microstep: 1717.87 | bwd_inner_microstep: 1717.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388 [2024-06-10 07:07:52,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.64 | bwd_microstep: 1436.34 | bwd_inner_microstep: 1436.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2046 [2024-06-10 07:07:54,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.42 | bwd_microstep: 874.23 | bwd_inner_microstep: 874.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-10 07:07:56,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1493.61 | bwd_inner_microstep: 1493.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044 [2024-06-10 07:08:00,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.25 | optimizer_step: 6.63 [2024-06-10 07:08:00,970] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.08 | bwd_microstep: 4413.65 | bwd_inner_microstep: 1043.02 | bwd_allreduce_microstep: 3370.56 | step_microstep: 38.91 [2024-06-10 07:08:00,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15622.19 | bwd: 45204.76 | bwd_inner: 41833.15 | bwd_allreduce: 3370.87 | step: 40.57 {'loss': 1.333, 'learning_rate': 3.647933419884371e-05, 'epoch': 0.22} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2426 [2024-06-10 07:08:02,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.24 | bwd_microstep: 1027.79 | bwd_inner_microstep: 1027.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 07:08:04,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1246.25 | bwd_inner_microstep: 1246.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 07:08:06,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1401.40 | bwd_inner_microstep: 1401.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3801 [2024-06-10 07:08:07,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.46 | bwd_microstep: 1352.10 | bwd_inner_microstep: 1352.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 07:08:09,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1384.26 | bwd_inner_microstep: 1384.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 07:08:11,772] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1381.91 | bwd_inner_microstep: 1381.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3462 [2024-06-10 07:08:13,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.91 | bwd_microstep: 1239.71 | bwd_inner_microstep: 1239.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692 [2024-06-10 07:08:15,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1617.99 | bwd_inner_microstep: 1617.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 07:08:17,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1375.04 | bwd_inner_microstep: 1375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4023 [2024-06-10 07:08:19,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.84 | bwd_microstep: 1703.63 | bwd_inner_microstep: 1703.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 07:08:22,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.01 | bwd_microstep: 1483.28 | bwd_inner_microstep: 1483.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529 [2024-06-10 07:08:23,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.61 | bwd_microstep: 1326.21 | bwd_inner_microstep: 1326.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token 
length: 2470 [2024-06-10 07:08:25,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.56 | bwd_microstep: 857.91 | bwd_inner_microstep: 857.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2480 [2024-06-10 07:08:26,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.81 | bwd_microstep: 959.01 | bwd_inner_microstep: 958.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 07:08:28,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1400.38 | bwd_inner_microstep: 1400.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2081 [2024-06-10 07:08:29,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.90 | bwd_microstep: 916.40 | bwd_inner_microstep: 916.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 07:08:31,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1451.04 | bwd_inner_microstep: 1451.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 07:08:33,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1284.63 | bwd_inner_microstep: 1284.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564 [2024-06-10 07:08:35,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.05 | bwd_microstep: 1424.60 | bwd_inner_microstep: 1424.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 2083 [2024-06-10 07:08:36,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.81 | bwd_microstep: 822.99 | bwd_inner_microstep: 822.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2543 [2024-06-10 07:08:37,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.56 | bwd_microstep: 969.26 | bwd_inner_microstep: 969.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-10 07:08:39,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.31 | bwd_microstep: 1489.89 | bwd_inner_microstep: 1489.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591 [2024-06-10 07:08:41,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.70 | bwd_microstep: 1307.13 | bwd_inner_microstep: 1307.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 07:08:43,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1519.67 | bwd_inner_microstep: 1519.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 07:08:45,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1561.17 | bwd_inner_microstep: 1561.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3596 [2024-06-10 07:08:47,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.87 | bwd_microstep: 1275.74 | bwd_inner_microstep: 1275.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3548 [2024-06-10 07:08:49,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 07:08:51,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.70 | bwd_microstep: 1555.84 | bwd_inner_microstep: 1555.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617 [2024-06-10 07:08:54,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.88 | bwd_microstep: 1603.71 | bwd_inner_microstep: 1603.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3434 [2024-06-10 07:08:55,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1297.02 | bwd_inner_microstep: 1297.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-10 07:08:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1594.22 | bwd_inner_microstep: 1594.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 07:09:03,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 07:09:03,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.32 | bwd_microstep: 4731.44 | bwd_inner_microstep: 1676.49 | bwd_allreduce_microstep: 3054.89 | step_microstep: 38.71 [2024-06-10 07:09:03,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
16092.16 | bwd: 46118.29 | bwd_inner: 43062.48 | bwd_allreduce: 3055.12 | step: 40.27 21%|██▏ | 368/1726 [6:25:33<23:00:51, 61.01s/it] 21%|██▏ | 369/1726 [6:26:35<23:02:46, 61.14s/it] 21%|██▏ | 370/1726 [6:27:37<23:07:21, 61.39s/it] 21%|██▏ | 371/1726 [6:28:37<22:59:55, 61.10s/it] 22%|██▏ | 372/1726 [6:29:36<22:44:30, 60.47s/it] 22%|██▏ | 373/1726 [6:30:37<22:48:21, 60.68s/it] 22%|██▏ | 374/1726 [6:31:40<23:00:02, 61.24s/it] {'loss': 1.3482, 'learning_rate': 3.6458037013994214e-05, 'epoch': 0.22} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 07:09:05,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1387.71 | bwd_inner_microstep: 1387.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3910 [2024-06-10 07:09:07,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1493.91 | bwd_inner_microstep: 1493.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838 [2024-06-10 07:09:09,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.99 | bwd_microstep: 1455.44 | bwd_inner_microstep: 1455.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 07:09:11,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 
07:09:13,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.76 | bwd_microstep: 1345.72 | bwd_inner_microstep: 1345.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 07:09:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1875 [2024-06-10 07:09:16,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.89 | bwd_microstep: 681.65 | bwd_inner_microstep: 681.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500 [2024-06-10 07:09:17,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.16 | bwd_microstep: 1194.31 | bwd_inner_microstep: 1194.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500 [2024-06-10 07:09:19,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.10 | bwd_microstep: 1320.67 | bwd_inner_microstep: 1320.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015 [2024-06-10 07:09:20,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.23 | bwd_microstep: 806.56 | bwd_inner_microstep: 806.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1946 [2024-06-10 07:09:21,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.33 | bwd_microstep: 891.93 | bwd_inner_microstep: 891.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3487 [2024-06-10 07:09:23,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.43 | bwd_microstep: 1387.81 | bwd_inner_microstep: 1387.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785 [2024-06-10 07:09:26,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.31 | bwd_microstep: 1639.36 | bwd_inner_microstep: 1639.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3441 [2024-06-10 07:09:27,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1301.63 | bwd_inner_microstep: 1301.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691 [2024-06-10 07:09:29,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 1326.69 | bwd_inner_microstep: 1326.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 07:09:31,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1397.07 | bwd_inner_microstep: 1397.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 07:09:33,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1389.63 | bwd_inner_microstep: 1389.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969 [2024-06-10 07:09:34,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.60 | bwd_microstep: 798.88 | bwd_inner_microstep: 798.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 07:09:36,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.09 | bwd_microstep: 1259.04 | bwd_inner_microstep: 1259.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844 [2024-06-10 07:09:38,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.34 | bwd_microstep: 1655.24 | bwd_inner_microstep: 1655.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2167 [2024-06-10 07:09:39,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.84 | bwd_microstep: 953.02 | bwd_inner_microstep: 952.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744 [2024-06-10 07:09:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1472.28 | bwd_inner_microstep: 1472.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 07:09:44,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.13 | bwd_microstep: 1663.81 | bwd_inner_microstep: 1663.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 07:09:46,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1304.83 | bwd_inner_microstep: 1304.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-10 07:09:47,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.41 | bwd_microstep: 1302.98 | bwd_inner_microstep: 1302.96 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-10 07:09:49,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.72 | bwd_microstep: 1202.04 | bwd_inner_microstep: 1202.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3751 [2024-06-10 07:09:51,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1279.97 | bwd_inner_microstep: 1279.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2036 [2024-06-10 07:09:52,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.51 | bwd_microstep: 717.71 | bwd_inner_microstep: 717.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 07:09:54,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.75 | bwd_microstep: 1286.88 | bwd_inner_microstep: 1286.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3474 [2024-06-10 07:09:56,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.03 | bwd_microstep: 1442.57 | bwd_inner_microstep: 1442.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3771 [2024-06-10 07:09:58,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.83 | bwd_microstep: 1741.54 | bwd_inner_microstep: 1741.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 07:10:05,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 
6.60 [2024-06-10 07:10:05,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.17 | bwd_microstep: 6190.73 | bwd_inner_microstep: 1871.43 | bwd_allreduce_microstep: 4319.25 | step_microstep: 38.67 [2024-06-10 07:10:05,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15576.82 | bwd: 45957.34 | bwd_inner: 41637.17 | bwd_allreduce: 4319.48 | step: 40.22 {'loss': 1.2833, 'learning_rate': 3.643668186402392e-05, 'epoch': 0.22} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422 [2024-06-10 07:10:07,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.62 | bwd_microstep: 1438.69 | bwd_inner_microstep: 1438.57 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.14 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3987 [2024-06-10 07:10:09,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.81 | bwd_microstep: 1630.59 | bwd_inner_microstep: 1630.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 07:10:10,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.27 | bwd_microstep: 791.39 | bwd_inner_microstep: 791.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3930 [2024-06-10 07:10:12,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.85 | bwd_microstep: 1593.80 | bwd_inner_microstep: 1593.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 07:10:14,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 
[2024-06-10 07:10:16,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1381.68 | bwd_inner_microstep: 1381.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 07:10:18,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.88 | bwd_microstep: 1289.91 | bwd_inner_microstep: 1289.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 07:10:20,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.15 | bwd_microstep: 1385.47 | bwd_inner_microstep: 1385.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-10 07:10:22,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3716 [2024-06-10 07:10:23,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1400.33 | bwd_inner_microstep: 1400.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-10 07:10:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.95 | bwd_microstep: 798.29 | bwd_inner_microstep: 798.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435 [2024-06-10 07:10:26,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.54 | bwd_microstep: 1282.87 | bwd_inner_microstep: 1282.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3512 [2024-06-10 07:10:28,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.59 | bwd_microstep: 1486.54 | bwd_inner_microstep: 1486.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3044 [2024-06-10 07:10:30,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.24 | bwd_microstep: 1232.52 | bwd_inner_microstep: 1232.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2988 [2024-06-10 07:10:32,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.89 | bwd_microstep: 1203.50 | bwd_inner_microstep: 1203.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003 [2024-06-10 07:10:33,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.43 | bwd_microstep: 740.73 | bwd_inner_microstep: 740.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-10 07:10:35,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.50 | bwd_microstep: 1525.45 | bwd_inner_microstep: 1525.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640 [2024-06-10 07:10:37,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.52 | bwd_microstep: 1318.77 | bwd_inner_microstep: 1318.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 07:10:39,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1297.71 | bwd_inner_microstep: 1297.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546 [2024-06-10 07:10:40,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.03 | bwd_microstep: 1201.67 | bwd_inner_microstep: 1201.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627 [2024-06-10 07:10:42,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.98 | bwd_microstep: 1615.70 | bwd_inner_microstep: 1615.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-10 07:10:44,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1516.20 | bwd_inner_microstep: 1516.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1967 [2024-06-10 07:10:45,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.82 | bwd_microstep: 705.00 | bwd_inner_microstep: 704.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824 [2024-06-10 07:10:47,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.81 | bwd_microstep: 1361.19 | bwd_inner_microstep: 1361.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911 [2024-06-10 07:10:50,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.28 | bwd_microstep: 1702.01 | bwd_inner_microstep: 1701.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278 [2024-06-10 07:10:51,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.30 | bwd_microstep: 909.34 | bwd_inner_microstep: 909.31 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479 [2024-06-10 07:10:53,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1318.36 | bwd_inner_microstep: 1318.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3602 [2024-06-10 07:10:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1573.09 | bwd_inner_microstep: 1573.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594 [2024-06-10 07:10:57,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.69 | bwd_microstep: 1508.47 | bwd_inner_microstep: 1508.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 07:10:59,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1399.78 | bwd_inner_microstep: 1399.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3848 [2024-06-10 07:11:01,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.64 | bwd_microstep: 1764.99 | bwd_inner_microstep: 1764.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-10 07:11:06,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.20 | optimizer_step: 6.56 [2024-06-10 07:11:06,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.35 | bwd_microstep: 3976.50 | bwd_inner_microstep: 1595.41 | bwd_allreduce_microstep: 2381.04 | step_microstep: 38.56 [2024-06-10 07:11:06,430] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15823.89 | bwd: 44849.16 | bwd_inner: 42467.11 | bwd_allreduce: 2381.33 | step: 40.29 {'loss': 1.3417, 'learning_rate': 3.641526882414553e-05, 'epoch': 0.22} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 07:11:08,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.89 | bwd_microstep: 1332.73 | bwd_inner_microstep: 1332.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3920 [2024-06-10 07:11:10,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.56 | bwd_microstep: 1687.70 | bwd_inner_microstep: 1687.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2397 [2024-06-10 07:11:11,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.82 | bwd_microstep: 906.57 | bwd_inner_microstep: 906.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880 [2024-06-10 07:11:14,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.19 | bwd_microstep: 1686.90 | bwd_inner_microstep: 1686.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 07:11:15,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1299.59 | bwd_inner_microstep: 1299.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 07:11:17,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1253.74 | bwd_inner_microstep: 1253.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3478 [2024-06-10 07:11:19,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 07:11:21,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1389.23 | bwd_inner_microstep: 1389.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915 [2024-06-10 07:11:22,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.83 | bwd_microstep: 687.59 | bwd_inner_microstep: 687.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948 [2024-06-10 07:11:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.21 | bwd_microstep: 731.31 | bwd_inner_microstep: 731.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 07:11:24,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.83 | bwd_microstep: 798.66 | bwd_inner_microstep: 798.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3684 [2024-06-10 07:11:26,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.78 | bwd_microstep: 1423.03 | bwd_inner_microstep: 1423.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 07:11:28,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.43 | bwd_microstep: 1293.86 | bwd_inner_microstep: 1293.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic 
ViT batch size: 26, images per sample: 6.5, dynamic token length: 3421 [2024-06-10 07:11:30,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1308.84 | bwd_inner_microstep: 1308.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3960 [2024-06-10 07:11:32,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.24 | bwd_microstep: 1525.88 | bwd_inner_microstep: 1525.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 07:11:34,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1477.13 | bwd_inner_microstep: 1477.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-10 07:11:36,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1513.15 | bwd_inner_microstep: 1513.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 07:11:38,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.73 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3373 [2024-06-10 07:11:39,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.72 | bwd_microstep: 1210.49 | bwd_inner_microstep: 1210.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 07:11:41,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.14 | bwd_microstep: 1285.67 | bwd_inner_microstep: 1285.64 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3947 [2024-06-10 07:11:43,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.21 | bwd_microstep: 1505.91 | bwd_inner_microstep: 1505.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2282 [2024-06-10 07:11:44,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.17 | bwd_microstep: 850.10 | bwd_inner_microstep: 850.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 07:11:46,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.51 | bwd_microstep: 1299.41 | bwd_inner_microstep: 1299.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-10 07:11:47,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.24 | bwd_microstep: 810.81 | bwd_inner_microstep: 810.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 07:11:49,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.88 | bwd_microstep: 1382.60 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 07:11:51,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1394.43 | bwd_inner_microstep: 1394.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903 [2024-06-10 07:11:52,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.44 | bwd_microstep: 685.32 
| bwd_inner_microstep: 685.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278 [2024-06-10 07:11:53,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.26 | bwd_microstep: 879.27 | bwd_inner_microstep: 879.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599 [2024-06-10 07:11:55,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1403.21 | bwd_inner_microstep: 1403.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585 [2024-06-10 07:11:57,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.68 | bwd_microstep: 1532.36 | bwd_inner_microstep: 1532.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 07:11:59,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3596 [2024-06-10 07:12:06,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.25 | optimizer_step: 6.59 [2024-06-10 07:12:06,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.17 | bwd_microstep: 6082.81 | bwd_inner_microstep: 2048.66 | bwd_allreduce_microstep: 4034.09 | step_microstep: 38.65 [2024-06-10 07:12:06,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15143.05 | bwd: 44569.34 | bwd_inner: 40534.33 | bwd_allreduce: 4034.32 | step: 40.28 {'loss': 1.3252, 'learning_rate': 3.639379796977569e-05, 'epoch': 0.22} dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3557 [2024-06-10 07:12:08,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1491.50 | bwd_inner_microstep: 1491.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3884 [2024-06-10 07:12:10,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1412.98 | bwd_inner_microstep: 1412.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3399 [2024-06-10 07:12:12,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.69 | bwd_microstep: 1389.70 | bwd_inner_microstep: 1389.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784 [2024-06-10 07:12:14,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1443.65 | bwd_inner_microstep: 1443.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229 [2024-06-10 07:12:15,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.66 | bwd_microstep: 959.80 | bwd_inner_microstep: 959.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3784 [2024-06-10 07:12:17,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.14 | bwd_microstep: 1396.26 | bwd_inner_microstep: 1396.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 07:12:19,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 6, images per sample: 1.5, dynamic token length: 874 [2024-06-10 07:12:20,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.14 | bwd_microstep: 366.98 | bwd_inner_microstep: 366.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 07:12:22,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.95 | bwd_microstep: 1388.00 | bwd_inner_microstep: 1387.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 07:12:23,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1255.02 | bwd_inner_microstep: 1254.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 07:12:25,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1299.37 | bwd_inner_microstep: 1299.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 07:12:27,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.98 | bwd_microstep: 1381.84 | bwd_inner_microstep: 1381.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-10 07:12:29,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.22 | bwd_microstep: 1522.75 | bwd_inner_microstep: 1522.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-10 07:12:31,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.85 | bwd_microstep: 1522.98 | bwd_inner_microstep: 1522.96 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3672 [2024-06-10 07:12:34,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.67 | bwd_microstep: 1656.52 | bwd_inner_microstep: 1656.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3528 [2024-06-10 07:12:36,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 07:12:38,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 1490.11 | bwd_inner_microstep: 1490.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1995 [2024-06-10 07:12:39,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.01 | bwd_microstep: 787.55 | bwd_inner_microstep: 787.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-10 07:12:40,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.87 | bwd_microstep: 796.73 | bwd_inner_microstep: 796.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 07:12:42,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1379.18 | bwd_inner_microstep: 1379.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691 [2024-06-10 07:12:44,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1332.00 | bwd_inner_microstep: 
1331.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917 [2024-06-10 07:12:45,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.36 | bwd_microstep: 689.56 | bwd_inner_microstep: 689.40 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 07:12:47,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.95 | bwd_microstep: 1561.77 | bwd_inner_microstep: 1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-10 07:12:49,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1413.12 | bwd_inner_microstep: 1413.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3748 [2024-06-10 07:12:51,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.29 | bwd_microstep: 1441.78 | bwd_inner_microstep: 1441.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 07:12:52,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1257.35 | bwd_inner_microstep: 1257.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1025 [2024-06-10 07:12:53,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 168.61 | bwd_microstep: 433.46 | bwd_inner_microstep: 433.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3604 [2024-06-10 07:12:55,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.06 | 
bwd_microstep: 1535.97 | bwd_inner_microstep: 1535.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 07:12:57,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.09 | bwd_microstep: 1257.02 | bwd_inner_microstep: 1256.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 07:12:59,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.78 | bwd_microstep: 1522.85 | bwd_inner_microstep: 1522.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3806 [2024-06-10 07:13:02,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.27 | bwd_microstep: 1802.31 | bwd_inner_microstep: 1802.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3392 [2024-06-10 07:13:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.56 [2024-06-10 07:13:06,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 4209.69 | bwd_inner_microstep: 1632.62 | bwd_allreduce_microstep: 2577.01 | step_microstep: 38.60 [2024-06-10 07:13:06,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15584.79 | bwd: 44401.26 | bwd_inner: 41823.21 | bwd_allreduce: 2577.31 | step: 40.24 {'loss': 1.2705, 'learning_rate': 3.637226937653461e-05, 'epoch': 0.22} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 07:13:08,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1471.68 | bwd_inner_microstep: 1471.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3461 [2024-06-10 07:13:10,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.77 | bwd_microstep: 1477.52 | bwd_inner_microstep: 1477.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454 [2024-06-10 07:13:12,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1384.10 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 07:13:14,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1550.15 | bwd_inner_microstep: 1550.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 07:13:16,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.06 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885 [2024-06-10 07:13:17,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.90 | bwd_microstep: 682.47 | bwd_inner_microstep: 682.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 07:13:19,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1243.63 | bwd_inner_microstep: 1243.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 07:13:21,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1247.86 | bwd_inner_microstep: 1247.83 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 07:13:23,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.10 | bwd_microstep: 1251.76 | bwd_inner_microstep: 1251.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2635 [2024-06-10 07:13:24,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.40 | bwd_microstep: 1021.89 | bwd_inner_microstep: 1021.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540 [2024-06-10 07:13:26,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1426.58 | bwd_inner_microstep: 1426.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1989 [2024-06-10 07:13:27,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.10 | bwd_microstep: 833.32 | bwd_inner_microstep: 833.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-10 07:13:29,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1428.22 | bwd_inner_microstep: 1428.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 07:13:31,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.58 | bwd_microstep: 1248.38 | bwd_inner_microstep: 1248.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3707 [2024-06-10 07:13:33,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.60 | bwd_microstep: 1725.55 | bwd_inner_microstep: 
1725.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 07:13:35,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1389.12 | bwd_inner_microstep: 1389.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434 [2024-06-10 07:13:37,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.48 | bwd_microstep: 1186.61 | bwd_inner_microstep: 1186.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 07:13:38,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1254.21 | bwd_inner_microstep: 1254.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-10 07:13:40,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.30 | bwd_microstep: 1456.33 | bwd_inner_microstep: 1456.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 07:13:43,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.49 | bwd_microstep: 1656.36 | bwd_inner_microstep: 1656.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465 [2024-06-10 07:13:45,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1442.10 | bwd_inner_microstep: 1442.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3431 [2024-06-10 07:13:47,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | 
bwd_microstep: 1401.62 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-10 07:13:48,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.97 | bwd_microstep: 979.94 | bwd_inner_microstep: 979.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 07:13:49,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.20 | bwd_microstep: 698.01 | bwd_inner_microstep: 697.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 07:13:51,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.76 | bwd_microstep: 1377.68 | bwd_inner_microstep: 1377.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2180 [2024-06-10 07:13:52,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.20 | bwd_microstep: 862.12 | bwd_inner_microstep: 862.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2275 [2024-06-10 07:13:53,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.92 | bwd_microstep: 1003.37 | bwd_inner_microstep: 1003.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719 [2024-06-10 07:13:55,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1337.05 | bwd_inner_microstep: 1337.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-10 07:13:57,901] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 558.63 | bwd_microstep: 1507.35 | bwd_inner_microstep: 1507.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564 [2024-06-10 07:13:59,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1500.20 | bwd_inner_microstep: 1500.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3567 [2024-06-10 07:14:01,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.25 | bwd_microstep: 1331.57 | bwd_inner_microstep: 1331.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1927 [2024-06-10 07:14:07,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 07:14:07,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.98 | bwd_microstep: 5440.02 | bwd_inner_microstep: 938.18 | bwd_allreduce_microstep: 4501.79 | step_microstep: 38.62 [2024-06-10 07:14:07,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15233.79 | bwd: 45202.32 | bwd_inner: 40699.61 | bwd_allreduce: 4502.02 | step: 40.27 {'loss': 1.2972, 'learning_rate': 3.6350683120245906e-05, 'epoch': 0.22} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469 [2024-06-10 07:14:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.84 | bwd_microstep: 1570.06 | bwd_inner_microstep: 1570.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 07:14:11,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.10 | bwd_microstep: 1241.90 | bwd_inner_microstep: 1241.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 07:14:13,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.33 | bwd_microstep: 1282.42 | bwd_inner_microstep: 1282.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834 [2024-06-10 07:14:15,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.29 | bwd_microstep: 1554.99 | bwd_inner_microstep: 1554.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 07:14:17,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1380.91 | bwd_inner_microstep: 1380.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 07:14:19,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1276.19 | bwd_inner_microstep: 1276.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3406 [2024-06-10 07:14:20,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.04 | bwd_microstep: 1213.71 | bwd_inner_microstep: 1213.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 07:14:22,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1248.97 | bwd_inner_microstep: 1248.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-10 07:14:24,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1482.22 | bwd_inner_microstep: 1482.19 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 07:14:26,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1290.67 | bwd_inner_microstep: 1290.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444 [2024-06-10 07:14:28,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.80 | bwd_microstep: 1221.09 | bwd_inner_microstep: 1221.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3676 [2024-06-10 07:14:30,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.71 | bwd_microstep: 1671.19 | bwd_inner_microstep: 1671.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 1061 [2024-06-10 07:14:30,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.17 | bwd_microstep: 389.84 | bwd_inner_microstep: 389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670 [2024-06-10 07:14:32,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1524.10 | bwd_inner_microstep: 1524.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3424 [2024-06-10 07:14:34,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1396.66 | bwd_inner_microstep: 1396.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 07:14:36,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 
1388.37 | bwd_inner_microstep: 1388.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3526 [2024-06-10 07:14:38,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.30 | bwd_microstep: 1441.96 | bwd_inner_microstep: 1441.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3431 [2024-06-10 07:14:40,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.48 | bwd_microstep: 1475.68 | bwd_inner_microstep: 1475.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-10 07:14:41,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 797.22 | bwd_inner_microstep: 797.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 07:14:43,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1479.23 | bwd_inner_microstep: 1479.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483 [2024-06-10 07:14:45,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.25 | bwd_microstep: 1315.73 | bwd_inner_microstep: 1315.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2766 [2024-06-10 07:14:47,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.93 | bwd_microstep: 1145.17 | bwd_inner_microstep: 1145.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3527 [2024-06-10 07:14:49,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 621.37 | bwd_microstep: 1690.36 | bwd_inner_microstep: 1690.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525 [2024-06-10 07:14:51,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1585.15 | bwd_inner_microstep: 1585.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 07:14:54,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.74 | bwd_microstep: 1534.19 | bwd_inner_microstep: 1534.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 07:14:56,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1556.83 | bwd_inner_microstep: 1556.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-10 07:14:58,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1602.05 | bwd_inner_microstep: 1602.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 07:15:00,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1355.10 | bwd_inner_microstep: 1355.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 07:15:02,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1378.04 | bwd_inner_microstep: 1378.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-10 07:15:04,442] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.67 | bwd_microstep: 1655.86 | bwd_inner_microstep: 1655.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3557 [2024-06-10 07:15:06,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.11 | bwd_microstep: 1265.83 | bwd_inner_microstep: 1265.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 07:15:10,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 07:15:10,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 3909.98 | bwd_inner_microstep: 1756.43 | bwd_allreduce_microstep: 2153.49 | step_microstep: 38.64 [2024-06-10 07:15:10,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16465.20 | bwd: 46321.65 | bwd_inner: 44167.26 | bwd_allreduce: 2153.72 | step: 40.32 22%|██▏ | 374/1726 [6:31:40<23:00:02, 61.24s/it] 22%|██▏ | 375/1726 [6:32:42<23:03:17, 61.43s/it] 22%|██▏ | 376/1726 [6:33:43<22:59:28, 61.31s/it] 22%|██▏ | 377/1726 [6:34:43<22:49:56, 60.93s/it] 22%|██▏ | 378/1726 [6:35:43<22:44:53, 60.75s/it] 22%|██▏ | 379/1726 [6:36:44<22:44:06, 60.76s/it] {'loss': 1.2874, 'learning_rate': 3.6329039276936254e-05, 'epoch': 0.22} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-10 07:15:12,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.20 | bwd_microstep: 1398.26 | bwd_inner_microstep: 1398.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2433 [2024-06-10 07:15:13,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.99 | bwd_microstep: 914.98 | bwd_inner_microstep: 914.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 07:15:15,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.48 | bwd_microstep: 1244.83 | bwd_inner_microstep: 1244.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-10 07:15:17,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.04 | bwd_microstep: 1276.34 | bwd_inner_microstep: 1276.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 07:15:19,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.17 | bwd_microstep: 1384.99 | bwd_inner_microstep: 1384.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040 [2024-06-10 07:15:20,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.87 | bwd_microstep: 809.04 | bwd_inner_microstep: 809.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 07:15:22,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.49 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930 [2024-06-10 07:15:23,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.65 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 07:15:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1253.17 | bwd_inner_microstep: 1253.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492 [2024-06-10 07:15:26,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.47 | bwd_microstep: 1223.49 | bwd_inner_microstep: 1223.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 07:15:28,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1255.40 | bwd_inner_microstep: 1255.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 1933 [2024-06-10 07:15:29,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.17 | bwd_microstep: 872.26 | bwd_inner_microstep: 872.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418 [2024-06-10 07:15:31,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.77 | bwd_microstep: 1278.81 | bwd_inner_microstep: 1278.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1967 [2024-06-10 07:15:32,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.50 | bwd_microstep: 847.84 | bwd_inner_microstep: 847.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-10 07:15:33,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.23 | bwd_microstep: 
801.14 | bwd_inner_microstep: 801.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2302 [2024-06-10 07:15:34,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.64 | bwd_microstep: 880.00 | bwd_inner_microstep: 879.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681 [2024-06-10 07:15:36,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1328.80 | bwd_inner_microstep: 1328.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3528 [2024-06-10 07:15:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.97 | bwd_microstep: 1230.25 | bwd_inner_microstep: 1230.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2007 [2024-06-10 07:15:39,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.80 | bwd_microstep: 773.71 | bwd_inner_microstep: 773.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3854 [2024-06-10 07:15:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.21 | bwd_microstep: 1568.83 | bwd_inner_microstep: 1568.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 07:15:43,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1284.04 | bwd_inner_microstep: 1284.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3540 [2024-06-10 07:15:45,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 576.35 | bwd_microstep: 1543.54 | bwd_inner_microstep: 1543.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702 [2024-06-10 07:15:47,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1339.40 | bwd_inner_microstep: 1339.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555 [2024-06-10 07:15:49,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1428.21 | bwd_inner_microstep: 1428.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276 [2024-06-10 07:15:50,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.39 | bwd_microstep: 876.21 | bwd_inner_microstep: 876.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3621 [2024-06-10 07:15:53,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.05 | bwd_microstep: 1709.07 | bwd_inner_microstep: 1709.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 07:15:54,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.64 | bwd_microstep: 1350.40 | bwd_inner_microstep: 1350.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469 [2024-06-10 07:15:57,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.94 | bwd_microstep: 1573.98 | bwd_inner_microstep: 1573.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565 [2024-06-10 07:15:59,310] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1591.42 | bwd_inner_microstep: 1591.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 07:16:01,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.94 | bwd_microstep: 1398.07 | bwd_inner_microstep: 1398.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-10 07:16:03,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.46 | bwd_microstep: 1642.58 | bwd_inner_microstep: 1642.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 07:16:13,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.57 [2024-06-10 07:16:13,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 9677.88 | bwd_inner_microstep: 1543.58 | bwd_allreduce_microstep: 8134.24 | step_microstep: 38.91 [2024-06-10 07:16:13,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14873.65 | bwd: 47798.15 | bwd_inner: 39662.99 | bwd_allreduce: 8134.47 | step: 40.53 {'loss': 1.2614, 'learning_rate': 3.630733792283515e-05, 'epoch': 0.22} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400 [2024-06-10 07:16:15,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.34 | bwd_microstep: 1358.25 | bwd_inner_microstep: 1358.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3920 [2024-06-10 07:16:17,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.31 | bwd_microstep: 1584.42 | bwd_inner_microstep: 1584.39 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4287 [2024-06-10 07:16:20,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.80 | bwd_microstep: 1766.92 | bwd_inner_microstep: 1766.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 07:16:22,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1380.25 | bwd_inner_microstep: 1380.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 07:16:23,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1244.74 | bwd_inner_microstep: 1244.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3710 [2024-06-10 07:16:25,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1329.21 | bwd_inner_microstep: 1329.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 07:16:27,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1379.58 | bwd_inner_microstep: 1379.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-10 07:16:29,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1504.95 | bwd_inner_microstep: 1504.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682 [2024-06-10 07:16:31,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.61 | bwd_microstep: 
1522.94 | bwd_inner_microstep: 1522.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-10 07:16:33,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1277.25 | bwd_inner_microstep: 1277.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 07:16:35,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1293.75 | bwd_inner_microstep: 1293.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392 [2024-06-10 07:16:37,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.44 | bwd_microstep: 1338.87 | bwd_inner_microstep: 1338.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3595 [2024-06-10 07:16:39,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.98 | bwd_microstep: 1467.50 | bwd_inner_microstep: 1467.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512 [2024-06-10 07:16:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.04 | bwd_microstep: 1191.23 | bwd_inner_microstep: 1191.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506 [2024-06-10 07:16:42,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1349.55 | bwd_inner_microstep: 1349.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831 [2024-06-10 07:16:44,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 548.19 | bwd_microstep: 1464.29 | bwd_inner_microstep: 1464.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 07:16:46,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.30 | bwd_microstep: 1556.67 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554 [2024-06-10 07:16:48,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1299.42 | bwd_inner_microstep: 1299.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 07:16:50,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1398.65 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 07:16:52,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.36 | bwd_microstep: 1255.49 | bwd_inner_microstep: 1255.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3734 [2024-06-10 07:16:54,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.17 | bwd_microstep: 1562.19 | bwd_inner_microstep: 1562.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673 [2024-06-10 07:16:56,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.90 | bwd_microstep: 1517.03 | bwd_inner_microstep: 1517.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3648 [2024-06-10 07:16:58,784] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1543.28 | bwd_inner_microstep: 1543.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377 [2024-06-10 07:17:00,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.19 | bwd_microstep: 1239.91 | bwd_inner_microstep: 1239.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3540 [2024-06-10 07:17:02,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1438.70 | bwd_inner_microstep: 1438.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 07:17:04,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1343.20 | bwd_inner_microstep: 1343.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3814 [2024-06-10 07:17:06,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.39 | bwd_microstep: 1818.29 | bwd_inner_microstep: 1818.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3765 [2024-06-10 07:17:09,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.83 | bwd_microstep: 1736.46 | bwd_inner_microstep: 1736.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3755 [2024-06-10 07:17:11,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.57 | bwd_microstep: 1434.33 | bwd_inner_microstep: 1434.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3772 [2024-06-10 07:17:13,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.75 | bwd_microstep: 1542.97 | bwd_inner_microstep: 1542.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 07:17:15,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3387 [2024-06-10 07:17:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.17 | optimizer_step: 6.64 [2024-06-10 07:17:16,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.65 | bwd_microstep: 1283.31 | bwd_inner_microstep: 1275.12 | bwd_allreduce_microstep: 8.14 | step_microstep: 38.36 [2024-06-10 07:17:16,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17078.46 | bwd: 45673.47 | bwd_inner: 45664.43 | bwd_allreduce: 8.36 | step: 39.99 {'loss': 1.2884, 'learning_rate': 3.6285579134374655e-05, 'epoch': 0.22} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1873 [2024-06-10 07:17:17,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.91 | bwd_microstep: 766.43 | bwd_inner_microstep: 766.28 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865 [2024-06-10 07:17:20,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.14 | bwd_microstep: 1563.05 | bwd_inner_microstep: 1563.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4205 [2024-06-10 07:17:22,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.29 | bwd_microstep: 1757.05 | 
bwd_inner_microstep: 1757.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780 [2024-06-10 07:17:24,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.35 | bwd_microstep: 1446.85 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486 [2024-06-10 07:17:26,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.37 | bwd_microstep: 1357.04 | bwd_inner_microstep: 1357.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 07:17:28,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1245.96 | bwd_inner_microstep: 1245.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 07:17:29,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1245.90 | bwd_inner_microstep: 1245.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 07:17:31,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1256.00 | bwd_inner_microstep: 1255.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1961 [2024-06-10 07:17:32,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.93 | bwd_microstep: 888.21 | bwd_inner_microstep: 888.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 07:17:34,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 484.58 | bwd_microstep: 1287.67 | bwd_inner_microstep: 1287.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 07:17:36,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.14 | bwd_microstep: 1282.26 | bwd_inner_microstep: 1282.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441 [2024-06-10 07:17:38,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1284.92 | bwd_inner_microstep: 1284.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-10 07:17:40,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1615.02 | bwd_inner_microstep: 1614.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 07:17:42,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1485.04 | bwd_inner_microstep: 1485.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 07:17:44,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1342.57 | bwd_inner_microstep: 1342.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2317 [2024-06-10 07:17:45,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.82 | bwd_microstep: 983.60 | bwd_inner_microstep: 983.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889 [2024-06-10 07:17:47,843] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.78 | bwd_microstep: 1590.87 | bwd_inner_microstep: 1590.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630 [2024-06-10 07:17:49,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.23 | bwd_microstep: 1314.31 | bwd_inner_microstep: 1314.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674 [2024-06-10 07:17:51,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.68 | bwd_microstep: 1687.70 | bwd_inner_microstep: 1687.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3625 [2024-06-10 07:17:54,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.76 | bwd_microstep: 1557.49 | bwd_inner_microstep: 1557.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816 [2024-06-10 07:17:56,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.77 | bwd_microstep: 1623.35 | bwd_inner_microstep: 1623.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2018 [2024-06-10 07:17:57,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.92 | bwd_microstep: 841.09 | bwd_inner_microstep: 841.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032 [2024-06-10 07:17:58,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.50 | bwd_microstep: 807.35 | bwd_inner_microstep: 807.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 
3816 [2024-06-10 07:18:00,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.75 | bwd_microstep: 1514.33 | bwd_inner_microstep: 1514.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533 [2024-06-10 07:18:02,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.73 | bwd_microstep: 1228.09 | bwd_inner_microstep: 1228.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-10 07:18:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.24 | bwd_microstep: 1523.16 | bwd_inner_microstep: 1523.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 07:18:06,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.57 | bwd_microstep: 1392.76 | bwd_inner_microstep: 1392.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-10 07:18:07,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.99 | bwd_microstep: 699.68 | bwd_inner_microstep: 699.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485 [2024-06-10 07:18:09,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.09 | bwd_microstep: 1322.30 | bwd_inner_microstep: 1322.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 07:18:11,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1345.57 | bwd_inner_microstep: 1345.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3455 [2024-06-10 07:18:13,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1548.26 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 07:18:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.25 | optimizer_step: 6.58 [2024-06-10 07:18:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 7093.34 | bwd_inner_microstep: 1531.96 | bwd_allreduce_microstep: 5561.33 | step_microstep: 38.89 [2024-06-10 07:18:20,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15805.18 | bwd: 47897.24 | bwd_inner: 42334.90 | bwd_allreduce: 5561.61 | step: 40.57 {'loss': 1.3223, 'learning_rate': 3.626376298818911e-05, 'epoch': 0.22} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478 [2024-06-10 07:18:23,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.70 | bwd_microstep: 1571.23 | bwd_inner_microstep: 1571.16 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3974 [2024-06-10 07:18:25,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.01 | bwd_microstep: 1600.99 | bwd_inner_microstep: 1600.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599 [2024-06-10 07:18:27,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1403.57 | bwd_inner_microstep: 1403.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-10 07:18:29,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
543.57 | bwd_microstep: 1449.96 | bwd_inner_microstep: 1449.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 07:18:31,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752 [2024-06-10 07:18:33,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1633.85 | bwd_inner_microstep: 1633.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3734 [2024-06-10 07:18:35,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.02 | bwd_microstep: 1491.57 | bwd_inner_microstep: 1491.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 07:18:37,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1250.43 | bwd_inner_microstep: 1250.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 07:18:38,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.84 | bwd_microstep: 1395.92 | bwd_inner_microstep: 1395.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 07:18:40,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.94 | bwd_microstep: 1275.09 | bwd_inner_microstep: 1275.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 07:18:42,557] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1287.45 | bwd_inner_microstep: 1287.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2663 [2024-06-10 07:18:44,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.34 | bwd_microstep: 1118.21 | bwd_inner_microstep: 1118.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-10 07:18:46,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1403.00 | bwd_inner_microstep: 1402.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701 [2024-06-10 07:18:48,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.28 | bwd_microstep: 1616.04 | bwd_inner_microstep: 1616.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502 [2024-06-10 07:18:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.97 | bwd_microstep: 1429.51 | bwd_inner_microstep: 1429.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426 [2024-06-10 07:18:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1247.08 | bwd_inner_microstep: 1247.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-10 07:18:54,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.65 | bwd_microstep: 1599.35 | bwd_inner_microstep: 1599.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token 
length: 3840 [2024-06-10 07:18:56,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.30 | bwd_microstep: 1585.69 | bwd_inner_microstep: 1585.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 07:18:58,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1276.53 | bwd_inner_microstep: 1276.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 07:18:59,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1289.24 | bwd_inner_microstep: 1289.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496 [2024-06-10 07:19:01,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.91 | bwd_microstep: 1189.60 | bwd_inner_microstep: 1189.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-10 07:19:03,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1509.23 | bwd_inner_microstep: 1509.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-10 07:19:05,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1525.96 | bwd_inner_microstep: 1525.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682 [2024-06-10 07:19:07,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.22 | bwd_microstep: 1623.06 | bwd_inner_microstep: 1623.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, 
images per sample: 9.5, dynamic token length: 3551 [2024-06-10 07:19:10,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.64 | bwd_microstep: 1561.82 | bwd_inner_microstep: 1561.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-10 07:19:12,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.83 | bwd_microstep: 1449.99 | bwd_inner_microstep: 1449.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 07:19:14,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1491.16 | bwd_inner_microstep: 1491.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 07:19:16,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1485.80 | bwd_inner_microstep: 1485.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820 [2024-06-10 07:19:18,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1477.86 | bwd_inner_microstep: 1477.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3563 [2024-06-10 07:19:20,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.00 | bwd_microstep: 1330.05 | bwd_inner_microstep: 1330.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 07:19:22,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.21 | bwd_microstep: 1748.99 | bwd_inner_microstep: 1748.97 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 07:19:24,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.19 | optimizer_step: 6.62 [2024-06-10 07:19:24,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.06 | bwd_microstep: 1382.32 | bwd_inner_microstep: 1374.63 | bwd_allreduce_microstep: 7.65 | step_microstep: 38.50 [2024-06-10 07:19:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17197.76 | bwd: 45983.78 | bwd_inner: 45975.18 | bwd_allreduce: 7.90 | step: 40.16 {'loss': 1.3053, 'learning_rate': 3.624188956111487e-05, 'epoch': 0.22} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2008 [2024-06-10 07:19:25,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.55 | bwd_microstep: 891.56 | bwd_inner_microstep: 891.47 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2866 [2024-06-10 07:19:27,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.40 | bwd_microstep: 1027.04 | bwd_inner_microstep: 1027.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484 [2024-06-10 07:19:29,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.78 | bwd_microstep: 1413.15 | bwd_inner_microstep: 1413.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 07:19:30,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.40 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 07:19:32,554] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.45 | bwd_microstep: 1249.65 | bwd_inner_microstep: 1249.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 07:19:34,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.56 | bwd_microstep: 1255.26 | bwd_inner_microstep: 1255.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 07:19:36,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1245.69 | bwd_inner_microstep: 1245.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 07:19:37,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.06 | bwd_microstep: 1376.82 | bwd_inner_microstep: 1376.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894 [2024-06-10 07:19:38,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.59 | bwd_microstep: 683.63 | bwd_inner_microstep: 683.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677 [2024-06-10 07:19:40,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.77 | bwd_microstep: 1524.83 | bwd_inner_microstep: 1524.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3502 [2024-06-10 07:19:42,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1446.64 | bwd_inner_microstep: 1446.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 
2130 [2024-06-10 07:19:44,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.83 | bwd_microstep: 770.26 | bwd_inner_microstep: 770.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 07:19:45,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.51 | bwd_microstep: 800.29 | bwd_inner_microstep: 800.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3530 [2024-06-10 07:19:47,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.87 | bwd_microstep: 1623.66 | bwd_inner_microstep: 1623.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4002 [2024-06-10 07:19:49,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.40 | bwd_microstep: 1811.45 | bwd_inner_microstep: 1811.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 07:19:51,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.58 | bwd_microstep: 1255.49 | bwd_inner_microstep: 1255.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876 [2024-06-10 07:19:52,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.82 | bwd_microstep: 680.72 | bwd_inner_microstep: 680.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929 [2024-06-10 07:19:53,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.33 | bwd_microstep: 726.03 | bwd_inner_microstep: 726.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3639 [2024-06-10 07:19:55,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.65 | bwd_microstep: 1319.74 | bwd_inner_microstep: 1319.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827 [2024-06-10 07:19:57,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1490.22 | bwd_inner_microstep: 1490.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665 [2024-06-10 07:19:59,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1422.43 | bwd_inner_microstep: 1422.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-10 07:20:00,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.80 | bwd_microstep: 976.52 | bwd_inner_microstep: 976.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 07:20:02,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1381.38 | bwd_inner_microstep: 1381.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 07:20:04,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.73 | bwd_microstep: 1502.73 | bwd_inner_microstep: 1502.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3543 [2024-06-10 07:20:06,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.11 | bwd_microstep: 1451.09 | bwd_inner_microstep: 1451.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2269 [2024-06-10 07:20:08,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.19 | bwd_microstep: 877.94 | bwd_inner_microstep: 877.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-10 07:20:09,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1401.78 | bwd_inner_microstep: 1401.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 07:20:12,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.76 | bwd_microstep: 1651.38 | bwd_inner_microstep: 1651.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-10 07:20:14,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.79 | bwd_microstep: 1646.60 | bwd_inner_microstep: 1646.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2011 [2024-06-10 07:20:15,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.45 | bwd_microstep: 834.71 | bwd_inner_microstep: 834.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3766 [2024-06-10 07:20:18,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.04 | bwd_microstep: 1742.70 | bwd_inner_microstep: 1742.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 07:20:25,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.39 | optimizer_step: 6.59 [2024-06-10 
07:20:25,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 7341.07 | bwd_inner_microstep: 1682.81 | bwd_allreduce_microstep: 5658.19 | step_microstep: 39.59 [2024-06-10 07:20:25,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15100.60 | bwd: 46073.31 | bwd_inner: 40414.11 | bwd_allreduce: 5658.48 | step: 41.20 {'loss': 1.3032, 'learning_rate': 3.621995893019003e-05, 'epoch': 0.22} 22%|██▏ | 380/1726 [6:37:47<22:59:07, 61.48s/it] 22%|██▏ | 381/1726 [6:38:50<23:08:25, 61.94s/it] 22%|██▏ | 382/1726 [6:39:53<23:15:17, 62.29s/it] 22%|██▏ | 383/1726 [6:40:57<23:26:07, 62.82s/it] 22%|██▏ | 384/1726 [6:42:01<23:29:54, 63.04s/it] 22%|██▏ | 385/1726 [6:43:02<23:18:42, 62.58s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 07:20:27,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1365.20 | bwd_inner_microstep: 1365.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 07:20:29,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.30 | bwd_microstep: 1340.21 | bwd_inner_microstep: 1340.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 07:20:31,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.74 | bwd_microstep: 1344.80 | bwd_inner_microstep: 1344.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2252 [2024-06-10 07:20:32,931] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 361.75 | bwd_microstep: 967.18 | bwd_inner_microstep: 967.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 07:20:34,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.92 | bwd_microstep: 791.96 | bwd_inner_microstep: 791.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783 [2024-06-10 07:20:36,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.30 | bwd_microstep: 1443.88 | bwd_inner_microstep: 1443.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 07:20:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.47 | bwd_microstep: 1278.23 | bwd_inner_microstep: 1278.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 07:20:39,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.86 | bwd_microstep: 1294.91 | bwd_inner_microstep: 1294.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 07:20:41,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1388.73 | bwd_inner_microstep: 1388.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 07:20:43,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1476.51 | bwd_inner_microstep: 1476.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 
07:20:45,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1389.69 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-10 07:20:46,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.55 | bwd_microstep: 806.36 | bwd_inner_microstep: 806.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 07:20:48,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1286.34 | bwd_inner_microstep: 1286.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-10 07:20:50,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.65 | bwd_microstep: 1614.99 | bwd_inner_microstep: 1614.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-10 07:20:52,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.89 | bwd_microstep: 1601.00 | bwd_inner_microstep: 1600.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-10 07:20:53,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.40 | bwd_microstep: 787.22 | bwd_inner_microstep: 787.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143 [2024-06-10 07:20:55,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.31 | bwd_microstep: 834.63 | bwd_inner_microstep: 834.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3517 [2024-06-10 07:20:56,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1392.87 | bwd_inner_microstep: 1392.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 07:20:58,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1397.80 | bwd_inner_microstep: 1397.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659 [2024-06-10 07:21:00,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1427.86 | bwd_inner_microstep: 1427.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 07:21:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.71 | bwd_microstep: 1757.21 | bwd_inner_microstep: 1757.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 07:21:05,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.56 | bwd_microstep: 1548.26 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 07:21:07,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1485.61 | bwd_inner_microstep: 1485.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3819 [2024-06-10 07:21:09,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.99 | bwd_microstep: 1596.65 | bwd_inner_microstep: 1596.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 31, images per sample: 7.75, dynamic token length: 3432 [2024-06-10 07:21:11,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.57 | bwd_microstep: 1399.98 | bwd_inner_microstep: 1399.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1900 [2024-06-10 07:21:12,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.27 | bwd_microstep: 779.90 | bwd_inner_microstep: 779.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-10 07:21:14,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1285.10 | bwd_inner_microstep: 1285.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424 [2024-06-10 07:21:16,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.98 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1542.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 07:21:18,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1590.60 | bwd_inner_microstep: 1590.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3584 [2024-06-10 07:21:20,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.52 | bwd_microstep: 1568.64 | bwd_inner_microstep: 1568.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-10 07:21:23,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.34 | bwd_microstep: 1539.14 | bwd_inner_microstep: 1539.11 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 07:21:28,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.36 | optimizer_step: 6.58 [2024-06-10 07:21:28,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.49 | bwd_microstep: 4402.39 | bwd_inner_microstep: 1949.81 | bwd_allreduce_microstep: 2452.51 | step_microstep: 39.24 [2024-06-10 07:21:28,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15989.75 | bwd: 45726.75 | bwd_inner: 43273.28 | bwd_allreduce: 2452.76 | step: 40.81 {'loss': 1.3406, 'learning_rate': 3.6197971172654156e-05, 'epoch': 0.22} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 07:21:29,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1336.75 | bwd_inner_microstep: 1336.68 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967 [2024-06-10 07:21:31,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 793.13 | bwd_inner_microstep: 793.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3869 [2024-06-10 07:21:33,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.67 | bwd_microstep: 1665.39 | bwd_inner_microstep: 1665.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 07:21:34,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.22 | bwd_microstep: 699.35 | bwd_inner_microstep: 699.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760 [2024-06-10 07:21:36,275] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.70 | bwd_microstep: 1445.25 | bwd_inner_microstep: 1445.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 07:21:38,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1482.29 | bwd_inner_microstep: 1482.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 07:21:40,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1251.22 | bwd_inner_microstep: 1251.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050 [2024-06-10 07:21:41,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.35 | bwd_microstep: 815.28 | bwd_inner_microstep: 815.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698 [2024-06-10 07:21:43,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.25 | bwd_microstep: 1527.49 | bwd_inner_microstep: 1527.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1963 [2024-06-10 07:21:44,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.36 | bwd_microstep: 734.88 | bwd_inner_microstep: 734.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2169 [2024-06-10 07:21:45,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.09 | bwd_microstep: 983.45 | bwd_inner_microstep: 983.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 
[2024-06-10 07:21:47,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1341.99 | bwd_inner_microstep: 1341.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3409 [2024-06-10 07:21:49,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1509.24 | bwd_inner_microstep: 1509.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 07:21:51,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.84 | bwd_microstep: 1488.28 | bwd_inner_microstep: 1488.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3610 [2024-06-10 07:21:53,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.87 | bwd_microstep: 1671.95 | bwd_inner_microstep: 1671.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829 [2024-06-10 07:21:55,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 1489.71 | bwd_inner_microstep: 1489.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298 [2024-06-10 07:21:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.31 | bwd_microstep: 978.75 | bwd_inner_microstep: 978.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3689 [2024-06-10 07:21:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1329.76 | bwd_inner_microstep: 1329.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3617 [2024-06-10 07:22:01,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.44 | bwd_microstep: 1312.87 | bwd_inner_microstep: 1312.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3453 [2024-06-10 07:22:02,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.26 | bwd_microstep: 1338.93 | bwd_inner_microstep: 1338.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 07:22:04,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1510.63 | bwd_inner_microstep: 1510.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 07:22:06,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.06 | bwd_microstep: 1350.95 | bwd_inner_microstep: 1350.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2760 [2024-06-10 07:22:08,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.16 | bwd_microstep: 1048.25 | bwd_inner_microstep: 1048.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3629 [2024-06-10 07:22:09,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.48 | bwd_microstep: 1248.57 | bwd_inner_microstep: 1248.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-10 07:22:12,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1493.20 | bwd_inner_microstep: 1493.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591 [2024-06-10 07:22:13,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1338.38 | bwd_inner_microstep: 1338.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2441 [2024-06-10 07:22:15,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.67 | bwd_microstep: 953.30 | bwd_inner_microstep: 953.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558 [2024-06-10 07:22:17,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.49 | bwd_microstep: 1562.83 | bwd_inner_microstep: 1562.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 07:22:19,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1489.91 | bwd_inner_microstep: 1489.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 07:22:21,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1396.14 | bwd_inner_microstep: 1396.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080 [2024-06-10 07:22:22,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 816.66 | bwd_inner_microstep: 816.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 07:22:28,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 
07:22:28,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 5618.05 | bwd_inner_microstep: 1434.03 | bwd_allreduce_microstep: 4183.98 | step_microstep: 38.72 [2024-06-10 07:22:28,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15255.36 | bwd: 45022.85 | bwd_inner: 40837.91 | bwd_allreduce: 4184.24 | step: 40.38 {'loss': 1.2818, 'learning_rate': 3.617592636594801e-05, 'epoch': 0.22} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 07:22:30,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1331.43 | bwd_inner_microstep: 1331.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 07:22:32,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1243.16 | bwd_inner_microstep: 1243.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 07:22:34,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1380.01 | bwd_inner_microstep: 1379.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 07:22:36,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.43 | bwd_microstep: 1482.31 | bwd_inner_microstep: 1482.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 07:22:37,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3745 [2024-06-10 
07:22:39,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.04 | bwd_microstep: 1340.67 | bwd_inner_microstep: 1340.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2237 [2024-06-10 07:22:41,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.82 | bwd_microstep: 961.92 | bwd_inner_microstep: 961.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498 [2024-06-10 07:22:42,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1322.38 | bwd_inner_microstep: 1322.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-10 07:22:45,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.45 | bwd_microstep: 1498.85 | bwd_inner_microstep: 1498.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727 [2024-06-10 07:22:46,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.09 | bwd_microstep: 1431.98 | bwd_inner_microstep: 1431.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704 [2024-06-10 07:22:48,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1429.92 | bwd_inner_microstep: 1429.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3634 [2024-06-10 07:22:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1446.41 | bwd_inner_microstep: 1446.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3414 [2024-06-10 07:22:52,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.84 | bwd_microstep: 1254.59 | bwd_inner_microstep: 1254.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465 [2024-06-10 07:22:54,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1407.49 | bwd_inner_microstep: 1407.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3515 [2024-06-10 07:22:56,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1442.04 | bwd_inner_microstep: 1442.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420 [2024-06-10 07:22:58,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1440.01 | bwd_inner_microstep: 1439.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3502 [2024-06-10 07:23:00,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1346.53 | bwd_inner_microstep: 1346.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3842 [2024-06-10 07:23:03,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 680.38 | bwd_microstep: 1867.78 | bwd_inner_microstep: 1867.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 07:23:04,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1347.84 | bwd_inner_microstep: 1347.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 18, images per sample: 4.5, dynamic token length: 3441 [2024-06-10 07:23:06,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.64 | bwd_microstep: 1194.39 | bwd_inner_microstep: 1194.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 07:23:08,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1501.95 | bwd_inner_microstep: 1501.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-10 07:23:10,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 1314.04 | bwd_inner_microstep: 1314.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-10 07:23:12,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.97 | bwd_microstep: 1163.15 | bwd_inner_microstep: 1163.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 07:23:13,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.38 | bwd_microstep: 1257.03 | bwd_inner_microstep: 1257.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 07:23:15,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1514.38 | bwd_inner_microstep: 1514.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276 [2024-06-10 07:23:17,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.32 | bwd_microstep: 976.48 | bwd_inner_microstep: 976.45 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 07:23:19,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 1287.39 | bwd_inner_microstep: 1287.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801 [2024-06-10 07:23:21,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.85 | bwd_microstep: 1751.52 | bwd_inner_microstep: 1751.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3759 [2024-06-10 07:23:23,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1499.65 | bwd_inner_microstep: 1499.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2683 [2024-06-10 07:23:24,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.87 | bwd_microstep: 961.31 | bwd_inner_microstep: 961.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3587 [2024-06-10 07:23:27,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.37 | bwd_microstep: 1584.96 | bwd_inner_microstep: 1584.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3717 [2024-06-10 07:23:29,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.62 [2024-06-10 07:23:29,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1850.65 | bwd_inner_microstep: 1558.51 | bwd_allreduce_microstep: 292.10 | step_microstep: 38.38 [2024-06-10 07:23:29,476] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 16368.75 | bwd: 44079.65 | bwd_inner: 43786.62 | bwd_allreduce: 292.33 | step: 39.95 {'loss': 1.2944, 'learning_rate': 3.61538245877133e-05, 'epoch': 0.22} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 07:23:31,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.72 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 07:23:33,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 07:23:35,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1376.75 | bwd_inner_microstep: 1376.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859 [2024-06-10 07:23:37,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.27 | bwd_microstep: 1566.52 | bwd_inner_microstep: 1566.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 07:23:39,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.58 | bwd_microstep: 1481.14 | bwd_inner_microstep: 1481.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-10 07:23:41,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.18 | bwd_microstep: 1550.32 | bwd_inner_microstep: 1550.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472 
[2024-06-10 07:23:43,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.31 | bwd_microstep: 1185.22 | bwd_inner_microstep: 1185.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743 [2024-06-10 07:23:45,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.04 | bwd_microstep: 1534.30 | bwd_inner_microstep: 1534.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 07:23:47,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 07:23:49,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.12 | bwd_microstep: 1352.86 | bwd_inner_microstep: 1352.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 07:23:50,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.01 | bwd_microstep: 1400.99 | bwd_inner_microstep: 1400.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 07:23:53,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.48 | bwd_microstep: 1487.92 | bwd_inner_microstep: 1487.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 07:23:54,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.43 | bwd_microstep: 1386.61 | bwd_inner_microstep: 1386.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per 
sample: 2.5, dynamic token length: 1918 [2024-06-10 07:23:55,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.96 | bwd_microstep: 685.80 | bwd_inner_microstep: 685.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 07:23:57,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.38 | bwd_microstep: 1484.96 | bwd_inner_microstep: 1484.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 07:23:59,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1249.46 | bwd_inner_microstep: 1249.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846 [2024-06-10 07:24:01,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1460.30 | bwd_inner_microstep: 1460.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-10 07:24:03,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1311.94 | bwd_inner_microstep: 1311.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 07:24:05,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1390.00 | bwd_inner_microstep: 1389.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 07:24:07,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1293.14 | bwd_inner_microstep: 1293.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 07:24:09,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1282.67 | bwd_inner_microstep: 1282.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480 [2024-06-10 07:24:10,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1442.00 | bwd_inner_microstep: 1441.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3816 [2024-06-10 07:24:13,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.26 | bwd_microstep: 1691.94 | bwd_inner_microstep: 1691.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 07:24:15,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1285.96 | bwd_inner_microstep: 1285.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3681 [2024-06-10 07:24:17,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.08 | bwd_microstep: 1458.30 | bwd_inner_microstep: 1458.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909 [2024-06-10 07:24:18,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.62 | bwd_microstep: 685.46 | bwd_inner_microstep: 685.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3808 [2024-06-10 07:24:20,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.65 | bwd_microstep: 1857.94 | bwd_inner_microstep: 1857.91 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 07:24:22,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1650.43 | bwd_inner_microstep: 1650.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611 [2024-06-10 07:24:25,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.07 | bwd_microstep: 1598.36 | bwd_inner_microstep: 1598.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3588 [2024-06-10 07:24:27,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.23 | bwd_microstep: 1438.77 | bwd_inner_microstep: 1438.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 07:24:28,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1376.50 | bwd_inner_microstep: 1376.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3808 [2024-06-10 07:24:31,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 07:24:31,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.18 | bwd_microstep: 1771.66 | bwd_inner_microstep: 1550.23 | bwd_allreduce_microstep: 221.38 | step_microstep: 38.50 [2024-06-10 07:24:31,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16712.50 | bwd: 44820.40 | bwd_inner: 44598.05 | bwd_allreduce: 221.64 | step: 40.14 {'loss': 1.3066, 'learning_rate': 3.6131665915792374e-05, 'epoch': 0.23} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 
[2024-06-10 07:24:33,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.27 | bwd_microstep: 1480.19 | bwd_inner_microstep: 1480.12 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3898 [2024-06-10 07:24:35,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 07:24:37,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1379.39 | bwd_inner_microstep: 1379.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 07:24:38,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1244.02 | bwd_inner_microstep: 1243.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4134 [2024-06-10 07:24:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.38 | bwd_microstep: 1742.01 | bwd_inner_microstep: 1741.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 07:24:43,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1278.16 | bwd_inner_microstep: 1278.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 07:24:45,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1386.26 | bwd_inner_microstep: 1386.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3402 [2024-06-10 07:24:46,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1247.84 | bwd_inner_microstep: 1247.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 07:24:48,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.90 | bwd_microstep: 1195.89 | bwd_inner_microstep: 1195.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3433 [2024-06-10 07:24:50,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1403.69 | bwd_inner_microstep: 1403.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988 [2024-06-10 07:24:51,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.45 | bwd_microstep: 894.98 | bwd_inner_microstep: 894.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547 [2024-06-10 07:24:53,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.90 | bwd_microstep: 1596.31 | bwd_inner_microstep: 1596.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503 [2024-06-10 07:24:55,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1445.41 | bwd_inner_microstep: 1445.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492 [2024-06-10 07:24:57,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.02 | bwd_microstep: 1575.62 | bwd_inner_microstep: 1575.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508 [2024-06-10 07:24:59,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.97 | bwd_microstep: 1225.38 | bwd_inner_microstep: 1225.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679 [2024-06-10 07:25:01,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1430.79 | bwd_inner_microstep: 1430.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2092 [2024-06-10 07:25:02,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.90 | bwd_microstep: 819.82 | bwd_inner_microstep: 819.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3831 [2024-06-10 07:25:04,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.49 | bwd_microstep: 1359.54 | bwd_inner_microstep: 1359.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 07:25:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1397.52 | bwd_inner_microstep: 1397.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 07:25:08,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.62 | bwd_microstep: 1434.19 | bwd_inner_microstep: 1434.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 07:25:10,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.44 | bwd_microstep: 1456.95 | bwd_inner_microstep: 1456.92 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 07:25:12,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.66 | bwd_microstep: 1460.11 | bwd_inner_microstep: 1460.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 07:25:14,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.02 | bwd_microstep: 1556.58 | bwd_inner_microstep: 1556.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 07:25:16,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1556.71 | bwd_inner_microstep: 1556.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601 [2024-06-10 07:25:19,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1608.66 | bwd_inner_microstep: 1608.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547 [2024-06-10 07:25:20,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1327.72 | bwd_inner_microstep: 1327.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784 [2024-06-10 07:25:23,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.43 | bwd_microstep: 1548.69 | bwd_inner_microstep: 1548.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588 [2024-06-10 07:25:25,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.79 | bwd_microstep: 
1606.64 | bwd_inner_microstep: 1606.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3740 [2024-06-10 07:25:27,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.45 | bwd_microstep: 1731.34 | bwd_inner_microstep: 1731.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-10 07:25:29,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.19 | bwd_microstep: 972.03 | bwd_inner_microstep: 972.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3799 [2024-06-10 07:25:31,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.95 | bwd_microstep: 1823.36 | bwd_inner_microstep: 1823.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-10 07:25:33,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.17 | optimizer_step: 6.64 [2024-06-10 07:25:33,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.74 | bwd_microstep: 1529.72 | bwd_inner_microstep: 1522.06 | bwd_allreduce_microstep: 7.62 | step_microstep: 38.40 [2024-06-10 07:25:33,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16848.75 | bwd: 45102.99 | bwd_inner: 45094.41 | bwd_allreduce: 7.88 | step: 40.00 {'loss': 1.301, 'learning_rate': 3.610945042822794e-05, 'epoch': 0.23} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 07:25:35,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.75 | bwd_microstep: 1340.62 | bwd_inner_microstep: 1340.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3981 [2024-06-10 07:25:37,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.91 | bwd_microstep: 1704.55 | bwd_inner_microstep: 1704.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 07:25:39,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772 [2024-06-10 07:25:41,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.64 | bwd_microstep: 1345.50 | bwd_inner_microstep: 1345.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-10 07:25:43,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.78 | bwd_microstep: 1152.57 | bwd_inner_microstep: 1152.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740 [2024-06-10 07:25:45,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.38 | bwd_microstep: 1433.16 | bwd_inner_microstep: 1433.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2070 [2024-06-10 07:25:46,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.47 | bwd_microstep: 789.47 | bwd_inner_microstep: 789.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 07:25:47,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.03 | bwd_microstep: 797.50 | bwd_inner_microstep: 797.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 07:25:49,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1251.97 | bwd_inner_microstep: 1251.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3500 [2024-06-10 07:25:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1513.71 | bwd_inner_microstep: 1513.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 07:25:52,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1343.49 | bwd_inner_microstep: 1343.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-10 07:25:55,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.35 | bwd_microstep: 1529.09 | bwd_inner_microstep: 1529.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656 [2024-06-10 07:25:57,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1616.75 | bwd_inner_microstep: 1616.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420 [2024-06-10 07:25:59,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 07:26:01,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.23 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989 [2024-06-10 07:26:02,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.01 | bwd_microstep: 898.87 | bwd_inner_microstep: 898.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 07:26:04,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1394.17 | bwd_inner_microstep: 1394.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109 [2024-06-10 07:26:05,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.06 | bwd_microstep: 921.27 | bwd_inner_microstep: 921.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 07:26:07,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.46 | bwd_microstep: 1398.84 | bwd_inner_microstep: 1398.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2024 [2024-06-10 07:26:08,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.16 | bwd_microstep: 744.98 | bwd_inner_microstep: 744.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 07:26:10,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1292.09 | bwd_inner_microstep: 1292.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-10 07:26:12,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 1462.77 | bwd_inner_microstep: 1462.74 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 07:26:14,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1297.83 | bwd_inner_microstep: 1297.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111 [2024-06-10 07:26:15,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.76 | bwd_microstep: 923.98 | bwd_inner_microstep: 923.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3533 [2024-06-10 07:26:17,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.28 | bwd_microstep: 1520.25 | bwd_inner_microstep: 1520.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 07:26:19,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 1513.94 | bwd_inner_microstep: 1513.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3810 [2024-06-10 07:26:21,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.46 | bwd_microstep: 1386.86 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 07:26:23,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.40 | bwd_microstep: 1563.59 | bwd_inner_microstep: 1563.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-10 07:26:26,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.39 | bwd_microstep: 
1600.59 | bwd_inner_microstep: 1600.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 07:26:28,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.25 | bwd_microstep: 1483.06 | bwd_inner_microstep: 1483.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 07:26:30,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1463.93 | bwd_inner_microstep: 1463.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 07:26:34,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.27 | optimizer_step: 6.60 [2024-06-10 07:26:34,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 3969.02 | bwd_inner_microstep: 1442.11 | bwd_allreduce_microstep: 2526.86 | step_microstep: 38.99 [2024-06-10 07:26:34,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15798.72 | bwd: 44870.26 | bwd_inner: 42342.46 | bwd_allreduce: 2527.10 | step: 40.57 █▏ | 385/1726 [6:43:02<23:18:42, 62.58s/it] 22%|██▏ | 386/1726 [6:44:04<23:14:13, 62.43s/it] 22%|██▏ | 387/1726 [6:45:05<23:01:06, 61.89s/it] 22%|██▏ | 388/1726 [6:46:06<22:52:46, 61.56s/it] 23%|██▎ | 389/1726 [6:47:08<22:53:55, 61.66s/it] 23%|██▎ | 390/1726 [6:48:10<22:57:14, 61.85s/it] 23%|██▎ | 391/1726 [6:49:11<22:50:38, 61.60s/it] {'loss': 1.3549, 'learning_rate': 3.608717820326285e-05, 'epoch': 0.23} dynamic ViT batch size: 12, images per sample: 
3.0, dynamic token length: 1971 [2024-06-10 07:26:35,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.58 | bwd_microstep: 730.33 | bwd_inner_microstep: 730.22 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-10 07:26:36,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 805.22 | bwd_inner_microstep: 805.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 07:26:38,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.10 | bwd_microstep: 1384.53 | bwd_inner_microstep: 1384.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437 [2024-06-10 07:26:40,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.26 | bwd_microstep: 1312.37 | bwd_inner_microstep: 1312.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 07:26:42,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.83 | bwd_microstep: 1346.90 | bwd_inner_microstep: 1346.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 07:26:44,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.33 | bwd_microstep: 1481.49 | bwd_inner_microstep: 1481.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 07:26:46,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1296.52 | bwd_inner_microstep: 1296.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 4087 [2024-06-10 07:26:48,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.64 | bwd_microstep: 1527.19 | bwd_inner_microstep: 1527.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635 [2024-06-10 07:26:50,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.63 | bwd_microstep: 1418.66 | bwd_inner_microstep: 1418.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 07:26:52,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1394.40 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 07:26:54,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.01 | bwd_microstep: 1290.60 | bwd_inner_microstep: 1290.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 07:26:55,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1391.67 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 07:26:57,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.51 | bwd_microstep: 1297.99 | bwd_inner_microstep: 1297.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967 [2024-06-10 07:26:58,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.98 | bwd_microstep: 801.74 | bwd_inner_microstep: 801.71 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3427 [2024-06-10 07:27:00,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1297.23 | bwd_inner_microstep: 1297.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 07:27:02,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1386.66 | bwd_inner_microstep: 1386.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-10 07:27:04,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.54 | bwd_microstep: 1450.79 | bwd_inner_microstep: 1450.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2670 [2024-06-10 07:27:06,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.52 | bwd_microstep: 1119.72 | bwd_inner_microstep: 1119.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3633 [2024-06-10 07:27:08,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.98 | bwd_microstep: 1538.00 | bwd_inner_microstep: 1537.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 07:27:10,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.54 | bwd_microstep: 1349.19 | bwd_inner_microstep: 1349.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3853 [2024-06-10 07:27:12,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.96 | bwd_microstep: 1627.39 | bwd_inner_microstep: 
1627.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 07:27:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.31 | bwd_microstep: 1409.37 | bwd_inner_microstep: 1409.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 07:27:16,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.89 | bwd_microstep: 1480.01 | bwd_inner_microstep: 1479.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3549 [2024-06-10 07:27:18,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.62 | bwd_microstep: 1691.64 | bwd_inner_microstep: 1691.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3487 [2024-06-10 07:27:20,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.19 | bwd_microstep: 1331.91 | bwd_inner_microstep: 1331.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602 [2024-06-10 07:27:22,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1412.13 | bwd_inner_microstep: 1412.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-10 07:27:23,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.90 | bwd_microstep: 809.48 | bwd_inner_microstep: 809.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3453 [2024-06-10 07:27:25,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.89 | 
bwd_microstep: 1229.06 | bwd_inner_microstep: 1229.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732 [2024-06-10 07:27:27,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1628.45 | bwd_inner_microstep: 1628.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1996 [2024-06-10 07:27:28,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.21 | bwd_microstep: 772.30 | bwd_inner_microstep: 772.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-10 07:27:30,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1492.32 | bwd_inner_microstep: 1492.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789 [2024-06-10 07:27:35,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.24 | optimizer_step: 6.58 [2024-06-10 07:27:35,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.89 | bwd_microstep: 4431.26 | bwd_inner_microstep: 1744.48 | bwd_allreduce_microstep: 2686.72 | step_microstep: 38.66 [2024-06-10 07:27:35,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15766.10 | bwd: 44936.57 | bwd_inner: 42248.84 | bwd_allreduce: 2687.00 | step: 40.22 {'loss': 1.2497, 'learning_rate': 3.6064849319339764e-05, 'epoch': 0.23} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 07:27:37,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1379.29 | bwd_inner_microstep: 1379.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3483 [2024-06-10 07:27:39,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1280.37 | bwd_inner_microstep: 1280.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4308 [2024-06-10 07:27:41,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.37 | bwd_microstep: 1511.56 | bwd_inner_microstep: 1511.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 07:27:43,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1241.35 | bwd_inner_microstep: 1241.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770 [2024-06-10 07:27:45,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.27 | bwd_microstep: 1444.46 | bwd_inner_microstep: 1444.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 07:27:47,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 07:27:48,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.44 | bwd_microstep: 1186.84 | bwd_inner_microstep: 1186.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882 [2024-06-10 07:27:49,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.96 | bwd_microstep: 681.30 | bwd_inner_microstep: 681.27 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 07:27:51,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.85 | bwd_microstep: 1534.11 | bwd_inner_microstep: 1534.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444 [2024-06-10 07:27:53,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.03 | bwd_microstep: 1219.20 | bwd_inner_microstep: 1219.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515 [2024-06-10 07:27:55,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.88 | bwd_microstep: 1320.11 | bwd_inner_microstep: 1320.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 07:27:57,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1375.41 | bwd_inner_microstep: 1375.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 07:27:59,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1351.26 | bwd_inner_microstep: 1351.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2070 [2024-06-10 07:28:00,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.45 | bwd_microstep: 754.70 | bwd_inner_microstep: 754.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3479 [2024-06-10 07:28:02,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.25 | bwd_microstep: 1581.73 | bwd_inner_microstep: 
1581.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3654 [2024-06-10 07:28:04,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.66 | bwd_microstep: 1542.44 | bwd_inner_microstep: 1542.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3552 [2024-06-10 07:28:06,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.53 | bwd_microstep: 1457.06 | bwd_inner_microstep: 1457.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3842 [2024-06-10 07:28:08,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.17 | bwd_microstep: 1598.16 | bwd_inner_microstep: 1598.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 07:28:10,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1258.57 | bwd_inner_microstep: 1258.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013 [2024-06-10 07:28:11,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.30 | bwd_microstep: 808.42 | bwd_inner_microstep: 808.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 07:28:13,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1394.67 | bwd_inner_microstep: 1394.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 07:28:15,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.17 | 
bwd_microstep: 1298.56 | bwd_inner_microstep: 1298.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-10 07:28:16,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.18 | bwd_microstep: 797.37 | bwd_inner_microstep: 797.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452 [2024-06-10 07:28:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.68 | bwd_microstep: 1193.59 | bwd_inner_microstep: 1193.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721 [2024-06-10 07:28:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1338.69 | bwd_inner_microstep: 1338.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568 [2024-06-10 07:28:22,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.21 | bwd_microstep: 1595.67 | bwd_inner_microstep: 1595.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 07:28:23,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.77 | bwd_microstep: 1258.14 | bwd_inner_microstep: 1258.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3591 [2024-06-10 07:28:25,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.94 | bwd_microstep: 1527.74 | bwd_inner_microstep: 1527.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2207 [2024-06-10 07:28:27,187] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 326.41 | bwd_microstep: 862.04 | bwd_inner_microstep: 862.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 07:28:29,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.39 | bwd_microstep: 1378.19 | bwd_inner_microstep: 1378.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-10 07:28:31,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3453 [2024-06-10 07:28:37,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.63 [2024-06-10 07:28:37,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.38 | bwd_microstep: 6244.48 | bwd_inner_microstep: 1342.83 | bwd_allreduce_microstep: 4901.59 | step_microstep: 38.80 [2024-06-10 07:28:37,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15506.07 | bwd: 46202.24 | bwd_inner: 41299.72 | bwd_allreduce: 4901.82 | step: 40.37 {'loss': 1.2801, 'learning_rate': 3.604246385510088e-05, 'epoch': 0.23} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958 [2024-06-10 07:28:39,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.11 | bwd_microstep: 889.33 | bwd_inner_microstep: 889.18 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 07:28:40,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.96 | bwd_microstep: 791.86 | bwd_inner_microstep: 791.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685 [2024-06-10 07:28:42,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.37 | bwd_microstep: 1624.19 | bwd_inner_microstep: 1624.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 07:28:44,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1283.82 | bwd_inner_microstep: 1283.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 07:28:46,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1484.46 | bwd_inner_microstep: 1484.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 07:28:48,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.41 | bwd_microstep: 1342.96 | bwd_inner_microstep: 1342.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 07:28:49,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1389.86 | bwd_inner_microstep: 1389.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 07:28:51,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.92 | bwd_microstep: 1476.15 | bwd_inner_microstep: 1476.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2192 [2024-06-10 07:28:53,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.04 | bwd_microstep: 890.23 | bwd_inner_microstep: 890.20 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739 [2024-06-10 07:28:55,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.38 | bwd_microstep: 1637.19 | bwd_inner_microstep: 1637.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-10 07:28:56,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.50 | bwd_microstep: 797.54 | bwd_inner_microstep: 797.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-10 07:28:57,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.10 | bwd_microstep: 797.90 | bwd_inner_microstep: 797.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2142 [2024-06-10 07:28:58,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.08 | bwd_microstep: 930.45 | bwd_inner_microstep: 930.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 07:29:00,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1385.35 | bwd_inner_microstep: 1385.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3429 [2024-06-10 07:29:02,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.05 | bwd_microstep: 1378.52 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 07:29:04,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.46 | bwd_microstep: 1386.18 
| bwd_inner_microstep: 1386.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3653 [2024-06-10 07:29:06,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.62 | bwd_microstep: 1573.32 | bwd_inner_microstep: 1573.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 07:29:08,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 07:29:10,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1244.24 | bwd_inner_microstep: 1244.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3541 [2024-06-10 07:29:12,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.84 | bwd_microstep: 1428.11 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3681 [2024-06-10 07:29:14,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.38 | bwd_microstep: 1391.13 | bwd_inner_microstep: 1391.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 07:29:16,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1509.98 | bwd_inner_microstep: 1509.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 07:29:18,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 578.42 | bwd_microstep: 1556.81 | bwd_inner_microstep: 1556.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2075 [2024-06-10 07:29:19,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.25 | bwd_microstep: 854.30 | bwd_inner_microstep: 854.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015 [2024-06-10 07:29:21,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.18 | bwd_microstep: 808.08 | bwd_inner_microstep: 808.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2966 [2024-06-10 07:29:22,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.71 | bwd_microstep: 1204.37 | bwd_inner_microstep: 1204.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3529 [2024-06-10 07:29:24,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.41 | bwd_microstep: 1227.68 | bwd_inner_microstep: 1227.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3769 [2024-06-10 07:29:26,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.30 | bwd_microstep: 1576.19 | bwd_inner_microstep: 1576.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3579 [2024-06-10 07:29:28,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.53 | bwd_microstep: 1564.98 | bwd_inner_microstep: 1564.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3812 [2024-06-10 07:29:31,106] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.99 | bwd_microstep: 1708.06 | bwd_inner_microstep: 1708.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641 [2024-06-10 07:29:33,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.91 | bwd_microstep: 1439.88 | bwd_inner_microstep: 1439.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773 [2024-06-10 07:29:40,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.25 | optimizer_step: 6.59 [2024-06-10 07:29:40,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.42 | bwd_microstep: 6238.02 | bwd_inner_microstep: 1980.02 | bwd_allreduce_microstep: 4257.94 | step_microstep: 38.72 [2024-06-10 07:29:40,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15597.52 | bwd: 46290.63 | bwd_inner: 42031.68 | bwd_allreduce: 4258.22 | step: 40.31 {'loss': 1.286, 'learning_rate': 3.602002188938769e-05, 'epoch': 0.23} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 07:29:41,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.17 | bwd_microstep: 1367.95 | bwd_inner_microstep: 1367.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444 [2024-06-10 07:29:43,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.30 | bwd_microstep: 1445.21 | bwd_inner_microstep: 1445.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 07:29:45,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.28 | bwd_microstep: 1273.89 | bwd_inner_microstep: 1273.86 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 07:29:47,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.51 | bwd_microstep: 1269.10 | bwd_inner_microstep: 1269.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2431 [2024-06-10 07:29:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.05 | bwd_microstep: 846.61 | bwd_inner_microstep: 846.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883 [2024-06-10 07:29:49,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.05 | bwd_microstep: 708.49 | bwd_inner_microstep: 708.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 07:29:51,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1380.35 | bwd_inner_microstep: 1380.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 07:29:53,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.61 | bwd_microstep: 1275.61 | bwd_inner_microstep: 1275.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 07:29:55,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1383.08 | bwd_inner_microstep: 1383.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 07:29:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1272.66 
| bwd_inner_microstep: 1272.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 07:29:58,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.73 | bwd_microstep: 794.22 | bwd_inner_microstep: 794.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 07:29:59,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.53 | bwd_microstep: 802.78 | bwd_inner_microstep: 802.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411 [2024-06-10 07:30:01,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1308.46 | bwd_inner_microstep: 1308.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910 [2024-06-10 07:30:02,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.67 | bwd_microstep: 777.53 | bwd_inner_microstep: 777.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3690 [2024-06-10 07:30:04,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1553.66 | bwd_inner_microstep: 1553.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 07:30:06,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1489.49 | bwd_inner_microstep: 1489.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 07:30:08,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
515.88 | bwd_microstep: 1381.20 | bwd_inner_microstep: 1381.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3526 [2024-06-10 07:30:10,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.74 | bwd_microstep: 1358.76 | bwd_inner_microstep: 1358.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940 [2024-06-10 07:30:11,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.53 | bwd_microstep: 698.77 | bwd_inner_microstep: 698.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-10 07:30:13,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.42 | bwd_microstep: 1506.48 | bwd_inner_microstep: 1506.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 07:30:14,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.90 | bwd_microstep: 801.11 | bwd_inner_microstep: 801.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677 [2024-06-10 07:30:16,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.63 | bwd_microstep: 1626.27 | bwd_inner_microstep: 1626.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 07:30:18,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1559.06 | bwd_inner_microstep: 1559.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-10 07:30:20,402] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 496.14 | bwd_microstep: 1298.02 | bwd_inner_microstep: 1298.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 07:30:22,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.82 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 07:30:24,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.62 | bwd_microstep: 1755.94 | bwd_inner_microstep: 1755.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284 [2024-06-10 07:30:26,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.94 | bwd_microstep: 1076.94 | bwd_inner_microstep: 1076.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3466 [2024-06-10 07:30:28,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.90 | bwd_microstep: 1505.13 | bwd_inner_microstep: 1505.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 07:30:30,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.20 | bwd_microstep: 1460.46 | bwd_inner_microstep: 1460.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238 [2024-06-10 07:30:31,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.76 | bwd_microstep: 969.80 | bwd_inner_microstep: 969.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947 [2024-06-10 
07:30:32,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.18 | bwd_microstep: 699.54 | bwd_inner_microstep: 699.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3798 [2024-06-10 07:30:41,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.31 | optimizer_step: 6.61 [2024-06-10 07:30:41,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.56 | bwd_microstep: 7667.00 | bwd_inner_microstep: 1947.18 | bwd_allreduce_microstep: 5719.77 | step_microstep: 38.98 [2024-06-10 07:30:41,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14950.48 | bwd: 45807.18 | bwd_inner: 40086.50 | bwd_allreduce: 5720.00 | step: 40.51 {'loss': 1.3113, 'learning_rate': 3.59975235012407e-05, 'epoch': 0.23} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623 [2024-06-10 07:30:42,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.69 | bwd_microstep: 1301.50 | bwd_inner_microstep: 1301.37 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 07:30:44,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.84 | bwd_microstep: 1375.75 | bwd_inner_microstep: 1375.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546 [2024-06-10 07:30:46,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1327.24 | bwd_inner_microstep: 1327.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479 [2024-06-10 07:30:48,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.70 | bwd_microstep: 1183.60 | bwd_inner_microstep: 
1183.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 07:30:50,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1447.66 | bwd_inner_microstep: 1447.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 07:30:51,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.18 | bwd_microstep: 789.76 | bwd_inner_microstep: 789.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 07:30:53,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.46 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3759 [2024-06-10 07:30:55,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.15 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-10 07:30:57,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.91 | bwd_microstep: 1530.00 | bwd_inner_microstep: 1529.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069 [2024-06-10 07:30:58,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.48 | bwd_microstep: 821.50 | bwd_inner_microstep: 821.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-10 07:31:00,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.87 | 
bwd_microstep: 1153.20 | bwd_inner_microstep: 1153.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3500 [2024-06-10 07:31:02,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.11 | bwd_microstep: 1441.50 | bwd_inner_microstep: 1441.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2012 [2024-06-10 07:31:03,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.32 | bwd_microstep: 854.57 | bwd_inner_microstep: 854.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 07:31:05,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1382.12 | bwd_inner_microstep: 1382.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826 [2024-06-10 07:31:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.58 | bwd_microstep: 1577.82 | bwd_inner_microstep: 1577.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3661 [2024-06-10 07:31:09,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.33 | bwd_microstep: 1717.57 | bwd_inner_microstep: 1717.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 07:31:11,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1500.78 | bwd_inner_microstep: 1500.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3669 [2024-06-10 07:31:13,877] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 581.53 | bwd_microstep: 1567.40 | bwd_inner_microstep: 1567.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 07:31:15,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.51 | bwd_microstep: 1453.84 | bwd_inner_microstep: 1453.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 07:31:17,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1502.94 | bwd_inner_microstep: 1502.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 07:31:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.08 | bwd_microstep: 1647.56 | bwd_inner_microstep: 1647.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3826 [2024-06-10 07:31:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.04 | bwd_microstep: 1855.21 | bwd_inner_microstep: 1855.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 07:31:24,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.58 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 07:31:26,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1383.68 | bwd_inner_microstep: 1383.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 
07:31:28,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.34 | bwd_microstep: 1397.96 | bwd_inner_microstep: 1397.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 07:31:30,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.23 | bwd_microstep: 1647.52 | bwd_inner_microstep: 1647.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3476 [2024-06-10 07:31:32,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.89 | bwd_microstep: 1405.89 | bwd_inner_microstep: 1405.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591 [2024-06-10 07:31:34,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1339.72 | bwd_inner_microstep: 1339.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3573 [2024-06-10 07:31:36,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.55 | bwd_microstep: 1237.27 | bwd_inner_microstep: 1237.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 07:31:37,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1254.67 | bwd_inner_microstep: 1254.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3580 [2024-06-10 07:31:39,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.92 | bwd_microstep: 1332.90 | bwd_inner_microstep: 1332.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3533 [2024-06-10 07:31:43,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 07:31:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 2973.76 | bwd_inner_microstep: 1451.43 | bwd_allreduce_microstep: 1522.28 | step_microstep: 38.89 [2024-06-10 07:31:43,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16403.77 | bwd: 45415.05 | bwd_inner: 43891.76 | bwd_allreduce: 1522.58 | step: 40.47 {'loss': 1.3275, 'learning_rate': 3.597496876989909e-05, 'epoch': 0.23} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 07:31:45,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1333.38 | bwd_inner_microstep: 1333.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 07:31:46,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1294.63 | bwd_inner_microstep: 1294.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 07:31:48,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.85 | bwd_microstep: 1449.55 | bwd_inner_microstep: 1449.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475 [2024-06-10 07:31:50,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.45 | bwd_microstep: 1411.08 | bwd_inner_microstep: 1411.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758 [2024-06-10 07:31:53,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.80 | 
bwd_microstep: 1643.35 | bwd_inner_microstep: 1643.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2484 [2024-06-10 07:31:54,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.23 | bwd_microstep: 953.70 | bwd_inner_microstep: 953.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3728 [2024-06-10 07:31:56,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1364.78 | bwd_inner_microstep: 1364.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949 [2024-06-10 07:31:57,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.84 | bwd_microstep: 731.86 | bwd_inner_microstep: 731.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2914 [2024-06-10 07:31:58,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.87 | bwd_microstep: 1080.67 | bwd_inner_microstep: 1080.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409 [2024-06-10 07:32:00,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1305.62 | bwd_inner_microstep: 1305.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 07:32:02,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1344.47 | bwd_inner_microstep: 1344.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 07:32:04,405] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1345.88 | bwd_inner_microstep: 1345.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 07:32:06,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.88 | bwd_microstep: 1499.10 | bwd_inner_microstep: 1499.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 07:32:08,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.46 | bwd_microstep: 1556.65 | bwd_inner_microstep: 1556.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3467 [2024-06-10 07:32:10,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1342.48 | bwd_inner_microstep: 1342.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2104 [2024-06-10 07:32:11,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.68 | bwd_microstep: 920.32 | bwd_inner_microstep: 920.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-10 07:32:13,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1412.08 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927 [2024-06-10 07:32:14,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.32 | bwd_microstep: 700.87 | bwd_inner_microstep: 700.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510 [2024-06-10 07:32:16,384] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.93 | bwd_microstep: 1225.47 | bwd_inner_microstep: 1225.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-10 07:32:18,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.79 | bwd_microstep: 1297.14 | bwd_inner_microstep: 1297.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3621 [2024-06-10 07:32:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.74 | bwd_microstep: 1250.20 | bwd_inner_microstep: 1250.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 07:32:21,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1400.90 | bwd_inner_microstep: 1400.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 07:32:24,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1556.66 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 07:32:25,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1398.49 | bwd_inner_microstep: 1398.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720 [2024-06-10 07:32:28,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1537.26 | bwd_inner_microstep: 1537.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3553 [2024-06-10 07:32:29,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1328.79 | bwd_inner_microstep: 1328.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286 [2024-06-10 07:32:31,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.19 | bwd_microstep: 881.63 | bwd_inner_microstep: 881.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571 [2024-06-10 07:32:33,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.81 | bwd_microstep: 1431.82 | bwd_inner_microstep: 1431.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2066 [2024-06-10 07:32:34,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.11 | bwd_microstep: 818.03 | bwd_inner_microstep: 818.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3827 [2024-06-10 07:32:36,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.72 | bwd_microstep: 1616.74 | bwd_inner_microstep: 1616.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3618 [2024-06-10 07:32:38,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.24 | bwd_microstep: 1707.26 | bwd_inner_microstep: 1707.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 07:32:43,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 07:32:43,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
608.06 | bwd_microstep: 3579.38 | bwd_inner_microstep: 1870.90 | bwd_allreduce_microstep: 1708.44 | step_microstep: 38.86 [2024-06-10 07:32:43,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15704.74 | bwd: 43720.29 | bwd_inner: 42010.92 | bwd_allreduce: 1708.67 | step: 40.43 23%|██▎ | 391/1726 [6:49:11<22:50:38, 61.60s/it] 23%|██▎ | 392/1726 [6:50:12<22:45:55, 61.44s/it] 23%|██▎ | 393/1726 [6:51:14<22:48:58, 61.62s/it] 23%|██▎ | 394/1726 [6:52:16<22:52:03, 61.80s/it] 23%|██▎ | 395/1726 [6:53:17<22:46:20, 61.59s/it] 23%|██▎ | 396/1726 [6:54:20<22:49:08, 61.77s/it] {'loss': 1.3117, 'learning_rate': 3.5952357774800526e-05, 'epoch': 0.23} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-10 07:32:44,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.73 | bwd_microstep: 788.26 | bwd_inner_microstep: 788.16 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3910 [2024-06-10 07:32:46,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.84 | bwd_microstep: 1692.22 | bwd_inner_microstep: 1692.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3885 [2024-06-10 07:32:48,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.03 | bwd_microstep: 1584.08 | bwd_inner_microstep: 1584.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 07:32:50,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.83 | bwd_microstep: 1480.20 | 
bwd_inner_microstep: 1480.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789 [2024-06-10 07:32:52,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 1544.98 | bwd_inner_microstep: 1544.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822 [2024-06-10 07:32:54,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.03 | bwd_microstep: 1384.67 | bwd_inner_microstep: 1384.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 07:32:56,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1550.48 | bwd_inner_microstep: 1550.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 07:32:58,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1249.00 | bwd_inner_microstep: 1248.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 07:33:00,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.99 | bwd_microstep: 1392.03 | bwd_inner_microstep: 1392.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2060 [2024-06-10 07:33:01,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.54 | bwd_microstep: 755.28 | bwd_inner_microstep: 755.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673 [2024-06-10 07:33:03,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 594.13 | bwd_microstep: 1617.53 | bwd_inner_microstep: 1617.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1860 [2024-06-10 07:33:04,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.07 | bwd_microstep: 708.01 | bwd_inner_microstep: 707.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408 [2024-06-10 07:33:06,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1435.98 | bwd_inner_microstep: 1435.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418 [2024-06-10 07:33:08,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 1437.96 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3422 [2024-06-10 07:33:10,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.64 | bwd_microstep: 1474.35 | bwd_inner_microstep: 1474.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 07:33:12,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1290.80 | bwd_inner_microstep: 1290.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 07:33:14,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.06 | bwd_microstep: 1544.58 | bwd_inner_microstep: 1544.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 07:33:16,758] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1474.13 | bwd_inner_microstep: 1474.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1937 [2024-06-10 07:33:17,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.24 | bwd_microstep: 742.85 | bwd_inner_microstep: 742.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523 [2024-06-10 07:33:19,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1494.52 | bwd_inner_microstep: 1494.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122 [2024-06-10 07:33:21,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.01 | bwd_microstep: 922.60 | bwd_inner_microstep: 922.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2286 [2024-06-10 07:33:22,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.93 | bwd_microstep: 814.81 | bwd_inner_microstep: 814.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176 [2024-06-10 07:33:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.35 | bwd_microstep: 855.91 | bwd_inner_microstep: 855.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2239 [2024-06-10 07:33:24,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.30 | bwd_microstep: 903.53 | bwd_inner_microstep: 903.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 
[2024-06-10 07:33:26,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1654.02 | bwd_inner_microstep: 1653.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 07:33:28,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240 [2024-06-10 07:33:30,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.02 | bwd_microstep: 964.87 | bwd_inner_microstep: 964.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2269 [2024-06-10 07:33:31,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.29 | bwd_microstep: 876.38 | bwd_inner_microstep: 876.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 07:33:33,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.97 | bwd_microstep: 1559.02 | bwd_inner_microstep: 1558.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 07:33:35,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1399.70 | bwd_inner_microstep: 1399.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 857 [2024-06-10 07:33:36,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 136.75 | bwd_microstep: 348.33 | bwd_inner_microstep: 348.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 
5.5, dynamic token length: 2263 [2024-06-10 07:33:43,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 07:33:43,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.57 | bwd_microstep: 6758.00 | bwd_inner_microstep: 1105.86 | bwd_allreduce_microstep: 5652.08 | step_microstep: 38.87 [2024-06-10 07:33:43,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14729.97 | bwd: 45111.87 | bwd_inner: 39458.77 | bwd_allreduce: 5652.37 | step: 40.50 {'loss': 1.3336, 'learning_rate': 3.5929690595580804e-05, 'epoch': 0.23} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1949 [2024-06-10 07:33:44,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.39 | bwd_microstep: 881.63 | bwd_inner_microstep: 881.49 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 07:33:46,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1379.34 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 07:33:48,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1374.20 | bwd_inner_microstep: 1374.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601 [2024-06-10 07:33:50,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.09 | bwd_microstep: 1303.59 | bwd_inner_microstep: 1303.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 07:33:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.91 | 
bwd_microstep: 1449.14 | bwd_inner_microstep: 1449.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 07:33:53,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1281.53 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 07:33:55,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1403.13 | bwd_inner_microstep: 1403.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662 [2024-06-10 07:33:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.40 | bwd_microstep: 1425.48 | bwd_inner_microstep: 1425.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 07:33:59,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.22 | bwd_microstep: 1449.18 | bwd_inner_microstep: 1449.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 07:34:01,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1254.52 | bwd_inner_microstep: 1254.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 07:34:03,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1961 [2024-06-10 07:34:04,587] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 330.25 | bwd_microstep: 889.91 | bwd_inner_microstep: 889.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895 [2024-06-10 07:34:05,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.19 | bwd_microstep: 774.57 | bwd_inner_microstep: 774.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949 [2024-06-10 07:34:06,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.69 | bwd_microstep: 704.56 | bwd_inner_microstep: 704.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 07:34:08,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1408.87 | bwd_inner_microstep: 1408.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 07:34:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.72 | bwd_microstep: 1554.80 | bwd_inner_microstep: 1554.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 07:34:12,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.05 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661 [2024-06-10 07:34:14,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.11 | bwd_microstep: 1429.91 | bwd_inner_microstep: 1429.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856 [2024-06-10 07:34:16,770] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.35 | bwd_microstep: 1664.72 | bwd_inner_microstep: 1664.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2188 [2024-06-10 07:34:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.14 | bwd_microstep: 956.48 | bwd_inner_microstep: 956.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436 [2024-06-10 07:34:19,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1256.25 | bwd_inner_microstep: 1256.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 07:34:22,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.67 | bwd_microstep: 1654.78 | bwd_inner_microstep: 1654.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2291 [2024-06-10 07:34:23,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.23 | bwd_microstep: 941.17 | bwd_inner_microstep: 941.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3543 [2024-06-10 07:34:25,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.46 | bwd_microstep: 1691.97 | bwd_inner_microstep: 1691.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3604 [2024-06-10 07:34:27,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.44 | bwd_microstep: 1271.03 | bwd_inner_microstep: 1271.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token 
length: 3511 [2024-06-10 07:34:29,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.72 | bwd_microstep: 1320.76 | bwd_inner_microstep: 1320.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3787 [2024-06-10 07:34:31,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.65 | bwd_microstep: 1751.45 | bwd_inner_microstep: 1751.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 07:34:33,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.05 | bwd_microstep: 1495.28 | bwd_inner_microstep: 1495.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2947 [2024-06-10 07:34:35,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.58 | bwd_microstep: 1290.05 | bwd_inner_microstep: 1290.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611 [2024-06-10 07:34:37,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.89 | bwd_microstep: 1605.38 | bwd_inner_microstep: 1605.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 07:34:40,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.14 | bwd_microstep: 1649.86 | bwd_inner_microstep: 1649.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 07:34:44,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 07:34:44,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 552.04 | bwd_microstep: 4006.96 | bwd_inner_microstep: 1657.59 | bwd_allreduce_microstep: 2349.31 | step_microstep: 38.66 [2024-06-10 07:34:44,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15928.58 | bwd: 45139.34 | bwd_inner: 42789.01 | bwd_allreduce: 2349.60 | step: 40.27 {'loss': 1.2773, 'learning_rate': 3.590696731207361e-05, 'epoch': 0.23} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3455 [2024-06-10 07:34:46,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.07 | bwd_microstep: 1476.24 | bwd_inner_microstep: 1476.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3958 [2024-06-10 07:34:48,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.95 | bwd_microstep: 1528.94 | bwd_inner_microstep: 1528.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2997 [2024-06-10 07:34:50,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.03 | bwd_microstep: 1109.77 | bwd_inner_microstep: 1109.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 07:34:52,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1346.10 | bwd_inner_microstep: 1346.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 07:34:54,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1552.13 | bwd_inner_microstep: 1552.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 07:34:56,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.94 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3748 [2024-06-10 07:34:57,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.44 | bwd_microstep: 1246.67 | bwd_inner_microstep: 1246.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 07:34:59,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1251.19 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759 [2024-06-10 07:35:01,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.63 | bwd_microstep: 1539.32 | bwd_inner_microstep: 1539.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3800 [2024-06-10 07:35:04,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.32 | bwd_microstep: 1617.74 | bwd_inner_microstep: 1617.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 07:35:05,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.42 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675 [2024-06-10 07:35:08,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.73 | bwd_microstep: 1527.28 | bwd_inner_microstep: 1527.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2116 [2024-06-10 07:35:09,366] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.50 | bwd_microstep: 927.97 | bwd_inner_microstep: 927.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 07:35:11,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1445.53 | bwd_inner_microstep: 1445.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3424 [2024-06-10 07:35:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.18 | bwd_microstep: 1313.35 | bwd_inner_microstep: 1313.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045 [2024-06-10 07:35:14,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.56 | bwd_microstep: 908.86 | bwd_inner_microstep: 908.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 07:35:16,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.58 | bwd_microstep: 1387.31 | bwd_inner_microstep: 1387.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905 [2024-06-10 07:35:17,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.79 | bwd_microstep: 717.70 | bwd_inner_microstep: 717.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 07:35:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1491.63 | bwd_inner_microstep: 1491.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 
[2024-06-10 07:35:21,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.03 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 07:35:23,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1378.70 | bwd_inner_microstep: 1378.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291 [2024-06-10 07:35:24,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.03 | bwd_microstep: 914.78 | bwd_inner_microstep: 914.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2003 [2024-06-10 07:35:25,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.74 | bwd_microstep: 711.93 | bwd_inner_microstep: 711.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 07:35:27,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1298.40 | bwd_inner_microstep: 1298.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 07:35:29,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.54 | bwd_microstep: 1657.48 | bwd_inner_microstep: 1657.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1893 [2024-06-10 07:35:30,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.21 | bwd_microstep: 684.88 | bwd_inner_microstep: 684.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 
5.5, dynamic token length: 2072 [2024-06-10 07:35:31,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.04 | bwd_microstep: 913.64 | bwd_inner_microstep: 913.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3547 [2024-06-10 07:35:33,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1361.55 | bwd_inner_microstep: 1361.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 07:35:35,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1489.07 | bwd_inner_microstep: 1489.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590 [2024-06-10 07:35:38,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1605.83 | bwd_inner_microstep: 1605.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622 [2024-06-10 07:35:40,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.92 | bwd_microstep: 1604.13 | bwd_inner_microstep: 1604.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2240 [2024-06-10 07:35:46,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.22 | optimizer_step: 6.62 [2024-06-10 07:35:46,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.84 | bwd_microstep: 6144.02 | bwd_inner_microstep: 1130.04 | bwd_allreduce_microstep: 5013.93 | step_microstep: 38.61 [2024-06-10 07:35:46,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15445.09 | bwd: 46426.16 | bwd_inner: 41411.31 | 
bwd_allreduce: 5014.16 | step: 40.30 {'loss': 1.3035, 'learning_rate': 3.5884188004310244e-05, 'epoch': 0.23} 23%|██▎ | 397/1726 [6:55:19<22:34:50, 61.17s/it] 23%|██▎ | 398/1726 [6:56:19<22:27:17, 60.87s/it] 23%|██▎ | 399/1726 [6:57:21<22:29:53, 61.04s/it] 23%|██▎ | 400/1726 [6:58:23<22:36:42, 61.39s/it] [INFO|trainer.py:2936] 2024-06-10 07:35:50,274 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400 [INFO|configuration_utils.py:473] 2024-06-10 07:35:50,278 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/config.json [INFO|configuration_utils.py:594] 2024-06-10 07:35:50,281 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-10 07:35:58,231 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-10 07:35:58,244 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-10 07:35:58,246 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-10 07:35:58,247 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/added_tokens.json [2024-06-10 07:35:58,470] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step400 is about to be saved! 
[2024-06-10 07:35:58,483] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/mp_rank_00_model_states.pt [2024-06-10 07:35:58,483] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/mp_rank_00_model_states.pt... [2024-06-10 07:36:07,136] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/mp_rank_00_model_states.pt. [2024-06-10 07:36:07,141] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-10 07:36:19,250] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-06-10 07:36:19,262] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-400/global_step400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-10 07:36:19,262] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step400 is ready now! 
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3392 [2024-06-10 07:36:21,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1289.12 | bwd_inner_microstep: 1289.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878 [2024-06-10 07:36:23,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.66 | bwd_microstep: 1573.14 | bwd_inner_microstep: 1573.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 07:36:25,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.62 | bwd_microstep: 1546.70 | bwd_inner_microstep: 1546.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 07:36:27,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.59 | bwd_microstep: 1444.26 | bwd_inner_microstep: 1444.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 07:36:29,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.41 | bwd_microstep: 1239.03 | bwd_inner_microstep: 1239.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744 [2024-06-10 07:36:31,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.87 | bwd_microstep: 1628.08 | bwd_inner_microstep: 1628.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 07:36:33,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1279.98 | bwd_inner_microstep: 1279.95 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 07:36:35,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.09 | bwd_microstep: 1382.70 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-10 07:36:37,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.16 | bwd_microstep: 1486.09 | bwd_inner_microstep: 1486.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695 [2024-06-10 07:36:39,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.16 | bwd_microstep: 1521.11 | bwd_inner_microstep: 1521.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000 [2024-06-10 07:36:40,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.62 | bwd_microstep: 802.47 | bwd_inner_microstep: 802.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-10 07:36:41,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 798.27 | bwd_inner_microstep: 798.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 07:36:43,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1397.18 | bwd_inner_microstep: 1397.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3656 [2024-06-10 07:36:45,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 
1470.59 | bwd_inner_microstep: 1470.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596 [2024-06-10 07:36:47,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.18 | bwd_microstep: 1467.15 | bwd_inner_microstep: 1467.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3504 [2024-06-10 07:36:49,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.91 | bwd_microstep: 1509.99 | bwd_inner_microstep: 1509.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2646 [2024-06-10 07:36:51,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.57 | bwd_microstep: 1112.12 | bwd_inner_microstep: 1112.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1986 [2024-06-10 07:36:52,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.18 | bwd_microstep: 828.41 | bwd_inner_microstep: 828.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-10 07:36:54,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.09 | bwd_microstep: 1614.87 | bwd_inner_microstep: 1614.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-10 07:36:56,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 1432.18 | bwd_inner_microstep: 1432.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 07:36:58,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 516.52 | bwd_microstep: 1381.54 | bwd_inner_microstep: 1381.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3610 [2024-06-10 07:37:00,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.92 | bwd_microstep: 1533.21 | bwd_inner_microstep: 1533.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3624 [2024-06-10 07:37:02,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.12 | bwd_microstep: 1564.13 | bwd_inner_microstep: 1564.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948 [2024-06-10 07:37:03,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.87 | bwd_microstep: 701.29 | bwd_inner_microstep: 701.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067 [2024-06-10 07:37:04,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.66 | bwd_microstep: 817.19 | bwd_inner_microstep: 817.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3523 [2024-06-10 07:37:06,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.55 | bwd_microstep: 1194.68 | bwd_inner_microstep: 1194.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3623 [2024-06-10 07:37:08,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.55 | bwd_microstep: 1542.79 | bwd_inner_microstep: 1542.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294 [2024-06-10 07:37:10,055] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.67 | bwd_microstep: 1072.84 | bwd_inner_microstep: 1072.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 07:37:12,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1647.64 | bwd_inner_microstep: 1647.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 07:37:14,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.13 | bwd_microstep: 1338.34 | bwd_inner_microstep: 1338.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766 [2024-06-10 07:37:16,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.44 | bwd_microstep: 1611.60 | bwd_inner_microstep: 1611.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-10 07:37:20,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.45 | optimizer_step: 6.60 [2024-06-10 07:37:20,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 3845.50 | bwd_inner_microstep: 1591.49 | bwd_allreduce_microstep: 2253.93 | step_microstep: 39.33 [2024-06-10 07:37:20,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15907.10 | bwd: 45074.23 | bwd_inner: 42819.34 | bwd_allreduce: 2254.18 | step: 40.93 {'loss': 1.336, 'learning_rate': 3.5861352752519294e-05, 'epoch': 0.23} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400 [2024-06-10 07:37:22,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.46 | bwd_microstep: 1366.10 | bwd_inner_microstep: 1366.07 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 07:37:24,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.44 | bwd_microstep: 1379.50 | bwd_inner_microstep: 1379.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 07:37:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1277.09 | bwd_inner_microstep: 1277.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 07:37:28,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.73 | bwd_microstep: 1383.56 | bwd_inner_microstep: 1383.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 07:37:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.98 | bwd_microstep: 1454.26 | bwd_inner_microstep: 1454.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 07:37:32,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.34 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 07:37:33,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1279.89 | bwd_inner_microstep: 1279.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-10 07:37:36,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.73 | bwd_microstep: 
1528.09 | bwd_inner_microstep: 1528.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3732 [2024-06-10 07:37:38,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.69 | bwd_microstep: 1565.88 | bwd_inner_microstep: 1565.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 07:37:40,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.93 | bwd_microstep: 1476.00 | bwd_inner_microstep: 1475.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3658 [2024-06-10 07:37:42,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.27 | bwd_microstep: 1445.44 | bwd_inner_microstep: 1445.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 07:37:44,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.32 | bwd_microstep: 1294.22 | bwd_inner_microstep: 1294.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-10 07:37:46,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.89 | bwd_microstep: 1474.72 | bwd_inner_microstep: 1474.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2117 [2024-06-10 07:37:47,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.40 | bwd_microstep: 1020.77 | bwd_inner_microstep: 1020.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642 [2024-06-10 07:37:49,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 591.71 | bwd_microstep: 1616.32 | bwd_inner_microstep: 1616.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 07:37:51,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.39 | bwd_microstep: 1384.87 | bwd_inner_microstep: 1384.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-10 07:37:53,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.22 | bwd_microstep: 1288.02 | bwd_inner_microstep: 1287.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 07:37:55,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.03 | bwd_microstep: 1291.31 | bwd_inner_microstep: 1291.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 07:37:57,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.65 | bwd_microstep: 1289.51 | bwd_inner_microstep: 1289.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669 [2024-06-10 07:37:59,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.87 | bwd_microstep: 1430.41 | bwd_inner_microstep: 1430.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-10 07:38:01,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1535.76 | bwd_inner_microstep: 1535.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 07:38:02,916] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.34 | bwd_microstep: 1291.40 | bwd_inner_microstep: 1291.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 07:38:04,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.46 | bwd_microstep: 1254.35 | bwd_inner_microstep: 1254.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 07:38:06,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1292.06 | bwd_inner_microstep: 1292.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2989 [2024-06-10 07:38:08,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.05 | bwd_microstep: 1206.83 | bwd_inner_microstep: 1206.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2031 [2024-06-10 07:38:09,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.54 | bwd_microstep: 760.66 | bwd_inner_microstep: 760.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3573 [2024-06-10 07:38:11,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.09 | bwd_microstep: 1566.27 | bwd_inner_microstep: 1566.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2279 [2024-06-10 07:38:12,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.29 | bwd_microstep: 937.02 | bwd_inner_microstep: 936.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3464 [2024-06-10 07:38:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1377.13 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 4810 [2024-06-10 07:38:17,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 747.29 | bwd_microstep: 2000.87 | bwd_inner_microstep: 2000.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 07:38:19,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1492.21 | bwd_inner_microstep: 1492.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3579 [2024-06-10 07:38:21,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 07:38:21,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1483.36 | bwd_inner_microstep: 1475.60 | bwd_allreduce_microstep: 7.71 | step_microstep: 38.17 [2024-06-10 07:38:21,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16436.19 | bwd: 43829.07 | bwd_inner: 43820.44 | bwd_allreduce: 7.94 | step: 40.14 {'loss': 1.3144, 'learning_rate': 3.583846163712641e-05, 'epoch': 0.23} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 07:38:23,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.42 | bwd_microstep: 1476.85 | bwd_inner_microstep: 1476.77 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-10 07:38:24,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.50 | bwd_microstep: 699.01 | 
bwd_inner_microstep: 698.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3854 [2024-06-10 07:38:26,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.58 | bwd_microstep: 1468.28 | bwd_inner_microstep: 1468.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831 [2024-06-10 07:38:28,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.03 | bwd_microstep: 1655.46 | bwd_inner_microstep: 1655.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785 [2024-06-10 07:38:31,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.24 | bwd_microstep: 1648.72 | bwd_inner_microstep: 1648.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411 [2024-06-10 07:38:32,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.63 | bwd_microstep: 1154.47 | bwd_inner_microstep: 1154.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747 [2024-06-10 07:38:34,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.61 | bwd_microstep: 1434.15 | bwd_inner_microstep: 1434.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3744 [2024-06-10 07:38:36,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1338.60 | bwd_inner_microstep: 1338.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 07:38:38,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 486.20 | bwd_microstep: 1288.25 | bwd_inner_microstep: 1288.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3497 [2024-06-10 07:38:39,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.83 | bwd_microstep: 1252.88 | bwd_inner_microstep: 1252.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3696 [2024-06-10 07:38:42,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.50 | bwd_microstep: 1734.36 | bwd_inner_microstep: 1734.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512 [2024-06-10 07:38:44,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.67 | bwd_microstep: 1429.79 | bwd_inner_microstep: 1429.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 07:38:46,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1348.47 | bwd_inner_microstep: 1348.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1924 [2024-06-10 07:38:47,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.07 | bwd_microstep: 758.84 | bwd_inner_microstep: 758.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-10 07:38:49,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.00 | bwd_microstep: 1302.31 | bwd_inner_microstep: 1302.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530 [2024-06-10 07:38:51,257] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1590.00 | bwd_inner_microstep: 1589.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648 [2024-06-10 07:38:53,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1410.26 | bwd_inner_microstep: 1410.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3162 [2024-06-10 07:38:55,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.18 | bwd_microstep: 1348.18 | bwd_inner_microstep: 1348.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-10 07:38:56,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.91 | bwd_microstep: 878.56 | bwd_inner_microstep: 878.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 07:38:58,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.26 | bwd_microstep: 1295.66 | bwd_inner_microstep: 1295.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 07:39:00,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3485 [2024-06-10 07:39:01,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.11 | bwd_microstep: 1432.01 | bwd_inner_microstep: 1431.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 
3827 [2024-06-10 07:39:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.70 | bwd_microstep: 1520.93 | bwd_inner_microstep: 1520.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 07:39:06,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1378.94 | bwd_inner_microstep: 1378.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 07:39:07,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1391.77 | bwd_inner_microstep: 1391.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 07:39:09,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1286.25 | bwd_inner_microstep: 1286.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2049 [2024-06-10 07:39:11,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.68 | bwd_microstep: 1007.83 | bwd_inner_microstep: 1007.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3802 [2024-06-10 07:39:13,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.55 | bwd_microstep: 1482.84 | bwd_inner_microstep: 1482.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 07:39:15,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1394.64 | bwd_inner_microstep: 1394.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images 
per sample: 11.5, dynamic token length: 3612 [2024-06-10 07:39:17,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1709.19 | bwd_inner_microstep: 1709.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3442 [2024-06-10 07:39:19,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.91 | bwd_microstep: 1400.72 | bwd_inner_microstep: 1400.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-10 07:39:23,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.20 | optimizer_step: 6.61 [2024-06-10 07:39:23,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.45 | bwd_microstep: 3111.26 | bwd_inner_microstep: 1868.07 | bwd_allreduce_microstep: 1243.14 | step_microstep: 38.12 [2024-06-10 07:39:23,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16243.97 | bwd: 45028.79 | bwd_inner: 43784.66 | bwd_allreduce: 1243.42 | step: 39.71 {'loss': 1.3116, 'learning_rate': 3.581551473875397e-05, 'epoch': 0.23} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 07:39:24,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1370.48 | bwd_inner_microstep: 1370.36 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 07:39:26,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.60 | bwd_microstep: 1384.07 | bwd_inner_microstep: 1384.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 07:39:28,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
490.80 | bwd_microstep: 1286.63 | bwd_inner_microstep: 1286.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3909 [2024-06-10 07:39:30,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.89 | bwd_microstep: 1538.58 | bwd_inner_microstep: 1538.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 07:39:32,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1481.29 | bwd_inner_microstep: 1481.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884 [2024-06-10 07:39:33,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 709.54 | bwd_inner_microstep: 709.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722 [2024-06-10 07:39:36,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.35 | bwd_microstep: 1631.06 | bwd_inner_microstep: 1631.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 07:39:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1390.56 | bwd_inner_microstep: 1390.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-10 07:39:39,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1286.44 | bwd_inner_microstep: 1286.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 07:39:41,682] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.02 | bwd_microstep: 1384.92 | bwd_inner_microstep: 1384.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902 [2024-06-10 07:39:42,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.20 | bwd_microstep: 686.08 | bwd_inner_microstep: 686.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3673 [2024-06-10 07:39:44,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.39 | bwd_microstep: 1550.36 | bwd_inner_microstep: 1550.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3582 [2024-06-10 07:39:46,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.54 | bwd_microstep: 1434.06 | bwd_inner_microstep: 1434.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 07:39:48,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1382.96 | bwd_inner_microstep: 1382.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 07:39:50,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.93 | bwd_microstep: 1310.44 | bwd_inner_microstep: 1310.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2898 [2024-06-10 07:39:52,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.45 | bwd_microstep: 1185.44 | bwd_inner_microstep: 1185.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3519 [2024-06-10 07:39:54,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979 [2024-06-10 07:39:55,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.21 | bwd_microstep: 895.60 | bwd_inner_microstep: 895.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 07:39:57,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1508.19 | bwd_inner_microstep: 1508.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400 [2024-06-10 07:39:59,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.36 | bwd_microstep: 1368.09 | bwd_inner_microstep: 1368.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 07:40:01,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1284.62 | bwd_inner_microstep: 1284.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3620 [2024-06-10 07:40:03,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.30 | bwd_microstep: 1676.93 | bwd_inner_microstep: 1676.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 07:40:05,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.50 | bwd_microstep: 1252.83 | bwd_inner_microstep: 1252.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3437 [2024-06-10 07:40:07,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.77 | bwd_microstep: 1544.64 | bwd_inner_microstep: 1544.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2424 [2024-06-10 07:40:08,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.73 | bwd_microstep: 1134.34 | bwd_inner_microstep: 1134.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-10 07:40:10,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.67 | bwd_microstep: 1644.32 | bwd_inner_microstep: 1644.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3821 [2024-06-10 07:40:13,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.91 | bwd_microstep: 1860.31 | bwd_inner_microstep: 1860.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3730 [2024-06-10 07:40:15,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.17 | bwd_microstep: 1664.28 | bwd_inner_microstep: 1664.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080 [2024-06-10 07:40:17,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.02 | bwd_microstep: 916.44 | bwd_inner_microstep: 916.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596 [2024-06-10 07:40:18,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1304.28 | bwd_inner_microstep: 1304.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582 [2024-06-10 07:40:20,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1299.69 | bwd_inner_microstep: 1299.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 07:40:25,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.59 [2024-06-10 07:40:25,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.04 | bwd_microstep: 3985.08 | bwd_inner_microstep: 2025.88 | bwd_allreduce_microstep: 1959.12 | step_microstep: 38.36 [2024-06-10 07:40:25,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16174.29 | bwd: 45735.85 | bwd_inner: 43775.69 | bwd_allreduce: 1959.42 | step: 39.93 {'loss': 1.33, 'learning_rate': 3.579251213822085e-05, 'epoch': 0.23} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-10 07:40:27,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.94 | bwd_microstep: 1485.24 | bwd_inner_microstep: 1485.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-10 07:40:29,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.61 | bwd_microstep: 1442.87 | bwd_inner_microstep: 1442.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2307 [2024-06-10 07:40:30,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.14 | bwd_microstep: 821.10 | bwd_inner_microstep: 821.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837 [2024-06-10 07:40:32,648] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 577.03 | bwd_microstep: 1552.10 | bwd_inner_microstep: 1552.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567 [2024-06-10 07:40:34,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1297.83 | bwd_inner_microstep: 1297.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 07:40:36,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1385.33 | bwd_inner_microstep: 1385.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470 [2024-06-10 07:40:38,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.07 | bwd_microstep: 1213.57 | bwd_inner_microstep: 1213.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 07:40:39,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 07:40:41,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.22 | bwd_microstep: 792.90 | bwd_inner_microstep: 792.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905 [2024-06-10 07:40:42,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 716.05 | bwd_inner_microstep: 716.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 
07:40:43,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1390.68 | bwd_inner_microstep: 1390.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680 [2024-06-10 07:40:45,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.82 | bwd_microstep: 1429.53 | bwd_inner_microstep: 1429.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2679 [2024-06-10 07:40:47,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.58 | bwd_microstep: 1058.97 | bwd_inner_microstep: 1058.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2166 [2024-06-10 07:40:48,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.70 | bwd_microstep: 885.77 | bwd_inner_microstep: 885.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3421 [2024-06-10 07:40:50,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1406.24 | bwd_inner_microstep: 1406.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512 [2024-06-10 07:40:52,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.46 | bwd_microstep: 1585.64 | bwd_inner_microstep: 1585.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 07:40:54,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.32 | bwd_microstep: 1288.39 | bwd_inner_microstep: 1288.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3504 [2024-06-10 07:40:56,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1390.97 | bwd_inner_microstep: 1390.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2177 [2024-06-10 07:40:57,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.90 | bwd_microstep: 829.25 | bwd_inner_microstep: 829.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 07:40:59,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1296.86 | bwd_inner_microstep: 1296.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2088 [2024-06-10 07:41:00,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.81 | bwd_microstep: 852.84 | bwd_inner_microstep: 852.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3636 [2024-06-10 07:41:02,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1318.65 | bwd_inner_microstep: 1318.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-10 07:41:04,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.75 | bwd_microstep: 1549.81 | bwd_inner_microstep: 1549.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819 [2024-06-10 07:41:06,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.00 | bwd_microstep: 1753.49 | bwd_inner_microstep: 1753.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 07:41:08,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1282.30 | bwd_inner_microstep: 1282.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-10 07:41:10,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.23 | bwd_microstep: 1530.86 | bwd_inner_microstep: 1530.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820 [2024-06-10 07:41:13,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.90 | bwd_microstep: 1756.91 | bwd_inner_microstep: 1756.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 07:41:15,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.96 | bwd_microstep: 1747.16 | bwd_inner_microstep: 1747.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-10 07:41:17,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.68 | bwd_microstep: 1530.65 | bwd_inner_microstep: 1530.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3560 [2024-06-10 07:41:19,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.30 | bwd_microstep: 1589.34 | bwd_inner_microstep: 1589.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2254 [2024-06-10 07:41:21,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.16 | bwd_microstep: 1067.66 | bwd_inner_microstep: 1067.63 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3825 [2024-06-10 07:41:26,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.26 | optimizer_step: 6.64 [2024-06-10 07:41:26,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.30 | bwd_microstep: 3893.16 | bwd_inner_microstep: 2104.33 | bwd_allreduce_microstep: 1788.77 | step_microstep: 38.40 [2024-06-10 07:41:26,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15866.15 | bwd: 44524.19 | bwd_inner: 42734.51 | bwd_allreduce: 1789.01 | step: 40.05 {'loss': 1.3099, 'learning_rate': 3.5769453916542065e-05, 'epoch': 0.23} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3413 [2024-06-10 07:41:27,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.33 | bwd_microstep: 1180.09 | bwd_inner_microstep: 1180.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1926 [2024-06-10 07:41:28,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.84 | bwd_microstep: 738.87 | bwd_inner_microstep: 738.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2434 [2024-06-10 07:41:30,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.13 | bwd_microstep: 941.51 | bwd_inner_microstep: 941.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3855 [2024-06-10 07:41:32,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.38 | bwd_microstep: 1559.42 | bwd_inner_microstep: 1559.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-10 07:41:34,318] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.13 | bwd_microstep: 1542.21 | bwd_inner_microstep: 1542.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 07:41:36,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1246.86 | bwd_inner_microstep: 1246.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3402 [2024-06-10 07:41:37,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.62 | bwd_microstep: 1179.91 | bwd_inner_microstep: 1179.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 07:41:39,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.07 | bwd_microstep: 1442.41 | bwd_inner_microstep: 1442.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 [2024-06-10 07:41:41,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1428.12 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 07:41:43,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1390.07 | bwd_inner_microstep: 1390.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 07:41:45,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1389.21 | bwd_inner_microstep: 1389.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 
3441 [2024-06-10 07:41:47,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.82 | bwd_microstep: 1216.10 | bwd_inner_microstep: 1216.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3443 [2024-06-10 07:41:48,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1310.99 | bwd_inner_microstep: 1310.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458 [2024-06-10 07:41:50,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.77 | bwd_microstep: 1218.55 | bwd_inner_microstep: 1218.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 07:41:52,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.50 | bwd_microstep: 1377.01 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 07:41:54,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1351.21 | bwd_inner_microstep: 1351.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466 [2024-06-10 07:41:56,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.86 | bwd_microstep: 1573.78 | bwd_inner_microstep: 1573.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1961 [2024-06-10 07:41:57,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.82 | bwd_microstep: 890.48 | bwd_inner_microstep: 890.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 3627 [2024-06-10 07:41:59,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1217.81 | bwd_inner_microstep: 1217.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511 [2024-06-10 07:42:01,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.03 | bwd_microstep: 1194.29 | bwd_inner_microstep: 1194.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-10 07:42:03,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1609.80 | bwd_inner_microstep: 1609.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3880 [2024-06-10 07:42:05,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.05 | bwd_microstep: 1590.40 | bwd_inner_microstep: 1590.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-10 07:42:07,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1407.03 | bwd_inner_microstep: 1407.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3616 [2024-06-10 07:42:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.90 | bwd_microstep: 1538.21 | bwd_inner_microstep: 1538.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 07:42:11,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.94 | bwd_microstep: 1255.30 | bwd_inner_microstep: 1255.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524 [2024-06-10 07:42:13,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1586.64 | bwd_inner_microstep: 1586.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 07:42:15,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.04 | bwd_microstep: 1514.66 | bwd_inner_microstep: 1514.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820 [2024-06-10 07:42:17,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.57 | bwd_microstep: 1507.86 | bwd_inner_microstep: 1507.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3538 [2024-06-10 07:42:19,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.24 | bwd_microstep: 1622.00 | bwd_inner_microstep: 1621.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931 [2024-06-10 07:42:20,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.11 | bwd_microstep: 697.31 | bwd_inner_microstep: 697.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 07:42:23,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1501.46 | bwd_inner_microstep: 1501.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 07:42:27,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.57 [2024-06-10 
07:42:27,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 4300.92 | bwd_inner_microstep: 1701.48 | bwd_allreduce_microstep: 2599.38 | step_microstep: 38.22 [2024-06-10 07:42:27,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16024.56 | bwd: 45520.51 | bwd_inner: 42920.21 | bwd_allreduce: 2599.61 | step: 39.81 {'loss': 1.3202, 'learning_rate': 3.574634015492857e-05, 'epoch': 0.24} 23%|██▎ | 401/1726 [6:59:57<26:11:16, 71.15s/it] 23%|██▎ | 402/1726 [7:00:58<25:00:29, 68.00s/it] 23%|██▎ | 403/1726 [7:01:59<24:17:10, 66.09s/it] 23%|██▎ | 404/1726 [7:03:02<23:50:46, 64.94s/it] 23%|██▎ | 405/1726 [7:04:02<23:21:57, 63.68s/it] 24%|██▎ | 406/1726 [7:05:04<23:09:01, 63.14s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488 [2024-06-10 07:42:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.46 | bwd_microstep: 1468.08 | bwd_inner_microstep: 1468.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3971 [2024-06-10 07:42:31,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.62 | bwd_microstep: 1437.47 | bwd_inner_microstep: 1437.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 07:42:34,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1479.66 | bwd_inner_microstep: 1479.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761 [2024-06-10 07:42:35,998] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 539.49 | bwd_microstep: 1443.65 | bwd_inner_microstep: 1443.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 07:42:37,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.39 | bwd_microstep: 791.14 | bwd_inner_microstep: 791.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 07:42:38,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 07:42:40,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.56 | bwd_microstep: 1387.01 | bwd_inner_microstep: 1386.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-10 07:42:42,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.50 | bwd_microstep: 1154.49 | bwd_inner_microstep: 1154.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-10 07:42:44,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.66 | bwd_microstep: 1625.14 | bwd_inner_microstep: 1625.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 07:42:46,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1383.93 | bwd_inner_microstep: 1383.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965 [2024-06-10 
07:42:47,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.46 | bwd_microstep: 894.19 | bwd_inner_microstep: 894.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659 [2024-06-10 07:42:49,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.42 | bwd_microstep: 1613.97 | bwd_inner_microstep: 1613.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3628 [2024-06-10 07:42:51,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.95 | bwd_microstep: 1458.43 | bwd_inner_microstep: 1458.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992 [2024-06-10 07:42:53,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.20 | bwd_microstep: 899.87 | bwd_inner_microstep: 899.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 07:42:55,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.42 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 07:42:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.92 | bwd_microstep: 1287.97 | bwd_inner_microstep: 1287.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1979 [2024-06-10 07:42:58,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.71 | bwd_microstep: 830.66 | bwd_inner_microstep: 830.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2282 [2024-06-10 07:42:59,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.45 | bwd_microstep: 978.31 | bwd_inner_microstep: 978.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3662 [2024-06-10 07:43:01,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.72 | bwd_microstep: 1384.10 | bwd_inner_microstep: 1384.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 07:43:03,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.84 | bwd_microstep: 1599.39 | bwd_inner_microstep: 1599.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819 [2024-06-10 07:43:05,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.38 | bwd_microstep: 1487.84 | bwd_inner_microstep: 1487.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 07:43:07,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1490.02 | bwd_inner_microstep: 1489.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 07:43:09,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1507.53 | bwd_inner_microstep: 1507.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 07:43:11,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1376.23 | bwd_inner_microstep: 1376.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 36, images per sample: 9.0, dynamic token length: 3770 [2024-06-10 07:43:13,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.26 | bwd_microstep: 1567.04 | bwd_inner_microstep: 1567.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-10 07:43:15,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1413.35 | bwd_inner_microstep: 1413.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3608 [2024-06-10 07:43:17,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.86 | bwd_microstep: 1554.78 | bwd_inner_microstep: 1554.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 07:43:19,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.89 | bwd_microstep: 1261.84 | bwd_inner_microstep: 1261.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 07:43:21,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1406.74 | bwd_inner_microstep: 1406.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-10 07:43:23,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.69 | bwd_microstep: 1508.66 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 07:43:25,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1557.15 | bwd_inner_microstep: 1557.12 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2469 [2024-06-10 07:43:29,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.20 | optimizer_step: 6.58 [2024-06-10 07:43:29,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.85 | bwd_microstep: 3036.34 | bwd_inner_microstep: 1190.93 | bwd_allreduce_microstep: 1845.36 | step_microstep: 37.85 [2024-06-10 07:43:29,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16061.73 | bwd: 45028.55 | bwd_inner: 43182.26 | bwd_allreduce: 1845.59 | step: 39.36 {'loss': 1.2825, 'learning_rate': 3.57231709347869e-05, 'epoch': 0.24} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547 [2024-06-10 07:43:31,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1316.60 | bwd_inner_microstep: 1316.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 07:43:32,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1245.64 | bwd_inner_microstep: 1245.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 07:43:34,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1353.42 | bwd_inner_microstep: 1353.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 07:43:36,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1278.82 | bwd_inner_microstep: 1278.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 07:43:38,605] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1484.14 | bwd_inner_microstep: 1484.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-10 07:43:40,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1507.62 | bwd_inner_microstep: 1507.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 07:43:42,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1403.07 | bwd_inner_microstep: 1403.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4043 [2024-06-10 07:43:44,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.66 | bwd_microstep: 1722.13 | bwd_inner_microstep: 1722.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416 [2024-06-10 07:43:46,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1153.56 | bwd_inner_microstep: 1153.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907 [2024-06-10 07:43:47,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.98 | bwd_microstep: 778.18 | bwd_inner_microstep: 778.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3499 [2024-06-10 07:43:49,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1447.16 | bwd_inner_microstep: 1447.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
2967 [2024-06-10 07:43:51,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.52 | bwd_microstep: 1230.81 | bwd_inner_microstep: 1230.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 07:43:53,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1504.54 | bwd_inner_microstep: 1504.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2399 [2024-06-10 07:43:54,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.47 | bwd_microstep: 902.00 | bwd_inner_microstep: 901.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662 [2024-06-10 07:43:56,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.60 | bwd_microstep: 1522.67 | bwd_inner_microstep: 1522.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 07:43:58,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1484.65 | bwd_inner_microstep: 1484.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3385 [2024-06-10 07:44:00,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.05 | bwd_microstep: 1177.50 | bwd_inner_microstep: 1177.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2656 [2024-06-10 07:44:02,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.69 | bwd_microstep: 1211.97 | bwd_inner_microstep: 1211.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 2087 [2024-06-10 07:44:03,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.40 | bwd_microstep: 1014.85 | bwd_inner_microstep: 1014.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2982 [2024-06-10 07:44:04,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.08 | bwd_microstep: 1016.58 | bwd_inner_microstep: 1016.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486 [2024-06-10 07:44:06,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.36 | bwd_microstep: 1188.53 | bwd_inner_microstep: 1188.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548 [2024-06-10 07:44:08,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1328.98 | bwd_inner_microstep: 1328.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3615 [2024-06-10 07:44:10,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.22 | bwd_microstep: 1707.97 | bwd_inner_microstep: 1707.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 07:44:12,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.21 | bwd_microstep: 1403.48 | bwd_inner_microstep: 1403.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-10 07:44:14,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.42 | bwd_microstep: 1302.37 | bwd_inner_microstep: 1302.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 07:44:16,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 07:44:18,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.02 | bwd_microstep: 1403.19 | bwd_inner_microstep: 1403.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-10 07:44:19,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.94 | bwd_microstep: 875.70 | bwd_inner_microstep: 875.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-10 07:44:21,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1499.26 | bwd_inner_microstep: 1499.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807 [2024-06-10 07:44:23,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.49 | bwd_microstep: 1354.87 | bwd_inner_microstep: 1354.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424 [2024-06-10 07:44:25,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1408.32 | bwd_inner_microstep: 1408.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3388 [2024-06-10 07:44:30,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 
07:44:30,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 4883.81 | bwd_inner_microstep: 1515.68 | bwd_allreduce_microstep: 3368.07 | step_microstep: 38.28 [2024-06-10 07:44:30,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15716.29 | bwd: 45395.97 | bwd_inner: 42026.98 | bwd_allreduce: 3368.30 | step: 39.79 {'loss': 1.3278, 'learning_rate': 3.5699946337718934e-05, 'epoch': 0.24} dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3383 [2024-06-10 07:44:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.80 | bwd_microstep: 1283.75 | bwd_inner_microstep: 1283.63 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3972 [2024-06-10 07:44:34,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.61 | bwd_microstep: 1599.80 | bwd_inner_microstep: 1599.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 07:44:36,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.42 | bwd_microstep: 1282.90 | bwd_inner_microstep: 1282.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-10 07:44:38,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.46 | bwd_microstep: 1553.65 | bwd_inner_microstep: 1553.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 07:44:40,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.33 | bwd_microstep: 1287.79 | bwd_inner_microstep: 1287.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719 [2024-06-10 
07:44:42,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.20 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 07:44:44,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1246.41 | bwd_inner_microstep: 1246.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 07:44:46,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.91 | bwd_microstep: 1386.17 | bwd_inner_microstep: 1386.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2177 [2024-06-10 07:44:47,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.94 | bwd_microstep: 950.64 | bwd_inner_microstep: 950.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 07:44:49,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1392.59 | bwd_inner_microstep: 1392.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 4050 [2024-06-10 07:44:51,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.39 | bwd_microstep: 1784.50 | bwd_inner_microstep: 1784.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650 [2024-06-10 07:44:53,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.09 | bwd_microstep: 1576.95 | bwd_inner_microstep: 1576.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 3501 [2024-06-10 07:44:55,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.54 | bwd_microstep: 1189.48 | bwd_inner_microstep: 1189.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3411 [2024-06-10 07:44:57,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1535.77 | bwd_inner_microstep: 1535.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629 [2024-06-10 07:44:59,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1606.00 | bwd_inner_microstep: 1605.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 07:45:01,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.89 | bwd_microstep: 1290.79 | bwd_inner_microstep: 1290.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 07:45:03,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1410.37 | bwd_inner_microstep: 1410.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 07:45:05,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1510.87 | bwd_inner_microstep: 1510.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 07:45:07,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.85 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-10 07:45:09,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.08 | bwd_microstep: 1190.02 | bwd_inner_microstep: 1189.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-10 07:45:11,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1405.97 | bwd_inner_microstep: 1405.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 07:45:13,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.93 | bwd_microstep: 1356.10 | bwd_inner_microstep: 1356.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639 [2024-06-10 07:45:15,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.08 | bwd_microstep: 1612.73 | bwd_inner_microstep: 1612.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3541 [2024-06-10 07:45:17,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.53 | bwd_microstep: 1199.98 | bwd_inner_microstep: 1199.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 07:45:19,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.23 | bwd_microstep: 1507.34 | bwd_inner_microstep: 1507.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3608 [2024-06-10 07:45:21,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.51 | bwd_microstep: 1704.74 | bwd_inner_microstep: 1704.71 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-10 07:45:23,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.63 | bwd_microstep: 973.92 | bwd_inner_microstep: 973.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804 [2024-06-10 07:45:24,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.75 | bwd_microstep: 1357.48 | bwd_inner_microstep: 1357.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 07:45:26,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1352.67 | bwd_inner_microstep: 1352.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 07:45:28,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1455.81 | bwd_inner_microstep: 1455.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271 [2024-06-10 07:45:30,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.34 | bwd_microstep: 980.36 | bwd_inner_microstep: 980.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 07:45:32,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-10 07:45:32,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.72 | bwd_microstep: 1443.96 | bwd_inner_microstep: 1436.29 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.69 [2024-06-10 07:45:32,144] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd: 16586.44 | bwd: 44413.59 | bwd_inner: 44404.95 | bwd_allreduce: 7.93 | step: 39.28 {'loss': 1.3042, 'learning_rate': 3.567666644552159e-05, 'epoch': 0.24} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2886 [2024-06-10 07:45:33,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.50 | bwd_microstep: 1180.91 | bwd_inner_microstep: 1180.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3520 [2024-06-10 07:45:35,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.80 | bwd_microstep: 1507.16 | bwd_inner_microstep: 1507.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734 [2024-06-10 07:45:37,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1533.11 | bwd_inner_microstep: 1533.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 583 [2024-06-10 07:45:38,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.22 | bwd_microstep: 254.36 | bwd_inner_microstep: 254.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 07:45:40,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1280.20 | bwd_inner_microstep: 1280.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-10 07:45:42,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.44 | bwd_microstep: 1529.74 | bwd_inner_microstep: 1529.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2625 
[2024-06-10 07:45:43,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.39 | bwd_microstep: 950.93 | bwd_inner_microstep: 950.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2933 [2024-06-10 07:45:45,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.47 | bwd_microstep: 1064.76 | bwd_inner_microstep: 1064.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-10 07:45:47,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.18 | bwd_microstep: 1528.00 | bwd_inner_microstep: 1527.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 07:45:48,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1249.93 | bwd_inner_microstep: 1249.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124 [2024-06-10 07:45:50,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.74 | bwd_microstep: 929.18 | bwd_inner_microstep: 929.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1885 [2024-06-10 07:45:51,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.49 | bwd_microstep: 716.36 | bwd_inner_microstep: 716.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4003 [2024-06-10 07:45:53,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.88 | bwd_microstep: 1562.71 | bwd_inner_microstep: 1562.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 
5.5, dynamic token length: 3495 [2024-06-10 07:45:55,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.95 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-10 07:45:57,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1484.42 | bwd_inner_microstep: 1484.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 07:45:59,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1513.20 | bwd_inner_microstep: 1513.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 07:46:01,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.94 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 07:46:02,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1349.05 | bwd_inner_microstep: 1349.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3531 [2024-06-10 07:46:04,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.14 | bwd_microstep: 1199.16 | bwd_inner_microstep: 1199.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608 [2024-06-10 07:46:06,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.76 | bwd_microstep: 1509.75 | bwd_inner_microstep: 1509.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052 [2024-06-10 07:46:07,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.59 | bwd_microstep: 910.56 | bwd_inner_microstep: 910.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497 [2024-06-10 07:46:09,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.91 | bwd_microstep: 1189.21 | bwd_inner_microstep: 1189.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 07:46:11,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.98 | bwd_microstep: 1554.76 | bwd_inner_microstep: 1554.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 07:46:13,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.21 | bwd_microstep: 1501.80 | bwd_inner_microstep: 1501.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 07:46:15,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.87 | bwd_microstep: 1463.16 | bwd_inner_microstep: 1463.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-10 07:46:16,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.56 | bwd_microstep: 803.12 | bwd_inner_microstep: 803.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545 [2024-06-10 07:46:18,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.80 | bwd_microstep: 1329.02 | bwd_inner_microstep: 1328.99 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2059 [2024-06-10 07:46:20,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.18 | bwd_microstep: 943.43 | bwd_inner_microstep: 943.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3590 [2024-06-10 07:46:21,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.24 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3588 [2024-06-10 07:46:23,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1307.56 | bwd_inner_microstep: 1307.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817 [2024-06-10 07:46:26,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.92 | bwd_microstep: 1748.90 | bwd_inner_microstep: 1748.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 07:46:33,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.32 | optimizer_step: 6.60 [2024-06-10 07:46:33,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 6783.99 | bwd_inner_microstep: 1568.10 | bwd_allreduce_microstep: 5215.83 | step_microstep: 38.70 [2024-06-10 07:46:33,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15187.03 | bwd: 45886.94 | bwd_inner: 40670.18 | bwd_allreduce: 5216.07 | step: 40.22 {'loss': 1.3165, 'learning_rate': 3.5653331340186515e-05, 'epoch': 0.24} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 07:46:35,397] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1333.64 | bwd_inner_microstep: 1333.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 07:46:37,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 07:46:39,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.94 | bwd_microstep: 1450.01 | bwd_inner_microstep: 1449.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775 [2024-06-10 07:46:41,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.75 | bwd_microstep: 1533.74 | bwd_inner_microstep: 1533.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 07:46:43,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.24 | bwd_microstep: 1279.10 | bwd_inner_microstep: 1279.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 07:46:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1384.95 | bwd_inner_microstep: 1384.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 07:46:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1249.27 | bwd_inner_microstep: 1249.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3429 [2024-06-10 07:46:48,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1350.05 | bwd_inner_microstep: 1350.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698 [2024-06-10 07:46:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.05 | bwd_microstep: 1529.21 | bwd_inner_microstep: 1529.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 07:46:52,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515 [2024-06-10 07:46:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.05 | bwd_microstep: 1193.29 | bwd_inner_microstep: 1193.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073 [2024-06-10 07:46:55,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.31 | bwd_microstep: 818.50 | bwd_inner_microstep: 818.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2763 [2024-06-10 07:46:56,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.28 | bwd_microstep: 1006.94 | bwd_inner_microstep: 1006.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500 [2024-06-10 07:46:59,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1581.06 | bwd_inner_microstep: 1581.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3538 [2024-06-10 07:47:01,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1396.32 | bwd_inner_microstep: 1396.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-10 07:47:02,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.05 | bwd_microstep: 696.62 | bwd_inner_microstep: 696.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531 [2024-06-10 07:47:03,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1358.43 | bwd_inner_microstep: 1358.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849 [2024-06-10 07:47:06,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.60 | bwd_microstep: 1563.12 | bwd_inner_microstep: 1563.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680 [2024-06-10 07:47:08,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.81 | bwd_microstep: 1527.98 | bwd_inner_microstep: 1527.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-10 07:47:09,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1297.00 | bwd_inner_microstep: 1296.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-10 07:47:11,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1434.55 | bwd_inner_microstep: 1434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 07:47:13,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.77 | bwd_microstep: 1402.32 | bwd_inner_microstep: 1402.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693 [2024-06-10 07:47:15,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1433.40 | bwd_inner_microstep: 1433.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 07:47:17,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1397.03 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961 [2024-06-10 07:47:18,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.68 | bwd_microstep: 701.58 | bwd_inner_microstep: 701.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625 [2024-06-10 07:47:20,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.45 | bwd_microstep: 1313.29 | bwd_inner_microstep: 1313.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 07:47:22,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.97 | bwd_microstep: 1591.51 | bwd_inner_microstep: 1591.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 07:47:24,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1591.63 | bwd_inner_microstep: 1591.61 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-10 07:47:26,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1403.82 | bwd_inner_microstep: 1403.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 07:47:28,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.89 | bwd_microstep: 1260.74 | bwd_inner_microstep: 1260.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3590 [2024-06-10 07:47:30,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.40 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566 [2024-06-10 07:47:34,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.63 [2024-06-10 07:47:34,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.47 | bwd_microstep: 3593.30 | bwd_inner_microstep: 1462.48 | bwd_allreduce_microstep: 2130.77 | step_microstep: 38.20 [2024-06-10 07:47:34,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16043.96 | bwd: 44995.61 | bwd_inner: 42863.93 | bwd_allreduce: 2131.00 | step: 39.70 {'loss': 1.2854, 'learning_rate': 3.5629941103899834e-05, 'epoch': 0.24} dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 4602 [2024-06-10 07:47:37,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.26 | bwd_microstep: 1797.24 | bwd_inner_microstep: 1796.96 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.21 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475 
[2024-06-10 07:47:39,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1217.02 | bwd_inner_microstep: 1217.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 07:47:41,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.61 | bwd_microstep: 1652.19 | bwd_inner_microstep: 1652.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 07:47:43,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.11 | bwd_microstep: 1378.97 | bwd_inner_microstep: 1378.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-10 07:47:44,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.66 | bwd_microstep: 1146.17 | bwd_inner_microstep: 1146.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756 [2024-06-10 07:47:47,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1540.81 | bwd_inner_microstep: 1538.78 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-10 07:47:49,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.71 | bwd_microstep: 1642.76 | bwd_inner_microstep: 1642.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-10 07:47:51,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.38 | bwd_microstep: 1301.57 | bwd_inner_microstep: 1301.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per 
sample: 9.25, dynamic token length: 3892 [2024-06-10 07:47:53,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.63 | bwd_microstep: 1636.46 | bwd_inner_microstep: 1636.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 07:47:55,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.26 | bwd_microstep: 1385.95 | bwd_inner_microstep: 1385.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3662 [2024-06-10 07:47:57,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.02 | bwd_microstep: 1357.72 | bwd_inner_microstep: 1357.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697 [2024-06-10 07:47:59,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.68 | bwd_microstep: 1429.46 | bwd_inner_microstep: 1429.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 07:48:01,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.25 | bwd_microstep: 1488.03 | bwd_inner_microstep: 1488.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3656 [2024-06-10 07:48:03,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.77 | bwd_microstep: 1554.72 | bwd_inner_microstep: 1554.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 07:48:05,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.74 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3830 [2024-06-10 07:48:07,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.89 | bwd_microstep: 1796.27 | bwd_inner_microstep: 1796.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 07:48:09,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1289.69 | bwd_inner_microstep: 1289.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 07:48:11,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1416.73 | bwd_inner_microstep: 1416.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837 [2024-06-10 07:48:13,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.19 | bwd_microstep: 1667.56 | bwd_inner_microstep: 1667.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2294 [2024-06-10 07:48:14,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.58 | bwd_microstep: 976.70 | bwd_inner_microstep: 976.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 07:48:16,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.33 | bwd_microstep: 1297.72 | bwd_inner_microstep: 1297.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 07:48:18,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.77 | bwd_microstep: 1258.24 | bwd_inner_microstep: 1258.21 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-10 07:48:20,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.47 | bwd_microstep: 1517.62 | bwd_inner_microstep: 1517.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 07:48:22,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1560.69 | bwd_inner_microstep: 1560.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031 [2024-06-10 07:48:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.94 | bwd_microstep: 813.08 | bwd_inner_microstep: 813.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759 [2024-06-10 07:48:26,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1647.17 | bwd_inner_microstep: 1647.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3558 [2024-06-10 07:48:28,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.30 | bwd_microstep: 1534.95 | bwd_inner_microstep: 1534.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3763 [2024-06-10 07:48:30,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.18 | bwd_microstep: 1712.18 | bwd_inner_microstep: 1712.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524 [2024-06-10 07:48:32,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.70 | bwd_microstep: 
1590.36 | bwd_inner_microstep: 1590.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394 [2024-06-10 07:48:34,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1344.16 | bwd_inner_microstep: 1344.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3804 [2024-06-10 07:48:37,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.46 | bwd_microstep: 1684.95 | bwd_inner_microstep: 1684.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 07:48:39,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.67 [2024-06-10 07:48:39,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.13 | bwd_microstep: 1636.50 | bwd_inner_microstep: 1628.72 | bwd_allreduce_microstep: 7.73 | step_microstep: 37.73 [2024-06-10 07:48:39,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17359.77 | bwd: 46625.61 | bwd_inner: 46614.73 | bwd_allreduce: 8.17 | step: 39.96 24%|██▎ | 406/1726 [7:05:04<23:09:01, 63.14s/it] 24%|██▎ | 407/1726 [7:06:06<22:56:40, 62.62s/it] 24%|██▎ | 408/1726 [7:07:07<22:47:52, 62.27s/it] 24%|██▎ | 409/1726 [7:08:08<22:40:41, 61.99s/it] 24%|██▍ | 410/1726 [7:09:10<22:35:47, 61.81s/it] 24%|██▍ | 411/1726 [7:10:11<22:31:52, 61.68s/it] 24%|██▍ | 412/1726 [7:11:16<22:48:32, 62.49 {'loss': 1.33, 'learning_rate': 3.560649581904184e-05, 'epoch': 0.24} dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 1925 [2024-06-10 07:48:40,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.69 | bwd_microstep: 790.07 | bwd_inner_microstep: 789.87 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557 [2024-06-10 07:48:42,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.63 | bwd_microstep: 1433.98 | bwd_inner_microstep: 1433.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3861 [2024-06-10 07:48:44,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.23 | bwd_microstep: 1398.68 | bwd_inner_microstep: 1398.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-10 07:48:46,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.20 | bwd_microstep: 1643.50 | bwd_inner_microstep: 1643.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481 [2024-06-10 07:48:48,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.85 | bwd_microstep: 1188.08 | bwd_inner_microstep: 1188.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746 [2024-06-10 07:48:50,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.13 | bwd_microstep: 1542.06 | bwd_inner_microstep: 1542.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 07:48:52,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.93 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 07:48:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-10 07:48:56,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1416.87 | bwd_inner_microstep: 1416.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 07:48:58,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1495.27 | bwd_inner_microstep: 1495.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675 [2024-06-10 07:49:00,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.90 | bwd_microstep: 1551.71 | bwd_inner_microstep: 1551.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3444 [2024-06-10 07:49:02,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.93 | bwd_microstep: 1381.46 | bwd_inner_microstep: 1381.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1965 [2024-06-10 07:49:03,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.30 | bwd_microstep: 865.08 | bwd_inner_microstep: 865.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3657 [2024-06-10 07:49:05,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.81 | bwd_microstep: 1593.48 | bwd_inner_microstep: 1593.46 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-10 07:49:07,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1520.70 | bwd_inner_microstep: 1520.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3476 [2024-06-10 07:49:09,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.63 | bwd_microstep: 1367.65 | bwd_inner_microstep: 1367.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 07:49:11,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1389.47 | bwd_inner_microstep: 1389.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2692 [2024-06-10 07:49:12,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.56 | bwd_microstep: 1034.77 | bwd_inner_microstep: 1034.62 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 07:49:14,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1256.19 | bwd_inner_microstep: 1256.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527 [2024-06-10 07:49:16,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.49 | bwd_microstep: 1326.80 | bwd_inner_microstep: 1326.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1007 [2024-06-10 07:49:17,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 163.45 | bwd_microstep: 428.20 | bwd_inner_microstep: 
428.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3708 [2024-06-10 07:49:19,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1561.69 | bwd_inner_microstep: 1561.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686 [2024-06-10 07:49:21,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.78 | bwd_microstep: 1435.01 | bwd_inner_microstep: 1434.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003 [2024-06-10 07:49:22,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 800.98 | bwd_inner_microstep: 800.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 07:49:24,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1382.61 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3537 [2024-06-10 07:49:26,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.62 | bwd_microstep: 1451.37 | bwd_inner_microstep: 1451.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282 [2024-06-10 07:49:27,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 880.51 | bwd_inner_microstep: 880.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3473 [2024-06-10 07:49:29,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | 
bwd_microstep: 1426.76 | bwd_inner_microstep: 1426.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 07:49:31,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.31 | bwd_microstep: 1559.55 | bwd_inner_microstep: 1559.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592 [2024-06-10 07:49:33,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.27 | bwd_microstep: 1703.58 | bwd_inner_microstep: 1703.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2049 [2024-06-10 07:49:35,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.51 | bwd_microstep: 911.09 | bwd_inner_microstep: 911.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1394 [2024-06-10 07:49:40,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 07:49:40,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 205.85 | bwd_microstep: 4816.66 | bwd_inner_microstep: 606.70 | bwd_allreduce_microstep: 4209.90 | step_microstep: 38.80 [2024-06-10 07:49:40,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15343.45 | bwd: 45224.13 | bwd_inner: 41013.03 | bwd_allreduce: 4210.28 | step: 40.94 {'loss': 1.2455, 'learning_rate': 3.55829955681867e-05, 'epoch': 0.24} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3477 [2024-06-10 07:49:42,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.57 | bwd_microstep: 1501.62 | bwd_inner_microstep: 1501.54 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.13 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3472 [2024-06-10 07:49:44,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1386.71 | bwd_inner_microstep: 1386.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 07:49:46,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1379.46 | bwd_inner_microstep: 1379.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3834 [2024-06-10 07:49:48,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.04 | bwd_microstep: 1488.20 | bwd_inner_microstep: 1488.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 07:49:50,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1550.35 | bwd_inner_microstep: 1550.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 07:49:52,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.42 | bwd_microstep: 1545.89 | bwd_inner_microstep: 1545.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 07:49:54,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1532.25 | bwd_inner_microstep: 1532.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750 [2024-06-10 07:49:56,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.74 | bwd_microstep: 1339.84 | bwd_inner_microstep: 1339.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 07:49:58,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.85 | bwd_microstep: 1391.72 | bwd_inner_microstep: 1391.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 739 [2024-06-10 07:49:58,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 120.07 | bwd_microstep: 303.93 | bwd_inner_microstep: 303.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3723 [2024-06-10 07:50:00,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1560.73 | bwd_inner_microstep: 1560.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 07:50:02,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.71 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3530 [2024-06-10 07:50:04,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.92 | bwd_microstep: 1445.47 | bwd_inner_microstep: 1445.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384 [2024-06-10 07:50:06,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.93 | bwd_microstep: 1434.49 | bwd_inner_microstep: 1434.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3671 [2024-06-10 07:50:09,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.07 | bwd_microstep: 1824.76 | bwd_inner_microstep: 1824.74 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023 [2024-06-10 07:50:10,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.56 | bwd_microstep: 808.47 | bwd_inner_microstep: 808.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-10 07:50:12,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1493.34 | bwd_inner_microstep: 1493.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3833 [2024-06-10 07:50:14,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1363.91 | bwd_inner_microstep: 1363.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 07:50:16,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1400.72 | bwd_inner_microstep: 1400.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679 [2024-06-10 07:50:18,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.65 | bwd_microstep: 1329.50 | bwd_inner_microstep: 1329.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-10 07:50:20,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1531.28 | bwd_inner_microstep: 1531.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3476 [2024-06-10 07:50:22,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.57 | bwd_microstep: 
1216.00 | bwd_inner_microstep: 1215.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3481 [2024-06-10 07:50:23,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.65 | bwd_microstep: 1330.20 | bwd_inner_microstep: 1330.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-10 07:50:25,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.42 | bwd_microstep: 1163.60 | bwd_inner_microstep: 1163.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009 [2024-06-10 07:50:26,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.60 | bwd_microstep: 803.65 | bwd_inner_microstep: 803.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 07:50:28,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.21 | bwd_microstep: 1280.48 | bwd_inner_microstep: 1280.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3453 [2024-06-10 07:50:30,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.62 | bwd_microstep: 1193.09 | bwd_inner_microstep: 1193.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3597 [2024-06-10 07:50:32,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1540.23 | bwd_inner_microstep: 1540.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-10 07:50:34,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 563.31 | bwd_microstep: 1496.43 | bwd_inner_microstep: 1496.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3475 [2024-06-10 07:50:36,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1332.46 | bwd_inner_microstep: 1332.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805 [2024-06-10 07:50:38,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.50 | bwd_microstep: 1753.93 | bwd_inner_microstep: 1753.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2275 [2024-06-10 07:50:42,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 07:50:42,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.78 | bwd_microstep: 3743.05 | bwd_inner_microstep: 1158.72 | bwd_allreduce_microstep: 2584.28 | step_microstep: 38.08 [2024-06-10 07:50:42,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16210.00 | bwd: 45952.43 | bwd_inner: 43367.18 | bwd_allreduce: 2584.55 | step: 39.79 {'loss': 1.3226, 'learning_rate': 3.5559440434102176e-05, 'epoch': 0.24} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 07:50:44,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1244.96 | bwd_inner_microstep: 1244.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 07:50:46,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1245.72 | bwd_inner_microstep: 1245.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3893 [2024-06-10 07:50:48,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1415.57 | bwd_inner_microstep: 1415.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 07:50:50,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.55 | bwd_microstep: 1481.54 | bwd_inner_microstep: 1481.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293 [2024-06-10 07:50:51,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.96 | bwd_microstep: 973.22 | bwd_inner_microstep: 973.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-10 07:50:53,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.13 | bwd_microstep: 1633.81 | bwd_inner_microstep: 1633.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 07:50:55,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.64 | bwd_microstep: 1248.11 | bwd_inner_microstep: 1248.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592 [2024-06-10 07:50:57,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1422.88 | bwd_inner_microstep: 1422.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958 [2024-06-10 07:50:58,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.25 | bwd_microstep: 897.01 | bwd_inner_microstep: 896.98 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3505 [2024-06-10 07:51:00,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1338.00 | bwd_inner_microstep: 1337.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 07:51:02,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.65 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511 [2024-06-10 07:51:04,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.30 | bwd_microstep: 1584.07 | bwd_inner_microstep: 1584.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3420 [2024-06-10 07:51:06,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1395.45 | bwd_inner_microstep: 1395.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1980 [2024-06-10 07:51:07,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.48 | bwd_microstep: 848.30 | bwd_inner_microstep: 848.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-10 07:51:09,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.42 | bwd_microstep: 798.27 | bwd_inner_microstep: 798.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 950 [2024-06-10 07:51:09,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 147.69 | bwd_microstep: 
380.03 | bwd_inner_microstep: 380.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 07:51:11,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933 [2024-06-10 07:51:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.06 | bwd_microstep: 701.56 | bwd_inner_microstep: 701.39 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955 [2024-06-10 07:51:13,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.39 | bwd_microstep: 701.63 | bwd_inner_microstep: 701.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642 [2024-06-10 07:51:15,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1519.80 | bwd_inner_microstep: 1519.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-10 07:51:17,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1515.06 | bwd_inner_microstep: 1515.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077 [2024-06-10 07:51:18,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.28 | bwd_microstep: 915.52 | bwd_inner_microstep: 915.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 07:51:20,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 521.34 | bwd_microstep: 1381.57 | bwd_inner_microstep: 1381.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086 [2024-06-10 07:51:21,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.28 | bwd_microstep: 821.13 | bwd_inner_microstep: 821.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3880 [2024-06-10 07:51:24,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.51 | bwd_microstep: 1617.38 | bwd_inner_microstep: 1617.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600 [2024-06-10 07:51:25,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1311.26 | bwd_inner_microstep: 1311.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 07:51:28,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1558.13 | bwd_inner_microstep: 1558.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 07:51:30,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.75 | bwd_microstep: 1504.22 | bwd_inner_microstep: 1504.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3771 [2024-06-10 07:51:32,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.50 | bwd_microstep: 1746.16 | bwd_inner_microstep: 1746.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 07:51:34,514] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1378.11 | bwd_inner_microstep: 1378.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 07:51:36,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1516.83 | bwd_inner_microstep: 1516.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3305 [2024-06-10 07:51:44,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 07:51:44,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.10 | bwd_microstep: 7092.40 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 5747.19 | step_microstep: 38.03 [2024-06-10 07:51:44,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15018.58 | bwd: 46078.87 | bwd_inner: 40330.65 | bwd_allreduce: 5747.47 | step: 39.74 {'loss': 1.2906, 'learning_rate': 3.553583049974933e-05, 'epoch': 0.24} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 07:51:46,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.01 | bwd_microstep: 1476.90 | bwd_inner_microstep: 1476.76 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843 [2024-06-10 07:51:48,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.02 | bwd_microstep: 1554.73 | bwd_inner_microstep: 1554.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-10 07:51:50,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1483.41 | bwd_inner_microstep: 1483.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-10 07:51:51,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 795.10 | bwd_inner_microstep: 795.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-10 07:51:52,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.59 | bwd_microstep: 788.17 | bwd_inner_microstep: 788.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 07:51:54,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.30 | bwd_microstep: 1279.25 | bwd_inner_microstep: 1279.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 07:51:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.53 | bwd_microstep: 1283.79 | bwd_inner_microstep: 1283.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-10 07:51:57,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.41 | bwd_microstep: 1149.58 | bwd_inner_microstep: 1149.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872 [2024-06-10 07:51:58,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.17 | bwd_microstep: 678.98 | bwd_inner_microstep: 678.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 07:52:00,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1282.31 | 
bwd_inner_microstep: 1282.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-10 07:52:02,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.25 | bwd_microstep: 1278.51 | bwd_inner_microstep: 1278.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3498 [2024-06-10 07:52:04,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.62 | bwd_microstep: 1433.96 | bwd_inner_microstep: 1433.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3658 [2024-06-10 07:52:06,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1461.64 | bwd_inner_microstep: 1461.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 07:52:08,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1526.30 | bwd_inner_microstep: 1526.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1967 [2024-06-10 07:52:09,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.43 | bwd_microstep: 856.68 | bwd_inner_microstep: 856.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 07:52:11,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1487.91 | bwd_inner_microstep: 1487.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3425 [2024-06-10 07:52:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 518.68 | bwd_microstep: 1396.06 | bwd_inner_microstep: 1396.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 07:52:15,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1354.09 | bwd_inner_microstep: 1354.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-10 07:52:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1443.03 | bwd_inner_microstep: 1443.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 07:52:19,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1349.78 | bwd_inner_microstep: 1349.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3440 [2024-06-10 07:52:20,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.99 | bwd_microstep: 1188.33 | bwd_inner_microstep: 1188.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 07:52:22,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1400.10 | bwd_inner_microstep: 1400.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 07:52:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631 [2024-06-10 07:52:26,714] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1409.26 | bwd_inner_microstep: 1409.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821 [2024-06-10 07:52:28,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.99 | bwd_microstep: 1359.13 | bwd_inner_microstep: 1359.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 07:52:30,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1354.70 | bwd_inner_microstep: 1354.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3581 [2024-06-10 07:52:32,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.14 | bwd_microstep: 1330.63 | bwd_inner_microstep: 1330.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 07:52:34,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 1442.79 | bwd_inner_microstep: 1442.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717 [2024-06-10 07:52:36,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1429.84 | bwd_inner_microstep: 1429.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3791 [2024-06-10 07:52:38,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.86 | bwd_microstep: 1354.74 | bwd_inner_microstep: 1354.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3765 [2024-06-10 07:52:40,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.80 | bwd_microstep: 1345.77 | bwd_inner_microstep: 1345.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770 [2024-06-10 07:52:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-10 07:52:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.47 | bwd_microstep: 2312.12 | bwd_inner_microstep: 1514.24 | bwd_allreduce_microstep: 797.82 | step_microstep: 37.69 [2024-06-10 07:52:42,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15666.75 | bwd: 42679.09 | bwd_inner: 41880.25 | bwd_allreduce: 798.11 | step: 39.27 {'loss': 1.2971, 'learning_rate': 3.5512165848282225e-05, 'epoch': 0.24} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 07:52:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.07 | bwd_microstep: 1482.16 | bwd_inner_microstep: 1482.06 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-10 07:52:45,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.26 | bwd_microstep: 678.46 | bwd_inner_microstep: 678.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 07:52:47,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1376.88 | bwd_inner_microstep: 1376.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3913 [2024-06-10 07:52:49,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.12 | bwd_microstep: 1593.15 | 
bwd_inner_microstep: 1593.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 07:52:52,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.91 | bwd_microstep: 1485.30 | bwd_inner_microstep: 1485.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-10 07:52:53,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.53 | bwd_microstep: 808.48 | bwd_inner_microstep: 808.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2666 [2024-06-10 07:52:54,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.59 | bwd_microstep: 1073.69 | bwd_inner_microstep: 1073.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 07:52:56,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.43 | bwd_microstep: 1485.01 | bwd_inner_microstep: 1484.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 07:52:58,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.32 | bwd_microstep: 1250.99 | bwd_inner_microstep: 1250.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3477 [2024-06-10 07:53:00,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.44 | bwd_microstep: 1247.30 | bwd_inner_microstep: 1247.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 07:53:01,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 484.72 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 07:53:03,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.98 | bwd_microstep: 792.22 | bwd_inner_microstep: 792.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669 [2024-06-10 07:53:04,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1424.62 | bwd_inner_microstep: 1424.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495 [2024-06-10 07:53:06,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1319.50 | bwd_inner_microstep: 1319.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3897 [2024-06-10 07:53:09,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1629.91 | bwd_inner_microstep: 1629.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 07:53:10,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1342.42 | bwd_inner_microstep: 1342.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669 [2024-06-10 07:53:13,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1521.77 | bwd_inner_microstep: 1521.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3839 [2024-06-10 07:53:15,555] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.36 | bwd_microstep: 1856.24 | bwd_inner_microstep: 1856.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3658 [2024-06-10 07:53:17,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.64 | bwd_microstep: 1567.13 | bwd_inner_microstep: 1567.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3697 [2024-06-10 07:53:20,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.84 | bwd_microstep: 1722.91 | bwd_inner_microstep: 1722.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 07:53:22,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.88 | bwd_microstep: 1507.69 | bwd_inner_microstep: 1507.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3683 [2024-06-10 07:53:23,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1230.86 | bwd_inner_microstep: 1230.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 07:53:26,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1554.59 | bwd_inner_microstep: 1554.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2067 [2024-06-10 07:53:27,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.50 | bwd_microstep: 849.50 | bwd_inner_microstep: 849.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3779 [2024-06-10 07:53:29,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.45 | bwd_microstep: 1547.74 | bwd_inner_microstep: 1547.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3561 [2024-06-10 07:53:30,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.67 | bwd_microstep: 1200.04 | bwd_inner_microstep: 1200.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3453 [2024-06-10 07:53:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.61 | bwd_microstep: 1219.95 | bwd_inner_microstep: 1219.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-10 07:53:34,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.90 | bwd_microstep: 972.36 | bwd_inner_microstep: 972.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2049 [2024-06-10 07:53:35,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.80 | bwd_microstep: 910.26 | bwd_inner_microstep: 910.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597 [2024-06-10 07:53:37,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.58 | bwd_microstep: 1445.32 | bwd_inner_microstep: 1445.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428 [2024-06-10 07:53:38,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.77 | bwd_microstep: 1158.07 | bwd_inner_microstep: 1158.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3573 [2024-06-10 07:53:44,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.37 | optimizer_step: 6.63 [2024-06-10 07:53:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 4942.77 | bwd_inner_microstep: 1567.59 | bwd_allreduce_microstep: 3375.11 | step_microstep: 39.13 [2024-06-10 07:53:44,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15703.64 | bwd: 45481.76 | bwd_inner: 42105.63 | bwd_allreduce: 3375.41 | step: 40.82 {'loss': 1.2912, 'learning_rate': 3.5488446563047645e-05, 'epoch': 0.24} 24%|██▍ | 412/1726 [7:11:16<22:48:32, 62.49s/it] 24%|██▍ | 413/1726 [7:12:16<22:37:17, 62.02s/it] 24%|██▍ | 414/1726 [7:13:19<22:39:31, 62.17s/it] 24%|██▍ | 415/1726 [7:14:20<22:33:40, 61.95s/it] 24%|██▍ | 416/1726 [7:15:19<22:11:14, 60.97s/it] 24%|██▍ | 417/1726 [7:16:21<22:13:59, 61.15s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 07:53:46,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.55 | bwd_microstep: 1464.38 | bwd_inner_microstep: 1464.27 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3393 [2024-06-10 07:53:48,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.35 | bwd_microstep: 1369.61 | bwd_inner_microstep: 1369.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 07:53:50,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1353.14 |
bwd_inner_microstep: 1353.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757 [2024-06-10 07:53:52,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.06 | bwd_microstep: 1637.65 | bwd_inner_microstep: 1637.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 07:53:54,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1394.55 | bwd_inner_microstep: 1394.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 07:53:56,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.12 | bwd_microstep: 1281.05 | bwd_inner_microstep: 1281.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 07:53:57,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1254.10 | bwd_inner_microstep: 1254.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 07:53:59,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639 [2024-06-10 07:54:01,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.71 | bwd_microstep: 1514.19 | bwd_inner_microstep: 1514.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601 [2024-06-10 07:54:03,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 494.71 | bwd_microstep: 1311.79 | bwd_inner_microstep: 1311.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 07:54:05,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.73 | bwd_microstep: 1394.33 | bwd_inner_microstep: 1394.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-10 07:54:07,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.84 | bwd_microstep: 1424.81 | bwd_inner_microstep: 1424.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 07:54:09,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1389.92 | bwd_inner_microstep: 1389.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 07:54:11,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1384.64 | bwd_inner_microstep: 1384.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1959 [2024-06-10 07:54:12,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.24 | bwd_microstep: 838.66 | bwd_inner_microstep: 838.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 07:54:14,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.33 | bwd_microstep: 1287.43 | bwd_inner_microstep: 1287.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3840 [2024-06-10 07:54:16,452] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.42 | bwd_microstep: 1486.95 | bwd_inner_microstep: 1486.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2284 [2024-06-10 07:54:17,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.41 | bwd_microstep: 880.15 | bwd_inner_microstep: 880.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 07:54:19,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593 [2024-06-10 07:54:21,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 1311.21 | bwd_inner_microstep: 1311.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 07:54:23,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.64 | bwd_microstep: 1254.91 | bwd_inner_microstep: 1254.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3531 [2024-06-10 07:54:24,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.33 | bwd_microstep: 1424.70 | bwd_inner_microstep: 1424.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3669 [2024-06-10 07:54:26,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1387.66 | bwd_inner_microstep: 1387.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3808 [2024-06-10 07:54:28,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.68 | bwd_microstep: 1356.31 | bwd_inner_microstep: 1356.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643 [2024-06-10 07:54:30,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | bwd_microstep: 1317.66 | bwd_inner_microstep: 1317.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448 [2024-06-10 07:54:32,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1449.97 | bwd_inner_microstep: 1449.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 07:54:34,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1491.92 | bwd_inner_microstep: 1491.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 07:54:36,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.98 | bwd_microstep: 1450.88 | bwd_inner_microstep: 1450.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 07:54:38,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.55 | bwd_microstep: 1654.66 | bwd_inner_microstep: 1654.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065 [2024-06-10 07:54:40,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.32 | bwd_microstep: 818.40 | bwd_inner_microstep: 818.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 2234 [2024-06-10 07:54:41,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.29 | bwd_microstep: 959.54 | bwd_inner_microstep: 959.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3556 [2024-06-10 07:54:43,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.14 | optimizer_step: 6.61 [2024-06-10 07:54:43,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.84 | bwd_microstep: 1953.77 | bwd_inner_microstep: 1642.23 | bwd_allreduce_microstep: 311.49 | step_microstep: 37.64 [2024-06-10 07:54:43,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16019.21 | bwd: 43177.87 | bwd_inner: 42865.38 | bwd_allreduce: 311.78 | step: 39.24 {'loss': 1.3108, 'learning_rate': 3.546467272758479e-05, 'epoch': 0.24} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2908 [2024-06-10 07:54:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.69 | bwd_microstep: 1085.67 | bwd_inner_microstep: 1085.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 07:54:47,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.66 | bwd_microstep: 1280.53 | bwd_inner_microstep: 1280.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2010 [2024-06-10 07:54:48,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.46 | bwd_microstep: 832.71 | bwd_inner_microstep: 832.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411 [2024-06-10 07:54:50,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.16 | 
bwd_microstep: 1150.62 | bwd_inner_microstep: 1150.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 07:54:51,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 790.00 | bwd_inner_microstep: 789.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-10 07:54:52,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 791.48 | bwd_inner_microstep: 791.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708 [2024-06-10 07:54:54,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.23 | bwd_microstep: 1422.83 | bwd_inner_microstep: 1422.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722 [2024-06-10 07:54:56,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1335.99 | bwd_inner_microstep: 1335.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3481 [2024-06-10 07:54:57,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.36 | bwd_microstep: 1247.29 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3682 [2024-06-10 07:54:59,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.91 | bwd_microstep: 1354.40 | bwd_inner_microstep: 1354.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2377 [2024-06-10 07:55:00,974] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 369.95 | bwd_microstep: 966.04 | bwd_inner_microstep: 966.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3503 [2024-06-10 07:55:03,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.94 | bwd_microstep: 1549.23 | bwd_inner_microstep: 1549.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 07:55:04,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.79 | bwd_inner_microstep: 1348.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1979 [2024-06-10 07:55:06,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.37 | bwd_microstep: 780.12 | bwd_inner_microstep: 780.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3438 [2024-06-10 07:55:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.01 | bwd_microstep: 1203.09 | bwd_inner_microstep: 1203.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3509 [2024-06-10 07:55:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.26 | bwd_microstep: 1685.43 | bwd_inner_microstep: 1685.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3639 [2024-06-10 07:55:12,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1438.49 | bwd_inner_microstep: 1438.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515 [2024-06-10 07:55:14,006] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1448.97 | bwd_inner_microstep: 1448.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-10 07:55:15,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 800.03 | bwd_inner_microstep: 800.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-10 07:55:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.46 | bwd_microstep: 976.49 | bwd_inner_microstep: 976.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3873 [2024-06-10 07:55:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1489.81 | bwd_inner_microstep: 1489.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-10 07:55:20,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1608.64 | bwd_inner_microstep: 1608.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 07:55:22,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1377.66 | bwd_inner_microstep: 1377.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534 [2024-06-10 07:55:24,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1325.71 | bwd_inner_microstep: 1325.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3552 [2024-06-10 07:55:26,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.03 | bwd_microstep: 1401.24 | bwd_inner_microstep: 1401.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 07:55:28,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1400.15 | bwd_inner_microstep: 1400.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 07:55:30,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1400.87 | bwd_inner_microstep: 1400.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 07:55:32,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1400.27 | bwd_inner_microstep: 1400.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2064 [2024-06-10 07:55:33,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.05 | bwd_microstep: 724.11 | bwd_inner_microstep: 724.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 07:55:35,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.55 | bwd_microstep: 1657.69 | bwd_inner_microstep: 1657.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2197 [2024-06-10 07:55:36,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.16 | bwd_microstep: 1016.89 | bwd_inner_microstep: 1016.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, 
images per sample: 2.5, dynamic token length: 2002 [2024-06-10 07:55:43,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.57 [2024-06-10 07:55:43,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.85 | bwd_microstep: 6159.62 | bwd_inner_microstep: 811.73 | bwd_allreduce_microstep: 5347.84 | step_microstep: 38.11 [2024-06-10 07:55:43,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14650.72 | bwd: 44450.88 | bwd_inner: 39102.13 | bwd_allreduce: 5348.07 | step: 39.65 {'loss': 1.2494, 'learning_rate': 3.544084442562498e-05, 'epoch': 0.24} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1994 [2024-06-10 07:55:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.29 | bwd_microstep: 884.00 | bwd_inner_microstep: 883.94 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3895 [2024-06-10 07:55:46,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.47 | bwd_microstep: 1479.05 | bwd_inner_microstep: 1479.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2396 [2024-06-10 07:55:48,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.68 | bwd_microstep: 1032.28 | bwd_inner_microstep: 1032.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 07:55:49,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.23 | bwd_microstep: 1341.81 | bwd_inner_microstep: 1341.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3771 [2024-06-10 07:55:52,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
557.74 | bwd_microstep: 1499.07 | bwd_inner_microstep: 1499.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3738 [2024-06-10 07:55:53,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.97 | bwd_microstep: 1363.73 | bwd_inner_microstep: 1363.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479 [2024-06-10 07:55:55,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1330.36 | bwd_inner_microstep: 1330.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405 [2024-06-10 07:55:57,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.23 | bwd_microstep: 1180.02 | bwd_inner_microstep: 1179.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 07:55:59,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495 [2024-06-10 07:56:01,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1443.32 | bwd_inner_microstep: 1443.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3514 [2024-06-10 07:56:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1512.53 | bwd_inner_microstep: 1512.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3773 [2024-06-10 07:56:05,385] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.00 | bwd_microstep: 1583.32 | bwd_inner_microstep: 1583.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525 [2024-06-10 07:56:07,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.76 | bwd_microstep: 1582.90 | bwd_inner_microstep: 1582.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3454 [2024-06-10 07:56:09,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.99 | bwd_microstep: 1513.44 | bwd_inner_microstep: 1513.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3682 [2024-06-10 07:56:12,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.60 | bwd_microstep: 1719.93 | bwd_inner_microstep: 1719.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 07:56:14,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.06 | bwd_microstep: 1524.17 | bwd_inner_microstep: 1524.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 07:56:16,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1438.54 | bwd_inner_microstep: 1438.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3622 [2024-06-10 07:56:18,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.00 | bwd_microstep: 1676.74 | bwd_inner_microstep: 1676.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token 
length: 3825 [2024-06-10 07:56:20,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1385.72 | bwd_inner_microstep: 1385.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893 [2024-06-10 07:56:21,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 711.53 | bwd_inner_microstep: 711.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3606 [2024-06-10 07:56:23,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1243.35 | bwd_inner_microstep: 1243.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615 [2024-06-10 07:56:24,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1438.71 | bwd_inner_microstep: 1438.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-10 07:56:26,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1307.29 | bwd_inner_microstep: 1307.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3527 [2024-06-10 07:56:28,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.30 | bwd_microstep: 1197.40 | bwd_inner_microstep: 1197.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611 [2024-06-10 07:56:30,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1620.91 | bwd_inner_microstep: 1620.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3600 [2024-06-10 07:56:32,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.87 | bwd_microstep: 1495.00 | bwd_inner_microstep: 1494.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 07:56:34,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1377.18 | bwd_inner_microstep: 1377.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-10 07:56:36,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1536.70 | bwd_inner_microstep: 1536.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 07:56:38,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.07 | bwd_microstep: 1611.84 | bwd_inner_microstep: 1611.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488 [2024-06-10 07:56:40,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.27 | bwd_microstep: 1344.89 | bwd_inner_microstep: 1344.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 07:56:42,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.12 | bwd_microstep: 1299.41 | bwd_inner_microstep: 1299.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451 [2024-06-10 07:56:44,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.17 | optimizer_step: 6.59 [2024-06-10 07:56:44,408] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.90 | bwd_microstep: 1246.49 | bwd_inner_microstep: 1221.98 | bwd_allreduce_microstep: 24.47 | step_microstep: 37.68 [2024-06-10 07:56:44,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16489.00 | bwd: 44174.68 | bwd_inner: 44149.26 | bwd_allreduce: 24.72 | step: 39.20 {'loss': 1.2994, 'learning_rate': 3.541696174109137e-05, 'epoch': 0.24} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468 [2024-06-10 07:56:46,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.56 | bwd_microstep: 1338.02 | bwd_inner_microstep: 1337.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 07:56:48,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.30 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2352 [2024-06-10 07:56:49,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.46 | bwd_microstep: 891.79 | bwd_inner_microstep: 891.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3408 [2024-06-10 07:56:50,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.14 | bwd_microstep: 1213.74 | bwd_inner_microstep: 1213.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770 [2024-06-10 07:56:52,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.16 | bwd_microstep: 1437.22 | bwd_inner_microstep: 1437.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-10 07:56:54,511] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.27 | bwd_microstep: 1153.24 | bwd_inner_microstep: 1153.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 07:56:55,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.18 | bwd_microstep: 797.26 | bwd_inner_microstep: 797.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 07:56:57,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.75 | bwd_microstep: 1284.04 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868 [2024-06-10 07:56:58,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.71 | bwd_microstep: 679.14 | bwd_inner_microstep: 679.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1971 [2024-06-10 07:56:59,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.46 | bwd_microstep: 735.40 | bwd_inner_microstep: 735.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952 [2024-06-10 07:57:00,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.25 | bwd_microstep: 888.17 | bwd_inner_microstep: 888.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 07:57:02,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1347.96 | bwd_inner_microstep: 1347.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 
[2024-06-10 07:57:04,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.41 | bwd_microstep: 1432.30 | bwd_inner_microstep: 1432.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3057 [2024-06-10 07:57:06,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.70 | bwd_microstep: 1297.67 | bwd_inner_microstep: 1297.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 07:57:08,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1515.80 | bwd_inner_microstep: 1515.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413 [2024-06-10 07:57:10,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.65 | bwd_microstep: 1365.94 | bwd_inner_microstep: 1365.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2486 [2024-06-10 07:57:11,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.97 | bwd_microstep: 1001.66 | bwd_inner_microstep: 1001.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-10 07:57:12,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.62 | bwd_microstep: 805.54 | bwd_inner_microstep: 805.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 07:57:14,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.69 | bwd_microstep: 1292.86 | bwd_inner_microstep: 1292.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3526 [2024-06-10 07:57:16,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1422.65 | bwd_inner_microstep: 1422.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 07:57:18,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1294.91 | bwd_inner_microstep: 1294.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510 [2024-06-10 07:57:20,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1322.24 | bwd_inner_microstep: 1322.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758 [2024-06-10 07:57:22,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.37 | bwd_microstep: 1445.02 | bwd_inner_microstep: 1444.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478 [2024-06-10 07:57:23,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.49 | bwd_microstep: 1314.52 | bwd_inner_microstep: 1314.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2043 [2024-06-10 07:57:26,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.62 | bwd_microstep: 1935.16 | bwd_inner_microstep: 1935.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182 [2024-06-10 07:57:27,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.32 | bwd_microstep: 856.96 | bwd_inner_microstep: 856.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 07:57:29,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 1457.26 | bwd_inner_microstep: 1457.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 07:57:31,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3433 [2024-06-10 07:57:33,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1374.50 | bwd_inner_microstep: 1374.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2961 [2024-06-10 07:57:34,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.06 | bwd_microstep: 1262.08 | bwd_inner_microstep: 1262.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3765 [2024-06-10 07:57:37,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.29 | bwd_microstep: 1739.13 | bwd_inner_microstep: 1739.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3764 [2024-06-10 07:57:48,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.60 [2024-06-10 07:57:48,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.49 | bwd_microstep: 10421.32 | bwd_inner_microstep: 2295.46 | bwd_allreduce_microstep: 8125.80 | step_microstep: 39.06 [2024-06-10 07:57:48,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
14688.87 | bwd: 48852.85 | bwd_inner: 40726.12 | bwd_allreduce: 8126.04 | step: 40.60 {'loss': 1.2845, 'learning_rate': 3.5393024758098645e-05, 'epoch': 0.24} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2108 [2024-06-10 07:57:49,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.03 | bwd_microstep: 913.31 | bwd_inner_microstep: 913.17 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401 [2024-06-10 07:57:51,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.38 | bwd_microstep: 1213.87 | bwd_inner_microstep: 1213.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416 [2024-06-10 07:57:52,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.64 | bwd_microstep: 1175.52 | bwd_inner_microstep: 1175.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4168 [2024-06-10 07:57:55,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.19 | bwd_microstep: 1747.79 | bwd_inner_microstep: 1747.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 07:57:57,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.32 | bwd_microstep: 1478.91 | bwd_inner_microstep: 1478.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2070 [2024-06-10 07:57:58,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.08 | bwd_microstep: 726.06 | bwd_inner_microstep: 726.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 
07:58:00,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1401.92 | bwd_inner_microstep: 1401.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861 [2024-06-10 07:58:02,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1564.61 | bwd_inner_microstep: 1564.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 713 [2024-06-10 07:58:02,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.00 | bwd_microstep: 291.87 | bwd_inner_microstep: 291.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-10 07:58:05,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1574.41 | bwd_inner_microstep: 1574.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3662 [2024-06-10 07:58:06,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.22 | bwd_microstep: 1422.23 | bwd_inner_microstep: 1422.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-10 07:58:09,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.54 | bwd_microstep: 1577.57 | bwd_inner_microstep: 1577.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1914 [2024-06-10 07:58:10,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.08 | bwd_microstep: 780.51 | bwd_inner_microstep: 780.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3458 [2024-06-10 07:58:12,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1373.97 | bwd_inner_microstep: 1373.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3099 [2024-06-10 07:58:13,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1246.63 | bwd_inner_microstep: 1246.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 07:58:15,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1391.87 | bwd_inner_microstep: 1391.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 07:58:17,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1301.67 | bwd_inner_microstep: 1301.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 07:58:19,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1556.76 | bwd_inner_microstep: 1556.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 07:58:21,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.91 | bwd_microstep: 1192.89 | bwd_inner_microstep: 1192.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 07:58:23,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.83 | bwd_microstep: 1289.14 | bwd_inner_microstep: 1289.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-10 07:58:25,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1413.34 | bwd_inner_microstep: 1413.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046 [2024-06-10 07:58:26,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.70 | bwd_microstep: 808.94 | bwd_inner_microstep: 808.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3719 [2024-06-10 07:58:28,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1496.07 | bwd_inner_microstep: 1496.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 07:58:30,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1293.25 | bwd_inner_microstep: 1293.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 07:58:32,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.64 | bwd_microstep: 1381.89 | bwd_inner_microstep: 1381.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441 [2024-06-10 07:58:34,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.93 | bwd_microstep: 1448.01 | bwd_inner_microstep: 1447.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735 [2024-06-10 07:58:36,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.05 | bwd_microstep: 1629.69 | bwd_inner_microstep: 1629.66 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3429 [2024-06-10 07:58:38,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1399.43 | bwd_inner_microstep: 1399.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554 [2024-06-10 07:58:40,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.42 | bwd_microstep: 1331.91 | bwd_inner_microstep: 1331.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3395 [2024-06-10 07:58:41,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.02 | bwd_microstep: 1408.95 | bwd_inner_microstep: 1408.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584 [2024-06-10 07:58:43,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1425.36 | bwd_inner_microstep: 1425.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777 [2024-06-10 07:58:50,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.36 | optimizer_step: 6.58 [2024-06-10 07:58:50,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 5649.07 | bwd_inner_microstep: 1875.03 | bwd_allreduce_microstep: 3773.97 | step_microstep: 38.80 [2024-06-10 07:58:50,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15698.54 | bwd: 45907.44 | bwd_inner: 42132.43 | bwd_allreduce: 3774.27 | step: 40.51 {'loss': 1.2664, 'learning_rate': 3.5369033560952756e-05, 'epoch': 0.24} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 07:58:52,019] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1272.64 | bwd_inner_microstep: 1272.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860 [2024-06-10 07:58:54,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.80 | bwd_microstep: 1559.90 | bwd_inner_microstep: 1559.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3896 [2024-06-10 07:58:56,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1417.89 | bwd_inner_microstep: 1417.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798 [2024-06-10 07:58:58,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.56 | bwd_microstep: 1445.15 | bwd_inner_microstep: 1445.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2297 [2024-06-10 07:58:59,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.61 | bwd_microstep: 910.93 | bwd_inner_microstep: 910.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630 [2024-06-10 07:59:01,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.03 | bwd_microstep: 1314.54 | bwd_inner_microstep: 1314.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1955 [2024-06-10 07:59:02,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 733.05 | bwd_inner_microstep: 733.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token 
length: 3172 [2024-06-10 07:59:04,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1320.26 | bwd_inner_microstep: 1320.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 07:59:05,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.63 | bwd_microstep: 1382.72 | bwd_inner_microstep: 1382.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2009 [2024-06-10 07:59:07,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.77 | bwd_microstep: 866.56 | bwd_inner_microstep: 866.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 07:59:09,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1496.88 | bwd_inner_microstep: 1496.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551 [2024-06-10 07:59:10,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.28 | bwd_microstep: 1238.25 | bwd_inner_microstep: 1238.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3471 [2024-06-10 07:59:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1360.02 | bwd_inner_microstep: 1359.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 07:59:14,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.74 | bwd_microstep: 1485.34 | bwd_inner_microstep: 1485.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3411 [2024-06-10 07:59:16,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1346.47 | bwd_inner_microstep: 1346.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 07:59:18,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1384.07 | bwd_inner_microstep: 1384.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 07:59:20,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.03 | bwd_microstep: 1399.49 | bwd_inner_microstep: 1399.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 07:59:22,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1555.80 | bwd_inner_microstep: 1555.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 07:59:24,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1400.34 | bwd_inner_microstep: 1400.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 07:59:26,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.16 | bwd_microstep: 1659.78 | bwd_inner_microstep: 1659.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 07:59:28,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1286.37 | bwd_inner_microstep: 1286.34 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 07:59:30,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1460.69 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558 [2024-06-10 07:59:32,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.80 | bwd_microstep: 1298.21 | bwd_inner_microstep: 1298.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-10 07:59:34,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.58 | bwd_microstep: 1407.79 | bwd_inner_microstep: 1407.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2166 [2024-06-10 07:59:35,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.27 | bwd_microstep: 853.78 | bwd_inner_microstep: 853.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826 [2024-06-10 07:59:37,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.63 | bwd_microstep: 1517.73 | bwd_inner_microstep: 1517.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 07:59:39,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.70 | bwd_microstep: 1256.02 | bwd_inner_microstep: 1255.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2460 [2024-06-10 07:59:40,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.65 | bwd_microstep: 1050.89 | bwd_inner_microstep: 
1050.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 07:59:42,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3613 [2024-06-10 07:59:44,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1434.79 | bwd_inner_microstep: 1434.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 07:59:46,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1495.65 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2733 [2024-06-10 07:59:53,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.35 | optimizer_step: 6.59 [2024-06-10 07:59:53,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.75 | bwd_microstep: 5598.16 | bwd_inner_microstep: 1324.08 | bwd_allreduce_microstep: 4274.02 | step_microstep: 38.79 [2024-06-10 07:59:53,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15831.06 | bwd: 46625.80 | bwd_inner: 42350.85 | bwd_allreduce: 4274.26 | step: 40.39 24%|██▍ | 418/1726 [7:17:20<22:02:31, 60.67s/it] 24%|██▍ | 419/1726 [7:18:20<21:53:26, 60.30s/it] 24%|██▍ | 420/1726 [7:19:21<21:57:03, 60.51s/it] 24%|██▍ | 421/1726 [7:20:25<22:18:02, 61.52s/it] 24%|██▍ | 422/1726 [7:21:26<22:19:52, 61.65s/it] 25%|██▍ | 423/1726 [7:22:29<22:26:19, 61.99s/it]
{'loss': 1.2779, 'learning_rate': 3.534498823415056e-05, 'epoch': 0.25} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 07:59:54,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1371.82 | bwd_inner_microstep: 1371.68 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 07:59:56,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.65 | bwd_microstep: 1279.92 | bwd_inner_microstep: 1279.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3984 [2024-06-10 07:59:59,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.15 | bwd_microstep: 1704.73 | bwd_inner_microstep: 1704.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 08:00:00,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1345.41 | bwd_inner_microstep: 1345.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3765 [2024-06-10 08:00:02,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1473.43 | bwd_inner_microstep: 1473.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540 [2024-06-10 08:00:04,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1424.79 | bwd_inner_microstep: 1424.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token
length: 1945 [2024-06-10 08:00:05,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.37 | bwd_microstep: 760.13 | bwd_inner_microstep: 760.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 08:00:07,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1248.43 | bwd_inner_microstep: 1248.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 08:00:09,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1528.61 | bwd_inner_microstep: 1528.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 08:00:11,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2174 [2024-06-10 08:00:13,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.95 | bwd_microstep: 952.16 | bwd_inner_microstep: 952.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 08:00:14,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.93 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 08:00:16,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.99 | bwd_microstep: 1380.74 | bwd_inner_microstep: 1380.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3674 [2024-06-10 08:00:19,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.24 | bwd_microstep: 1613.56 | bwd_inner_microstep: 1613.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3427 [2024-06-10 08:00:21,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.01 | bwd_microstep: 1409.65 | bwd_inner_microstep: 1409.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3642 [2024-06-10 08:00:22,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650 [2024-06-10 08:00:24,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.54 | bwd_microstep: 1424.32 | bwd_inner_microstep: 1424.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645 [2024-06-10 08:00:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 1415.71 | bwd_inner_microstep: 1415.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 08:00:28,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1509.68 | bwd_inner_microstep: 1509.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070 [2024-06-10 08:00:29,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.15 | bwd_microstep: 815.53 | bwd_inner_microstep: 815.51 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3458 [2024-06-10 08:00:31,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.86 | bwd_microstep: 1355.74 | bwd_inner_microstep: 1355.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830 [2024-06-10 08:00:34,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.67 | bwd_microstep: 1757.07 | bwd_inner_microstep: 1757.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800 [2024-06-10 08:00:36,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.64 | bwd_microstep: 1646.54 | bwd_inner_microstep: 1646.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 08:00:38,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1353.75 | bwd_inner_microstep: 1353.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 08:00:40,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.46 | bwd_microstep: 1259.16 | bwd_inner_microstep: 1259.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2206 [2024-06-10 08:00:41,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.00 | bwd_microstep: 767.19 | bwd_inner_microstep: 767.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2268 [2024-06-10 08:00:42,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.88 | bwd_microstep: 876.04 | bwd_inner_microstep: 
876.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3771 [2024-06-10 08:00:44,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.72 | bwd_microstep: 1562.34 | bwd_inner_microstep: 1562.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564 [2024-06-10 08:00:46,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1331.89 | bwd_inner_microstep: 1331.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278 [2024-06-10 08:00:47,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.59 | bwd_microstep: 1070.86 | bwd_inner_microstep: 1070.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-10 08:00:49,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.88 | bwd_microstep: 1405.60 | bwd_inner_microstep: 1405.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810 [2024-06-10 08:00:56,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-10 08:00:56,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.72 | bwd_microstep: 6385.80 | bwd_inner_microstep: 1525.78 | bwd_allreduce_microstep: 4859.97 | step_microstep: 38.28 [2024-06-10 08:00:56,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15880.71 | bwd: 47457.17 | bwd_inner: 42596.16 | bwd_allreduce: 4860.27 | step: 39.87 {'loss': 1.3103, 'learning_rate': 3.532088886237956e-05, 'epoch': 0.25} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619 
[2024-06-10 08:00:58,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.82 | bwd_microstep: 1428.20 | bwd_inner_microstep: 1428.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 08:01:00,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1476.54 | bwd_inner_microstep: 1476.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901 [2024-06-10 08:01:02,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1587.33 | bwd_inner_microstep: 1587.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 08:01:04,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.15 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440 [2024-06-10 08:01:06,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1450.47 | bwd_inner_microstep: 1450.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746 [2024-06-10 08:01:08,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.58 | bwd_microstep: 1638.72 | bwd_inner_microstep: 1638.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 08:01:10,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.17 | bwd_microstep: 1382.65 | bwd_inner_microstep: 1382.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3434 [2024-06-10 08:01:11,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1254.80 | bwd_inner_microstep: 1254.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1961 [2024-06-10 08:01:12,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.46 | bwd_microstep: 766.32 | bwd_inner_microstep: 766.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2084 [2024-06-10 08:01:14,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.87 | bwd_microstep: 883.58 | bwd_inner_microstep: 883.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 08:01:16,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 08:01:17,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1351.72 | bwd_inner_microstep: 1351.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 08:01:19,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.97 | bwd_microstep: 1454.86 | bwd_inner_microstep: 1454.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3517 [2024-06-10 08:01:21,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1430.70 | bwd_inner_microstep: 1430.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906 [2024-06-10 08:01:22,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.04 | bwd_microstep: 686.48 | bwd_inner_microstep: 686.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 08:01:24,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1402.77 | bwd_inner_microstep: 1402.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295 [2024-06-10 08:01:26,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 882.68 | bwd_inner_microstep: 882.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628 [2024-06-10 08:01:27,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1318.51 | bwd_inner_microstep: 1318.30 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 08:01:29,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1378.51 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930 [2024-06-10 08:01:30,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 697.92 | bwd_inner_microstep: 697.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 08:01:32,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2180 [2024-06-10 08:01:33,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.63 | bwd_microstep: 858.44 | bwd_inner_microstep: 858.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901 [2024-06-10 08:01:34,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.86 | bwd_microstep: 686.62 | bwd_inner_microstep: 686.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3689 [2024-06-10 08:01:37,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.34 | bwd_microstep: 1725.24 | bwd_inner_microstep: 1725.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3653 [2024-06-10 08:01:39,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1560.78 | bwd_inner_microstep: 1560.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3825 [2024-06-10 08:01:41,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.41 | bwd_microstep: 1389.54 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3889 [2024-06-10 08:01:43,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.97 | bwd_microstep: 1636.69 | bwd_inner_microstep: 1636.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811 [2024-06-10 08:01:45,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.51 | bwd_microstep: 
1758.90 | bwd_inner_microstep: 1758.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 08:01:48,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.33 | bwd_microstep: 1548.97 | bwd_inner_microstep: 1548.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578 [2024-06-10 08:01:50,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1595.41 | bwd_inner_microstep: 1595.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-10 08:01:52,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1505.64 | bwd_inner_microstep: 1505.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 08:01:58,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-10 08:01:58,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.29 | bwd_microstep: 5342.64 | bwd_inner_microstep: 1615.47 | bwd_allreduce_microstep: 3727.12 | step_microstep: 38.12 [2024-06-10 08:01:58,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15592.93 | bwd: 45650.21 | bwd_inner: 41922.00 | bwd_allreduce: 3727.43 | step: 39.78 {'loss': 1.2599, 'learning_rate': 3.5296735530517646e-05, 'epoch': 0.25} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 08:02:00,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.89 | bwd_microstep: 1467.73 | bwd_inner_microstep: 1467.53 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 21, images per 
sample: 5.25, dynamic token length: 2635 [2024-06-10 08:02:01,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.74 | bwd_microstep: 1049.96 | bwd_inner_microstep: 1049.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-10 08:02:02,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.31 | bwd_microstep: 809.21 | bwd_inner_microstep: 809.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 08:02:04,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.84 | bwd_microstep: 1481.17 | bwd_inner_microstep: 1481.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 08:02:06,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1245.89 | bwd_inner_microstep: 1245.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 08:02:08,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1379.58 | bwd_inner_microstep: 1379.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 08:02:10,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.80 | bwd_microstep: 1384.44 | bwd_inner_microstep: 1384.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411 [2024-06-10 08:02:12,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.27 | bwd_microstep: 1152.96 | bwd_inner_microstep: 1152.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 08:02:13,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.71 | bwd_microstep: 1285.70 | bwd_inner_microstep: 1285.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 08:02:15,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1395.93 | bwd_inner_microstep: 1395.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714 [2024-06-10 08:02:17,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.51 | bwd_microstep: 1429.01 | bwd_inner_microstep: 1428.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505 [2024-06-10 08:02:19,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.10 | bwd_microstep: 1417.67 | bwd_inner_microstep: 1417.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1921 [2024-06-10 08:02:20,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.28 | bwd_microstep: 760.38 | bwd_inner_microstep: 760.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3516 [2024-06-10 08:02:22,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.72 | bwd_microstep: 1238.85 | bwd_inner_microstep: 1238.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3678 [2024-06-10 08:02:24,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.10 | bwd_microstep: 1480.10 | bwd_inner_microstep: 1480.07 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 08:02:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.86 | bwd_microstep: 1381.16 | bwd_inner_microstep: 1381.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051 [2024-06-10 08:02:27,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.20 | bwd_microstep: 913.04 | bwd_inner_microstep: 913.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080 [2024-06-10 08:02:29,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.16 | bwd_microstep: 921.78 | bwd_inner_microstep: 921.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3630 [2024-06-10 08:02:31,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.20 | bwd_microstep: 1708.40 | bwd_inner_microstep: 1708.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3640 [2024-06-10 08:02:33,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1541.05 | bwd_inner_microstep: 1541.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-10 08:02:35,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1460.17 | bwd_inner_microstep: 1460.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3813 [2024-06-10 08:02:37,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.30 | bwd_microstep: 
1480.33 | bwd_inner_microstep: 1480.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 08:02:39,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.94 | bwd_microstep: 1452.88 | bwd_inner_microstep: 1452.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 08:02:40,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.98 | bwd_microstep: 800.48 | bwd_inner_microstep: 800.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437 [2024-06-10 08:02:42,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.98 | bwd_microstep: 1313.40 | bwd_inner_microstep: 1313.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3829 [2024-06-10 08:02:44,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.86 | bwd_microstep: 1515.55 | bwd_inner_microstep: 1515.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 08:02:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1351.30 | bwd_inner_microstep: 1351.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 08:02:48,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.36 | bwd_microstep: 1661.76 | bwd_inner_microstep: 1661.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 08:02:50,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 519.40 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779 [2024-06-10 08:02:52,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1442.99 | bwd_inner_microstep: 1442.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751 [2024-06-10 08:02:54,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.91 | bwd_microstep: 1634.94 | bwd_inner_microstep: 1634.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2895 [2024-06-10 08:02:57,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 08:02:57,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.19 | bwd_microstep: 2262.42 | bwd_inner_microstep: 1269.59 | bwd_allreduce_microstep: 992.78 | step_microstep: 37.73 [2024-06-10 08:02:57,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15746.69 | bwd: 43198.01 | bwd_inner: 42204.13 | bwd_allreduce: 993.09 | step: 39.43 {'loss': 1.2757, 'learning_rate': 3.527252832363271e-05, 'epoch': 0.25} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 08:02:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1465.11 | bwd_inner_microstep: 1465.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 08:03:01,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.47 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 25, images per sample: 6.25, dynamic token length: 3811 [2024-06-10 08:03:03,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 08:03:05,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.20 | bwd_microstep: 1480.04 | bwd_inner_microstep: 1480.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 08:03:07,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.21 | bwd_microstep: 1377.03 | bwd_inner_microstep: 1377.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740 [2024-06-10 08:03:09,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.60 | bwd_microstep: 1532.22 | bwd_inner_microstep: 1532.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059 [2024-06-10 08:03:10,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.80 | bwd_microstep: 817.28 | bwd_inner_microstep: 817.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3701 [2024-06-10 08:03:12,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.35 | bwd_microstep: 1328.24 | bwd_inner_microstep: 1328.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489 [2024-06-10 08:03:14,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.79 | bwd_microstep: 1415.73 | bwd_inner_microstep: 1415.70 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401 [2024-06-10 08:03:16,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.08 | bwd_microstep: 1274.21 | bwd_inner_microstep: 1274.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513 [2024-06-10 08:03:18,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1583.02 | bwd_inner_microstep: 1582.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448 [2024-06-10 08:03:20,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1411.42 | bwd_inner_microstep: 1411.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 08:03:22,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1483.46 | bwd_inner_microstep: 1483.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 08:03:24,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.26 | bwd_microstep: 1480.82 | bwd_inner_microstep: 1480.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 08:03:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1480.65 | bwd_inner_microstep: 1480.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 08:03:28,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1344.63 | 
bwd_inner_microstep: 1344.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999 [2024-06-10 08:03:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.64 | bwd_microstep: 809.59 | bwd_inner_microstep: 809.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 08:03:31,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1404.55 | bwd_inner_microstep: 1404.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 08:03:33,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1291.91 | bwd_inner_microstep: 1291.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2218 [2024-06-10 08:03:34,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.60 | bwd_microstep: 961.53 | bwd_inner_microstep: 961.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-10 08:03:36,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1499.09 | bwd_inner_microstep: 1499.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-10 08:03:38,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.11 | bwd_microstep: 1440.03 | bwd_inner_microstep: 1440.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842 [2024-06-10 08:03:40,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
552.31 | bwd_microstep: 1459.75 | bwd_inner_microstep: 1459.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 08:03:42,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.13 | bwd_microstep: 1494.03 | bwd_inner_microstep: 1494.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576 [2024-06-10 08:03:44,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.16 | bwd_microstep: 1338.98 | bwd_inner_microstep: 1338.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011 [2024-06-10 08:03:45,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.65 | bwd_microstep: 811.14 | bwd_inner_microstep: 811.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 08:03:47,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.82 | bwd_microstep: 1497.32 | bwd_inner_microstep: 1497.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 08:03:49,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1488.08 | bwd_inner_microstep: 1488.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590 [2024-06-10 08:03:51,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1432.98 | bwd_inner_microstep: 1432.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564 [2024-06-10 08:03:53,665] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 560.51 | bwd_microstep: 1499.44 | bwd_inner_microstep: 1499.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 08:03:55,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1391.03 | bwd_inner_microstep: 1391.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3606 [2024-06-10 08:03:57,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.18 | optimizer_step: 6.63 [2024-06-10 08:03:57,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.82 | bwd_microstep: 1612.27 | bwd_inner_microstep: 1604.54 | bwd_allreduce_microstep: 7.69 | step_microstep: 37.75 [2024-06-10 08:03:57,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16298.92 | bwd: 43582.98 | bwd_inner: 43574.39 | bwd_allreduce: 7.92 | step: 39.35 {'loss': 1.2866, 'learning_rate': 3.524826732698241e-05, 'epoch': 0.25} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410 [2024-06-10 08:03:59,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1435.73 | bwd_inner_microstep: 1435.60 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462 [2024-06-10 08:04:01,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.54 | bwd_microstep: 1210.48 | bwd_inner_microstep: 1210.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 08:04:02,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.51 | bwd_microstep: 971.40 | bwd_inner_microstep: 971.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3766 [2024-06-10 08:04:04,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1340.49 | bwd_inner_microstep: 1340.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 08:04:06,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1245.38 | bwd_inner_microstep: 1245.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586 [2024-06-10 08:04:08,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.31 | bwd_microstep: 1212.70 | bwd_inner_microstep: 1212.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 08:04:10,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.19 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685 [2024-06-10 08:04:12,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.58 | bwd_microstep: 1621.29 | bwd_inner_microstep: 1621.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3055 [2024-06-10 08:04:13,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.13 | bwd_microstep: 1138.56 | bwd_inner_microstep: 1138.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 08:04:16,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1510.77 | bwd_inner_microstep: 1510.75 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1992 [2024-06-10 08:04:17,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.47 | bwd_microstep: 772.44 | bwd_inner_microstep: 772.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 08:04:18,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1280.74 | bwd_inner_microstep: 1280.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 08:04:20,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.54 | bwd_microstep: 1398.99 | bwd_inner_microstep: 1398.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637 [2024-06-10 08:04:22,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1409.38 | bwd_inner_microstep: 1409.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652 [2024-06-10 08:04:25,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.67 | bwd_microstep: 1624.94 | bwd_inner_microstep: 1624.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 08:04:26,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.87 | bwd_microstep: 1287.69 | bwd_inner_microstep: 1287.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2309 [2024-06-10 08:04:28,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.65 | bwd_microstep: 
886.59 | bwd_inner_microstep: 886.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2485 [2024-06-10 08:04:29,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.76 | bwd_microstep: 965.13 | bwd_inner_microstep: 965.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2099 [2024-06-10 08:04:30,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.35 | bwd_microstep: 825.23 | bwd_inner_microstep: 825.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154 [2024-06-10 08:04:31,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.58 | bwd_microstep: 854.99 | bwd_inner_microstep: 854.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3668 [2024-06-10 08:04:33,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1328.25 | bwd_inner_microstep: 1328.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2327 [2024-06-10 08:04:34,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.44 | bwd_microstep: 891.68 | bwd_inner_microstep: 891.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3598 [2024-06-10 08:04:36,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1339.15 | bwd_inner_microstep: 1339.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936 [2024-06-10 08:04:37,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 281.19 | bwd_microstep: 734.24 | bwd_inner_microstep: 734.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 08:04:39,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1506.72 | bwd_inner_microstep: 1506.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1895 [2024-06-10 08:04:40,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.64 | bwd_microstep: 750.19 | bwd_inner_microstep: 750.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 08:04:43,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.65 | bwd_microstep: 1645.10 | bwd_inner_microstep: 1645.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 08:04:44,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1286.45 | bwd_inner_microstep: 1286.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 08:04:46,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1344.42 | bwd_inner_microstep: 1344.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 08:04:48,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1632.74 | bwd_inner_microstep: 1632.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 08:04:50,967] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.73 | bwd_microstep: 1493.21 | bwd_inner_microstep: 1493.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 08:05:00,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.34 | optimizer_step: 6.58 [2024-06-10 08:05:00,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 9157.38 | bwd_inner_microstep: 1567.11 | bwd_allreduce_microstep: 7590.21 | step_microstep: 38.77 [2024-06-10 08:05:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14957.71 | bwd: 47582.92 | bwd_inner: 39991.67 | bwd_allreduce: 7590.52 | step: 41.48 {'loss': 1.3058, 'learning_rate': 3.522395262601386e-05, 'epoch': 0.25} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 08:05:02,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.79 | bwd_microstep: 1506.61 | bwd_inner_microstep: 1506.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2394 [2024-06-10 08:05:04,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.13 | bwd_microstep: 997.07 | bwd_inner_microstep: 997.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842 [2024-06-10 08:05:06,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.70 | bwd_microstep: 1555.96 | bwd_inner_microstep: 1555.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 08:05:08,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.33 | bwd_microstep: 1277.62 | bwd_inner_microstep: 1277.60 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297 [2024-06-10 08:05:09,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.34 | bwd_microstep: 974.60 | bwd_inner_microstep: 974.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 08:05:11,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1381.79 | bwd_inner_microstep: 1381.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 08:05:13,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1384.84 | bwd_inner_microstep: 1384.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 08:05:15,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.33 | bwd_microstep: 1293.67 | bwd_inner_microstep: 1293.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 08:05:16,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.32 | bwd_microstep: 1401.74 | bwd_inner_microstep: 1401.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 08:05:17,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.09 | bwd_microstep: 680.35 | bwd_inner_microstep: 680.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 08:05:19,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1388.56 
| bwd_inner_microstep: 1388.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093 [2024-06-10 08:05:21,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.17 | bwd_microstep: 919.01 | bwd_inner_microstep: 918.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508 [2024-06-10 08:05:23,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1413.38 | bwd_inner_microstep: 1413.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-10 08:05:25,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.07 | bwd_microstep: 1633.60 | bwd_inner_microstep: 1633.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2193 [2024-06-10 08:05:26,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.78 | bwd_microstep: 987.16 | bwd_inner_microstep: 987.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3523 [2024-06-10 08:05:28,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.66 | bwd_microstep: 1254.77 | bwd_inner_microstep: 1254.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3501 [2024-06-10 08:05:30,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.64 | bwd_microstep: 1416.50 | bwd_inner_microstep: 1416.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433 [2024-06-10 08:05:32,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 451.31 | bwd_microstep: 1185.39 | bwd_inner_microstep: 1185.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2956 [2024-06-10 08:05:33,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.55 | bwd_microstep: 1010.77 | bwd_inner_microstep: 1010.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621 [2024-06-10 08:05:35,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.89 | bwd_microstep: 1435.04 | bwd_inner_microstep: 1435.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2183 [2024-06-10 08:05:36,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.33 | bwd_microstep: 857.04 | bwd_inner_microstep: 857.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 08:05:38,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2187 [2024-06-10 08:05:39,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.44 | bwd_microstep: 765.94 | bwd_inner_microstep: 765.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 08:05:41,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.97 | bwd_microstep: 1299.90 | bwd_inner_microstep: 1299.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3550 [2024-06-10 08:05:43,378] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 1421.58 | bwd_inner_microstep: 1421.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291 [2024-06-10 08:05:44,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.81 | bwd_microstep: 981.82 | bwd_inner_microstep: 981.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 08:05:46,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-10 08:05:48,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.28 | bwd_microstep: 1478.06 | bwd_inner_microstep: 1478.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-10 08:05:50,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.89 | bwd_microstep: 1405.75 | bwd_inner_microstep: 1405.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3604 [2024-06-10 08:05:52,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1706.66 | bwd_inner_microstep: 1706.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3584 [2024-06-10 08:05:54,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1458.09 | bwd_inner_microstep: 1458.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3622 [2024-06-10 08:06:01,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 08:06:01,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.44 | bwd_microstep: 5977.52 | bwd_inner_microstep: 1817.16 | bwd_allreduce_microstep: 4160.30 | step_microstep: 38.32 [2024-06-10 08:06:01,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15286.31 | bwd: 45134.22 | bwd_inner: 40972.99 | bwd_allreduce: 4160.53 | step: 39.87 25%|██▍ | 423/1726 [7:22:29<22:26:19, 61.99s/it] 25%|██▍ | 424/1726 [7:23:33<22:36:14, 62.50s/it] 25%|██▍ | 425/1726 [7:24:35<22:29:15, 62.23s/it] 25%|██▍ | 426/1726 [7:25:34<22:09:09, 61.35s/it] 25%|██▍ | 427/1726 [7:26:34<22:00:51, 61.01s/it] 25%|██▍ | 428/1726 [7:27:37<22:11:57, 61.57s/it] 25%|██▍ | 429/1726 [7:28:38<22:05:41, {'loss': 1.2706, 'learning_rate': 3.5199584306363296e-05, 'epoch': 0.25} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 08:06:03,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.05 | bwd_microstep: 1372.74 | bwd_inner_microstep: 1372.65 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 08:06:05,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.04 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893 [2024-06-10 08:06:07,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.57 | bwd_microstep: 1579.23 | 
bwd_inner_microstep: 1579.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 08:06:09,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.50 | bwd_microstep: 1274.33 | bwd_inner_microstep: 1274.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 08:06:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.63 | bwd_microstep: 1273.55 | bwd_inner_microstep: 1273.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714 [2024-06-10 08:06:13,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.94 | bwd_microstep: 1628.78 | bwd_inner_microstep: 1628.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 08:06:14,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1296.57 | bwd_inner_microstep: 1296.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914 [2024-06-10 08:06:15,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.75 | bwd_microstep: 714.24 | bwd_inner_microstep: 714.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 08:06:17,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.92 | bwd_microstep: 1387.94 | bwd_inner_microstep: 1387.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3696 [2024-06-10 08:06:20,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 597.36 | bwd_microstep: 1629.01 | bwd_inner_microstep: 1628.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3812 [2024-06-10 08:06:22,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.13 | bwd_microstep: 1534.04 | bwd_inner_microstep: 1534.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483 [2024-06-10 08:06:23,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.11 | bwd_microstep: 1315.07 | bwd_inner_microstep: 1315.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-10 08:06:26,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.77 | bwd_microstep: 1486.01 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452 [2024-06-10 08:06:27,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.50 | bwd_microstep: 1314.17 | bwd_inner_microstep: 1314.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909 [2024-06-10 08:06:28,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.28 | bwd_microstep: 780.48 | bwd_inner_microstep: 780.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 08:06:30,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652 [2024-06-10 08:06:33,066] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.75 | bwd_microstep: 1619.31 | bwd_inner_microstep: 1619.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 08:06:35,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1400.15 | bwd_inner_microstep: 1400.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2409 [2024-06-10 08:06:36,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.45 | bwd_microstep: 938.12 | bwd_inner_microstep: 938.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1958 [2024-06-10 08:06:37,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.64 | bwd_microstep: 768.27 | bwd_inner_microstep: 768.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3516 [2024-06-10 08:06:39,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.03 | bwd_microstep: 1198.92 | bwd_inner_microstep: 1198.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 08:06:41,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.70 | bwd_microstep: 1460.74 | bwd_inner_microstep: 1460.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 08:06:42,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1384.58 | bwd_inner_microstep: 1384.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2014 
[2024-06-10 08:06:44,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.75 | bwd_microstep: 842.23 | bwd_inner_microstep: 842.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 08:06:46,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.54 | bwd_inner_microstep: 1556.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2017 [2024-06-10 08:06:47,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.96 | bwd_microstep: 717.24 | bwd_inner_microstep: 717.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2070 [2024-06-10 08:06:48,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.88 | bwd_microstep: 851.94 | bwd_inner_microstep: 851.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2270 [2024-06-10 08:06:49,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.95 | bwd_microstep: 1068.79 | bwd_inner_microstep: 1068.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3433 [2024-06-10 08:06:51,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1217.47 | bwd_inner_microstep: 1217.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 08:06:53,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.97 | bwd_microstep: 1555.42 | bwd_inner_microstep: 1555.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 
12.25, dynamic token length: 3586 [2024-06-10 08:06:56,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.90 | bwd_microstep: 1750.19 | bwd_inner_microstep: 1750.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3806 [2024-06-10 08:07:01,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-10 08:07:01,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.40 | bwd_microstep: 4726.96 | bwd_inner_microstep: 1806.32 | bwd_allreduce_microstep: 2920.59 | step_microstep: 39.67 [2024-06-10 08:07:01,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15404.97 | bwd: 44313.30 | bwd_inner: 41391.72 | bwd_allreduce: 2920.87 | step: 41.44 {'loss': 1.3353, 'learning_rate': 3.517516245385582e-05, 'epoch': 0.25} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2428 [2024-06-10 08:07:02,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.41 | bwd_microstep: 937.08 | bwd_inner_microstep: 936.90 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1137 [2024-06-10 08:07:03,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.87 | bwd_microstep: 458.89 | bwd_inner_microstep: 458.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 08:07:05,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.74 | bwd_microstep: 1278.09 | bwd_inner_microstep: 1278.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1939 [2024-06-10 08:07:06,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.68 | 
bwd_microstep: 743.67 | bwd_inner_microstep: 743.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3748 [2024-06-10 08:07:08,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 1444.84 | bwd_inner_microstep: 1444.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 775 [2024-06-10 08:07:08,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.14 | bwd_microstep: 306.45 | bwd_inner_microstep: 306.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 753 [2024-06-10 08:07:09,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 117.26 | bwd_microstep: 302.88 | bwd_inner_microstep: 302.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4080 [2024-06-10 08:07:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.49 | bwd_microstep: 1628.38 | bwd_inner_microstep: 1628.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405 [2024-06-10 08:07:13,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.87 | bwd_microstep: 1372.13 | bwd_inner_microstep: 1372.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-10 08:07:15,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1510.33 | bwd_inner_microstep: 1510.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3399 [2024-06-10 08:07:17,234] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 507.77 | bwd_microstep: 1358.57 | bwd_inner_microstep: 1358.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 08:07:19,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.79 | bwd_microstep: 1479.69 | bwd_inner_microstep: 1479.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3104 [2024-06-10 08:07:20,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.30 | bwd_microstep: 1058.97 | bwd_inner_microstep: 1058.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438 [2024-06-10 08:07:22,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.63 | bwd_microstep: 1315.22 | bwd_inner_microstep: 1315.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-10 08:07:24,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1416.38 | bwd_inner_microstep: 1416.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 08:07:26,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.03 | bwd_microstep: 1386.36 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 08:07:28,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.53 | bwd_microstep: 1658.23 | bwd_inner_microstep: 1658.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 08:07:30,521] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.17 | bwd_microstep: 1294.49 | bwd_inner_microstep: 1294.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514 [2024-06-10 08:07:32,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1421.75 | bwd_inner_microstep: 1421.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3725 [2024-06-10 08:07:34,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1369.56 | bwd_inner_microstep: 1369.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 608 [2024-06-10 08:07:34,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.66 | bwd_microstep: 262.34 | bwd_inner_microstep: 262.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 08:07:36,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1556.59 | bwd_inner_microstep: 1556.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 08:07:38,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1463.19 | bwd_inner_microstep: 1463.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3610 [2024-06-10 08:07:41,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1642.70 | bwd_inner_microstep: 1642.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 
3622 [2024-06-10 08:07:43,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1468.27 | bwd_inner_microstep: 1468.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-10 08:07:45,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.50 | bwd_microstep: 1354.32 | bwd_inner_microstep: 1354.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597 [2024-06-10 08:07:47,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.66 | bwd_microstep: 1599.41 | bwd_inner_microstep: 1599.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 08:07:48,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.27 | bwd_microstep: 976.21 | bwd_inner_microstep: 976.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804 [2024-06-10 08:07:51,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.90 | bwd_microstep: 1748.47 | bwd_inner_microstep: 1748.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3452 [2024-06-10 08:07:52,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1407.52 | bwd_inner_microstep: 1407.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2725 [2024-06-10 08:07:54,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.00 | bwd_microstep: 1137.44 | bwd_inner_microstep: 1137.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 3816 [2024-06-10 08:08:02,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.36 | optimizer_step: 6.61 [2024-06-10 08:08:02,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.79 | bwd_microstep: 7046.25 | bwd_inner_microstep: 1560.94 | bwd_allreduce_microstep: 5485.24 | step_microstep: 38.98 [2024-06-10 08:08:02,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14872.60 | bwd: 45404.71 | bwd_inner: 39918.40 | bwd_allreduce: 5485.55 | step: 40.77 {'loss': 1.2834, 'learning_rate': 3.515068715450508e-05, 'epoch': 0.25} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1932 [2024-06-10 08:08:03,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.68 | bwd_microstep: 877.25 | bwd_inner_microstep: 877.10 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 08:08:05,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.15 | bwd_microstep: 1472.24 | bwd_inner_microstep: 1472.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 08:08:07,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1278.56 | bwd_inner_microstep: 1278.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 08:08:09,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.41 | bwd_microstep: 1651.56 | bwd_inner_microstep: 1651.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3481 [2024-06-10 08:08:11,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
477.16 | bwd_microstep: 1245.36 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 08:08:12,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.30 | bwd_microstep: 792.31 | bwd_inner_microstep: 792.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 08:08:13,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 797.84 | bwd_inner_microstep: 797.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2016 [2024-06-10 08:08:14,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.78 | bwd_microstep: 899.61 | bwd_inner_microstep: 899.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 08:08:16,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1483.95 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 08:08:18,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1382.63 | bwd_inner_microstep: 1382.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 08:08:20,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1347.18 | bwd_inner_microstep: 1347.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3674 [2024-06-10 08:08:22,890] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 649.16 | bwd_microstep: 1788.27 | bwd_inner_microstep: 1788.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502 [2024-06-10 08:08:24,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1417.76 | bwd_inner_microstep: 1417.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972 [2024-06-10 08:08:25,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.99 | bwd_microstep: 799.26 | bwd_inner_microstep: 799.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841 [2024-06-10 08:08:28,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.30 | bwd_microstep: 1560.15 | bwd_inner_microstep: 1560.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 08:08:29,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1353.88 | bwd_inner_microstep: 1353.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3908 [2024-06-10 08:08:32,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.60 | bwd_microstep: 1685.89 | bwd_inner_microstep: 1685.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2413 [2024-06-10 08:08:33,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.69 | bwd_microstep: 1103.89 | bwd_inner_microstep: 1103.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844 [2024-06-10 
08:08:35,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.16 | bwd_microstep: 1568.73 | bwd_inner_microstep: 1568.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718 [2024-06-10 08:08:38,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1535.34 | bwd_inner_microstep: 1535.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 08:08:40,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1381.10 | bwd_inner_microstep: 1381.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 08:08:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.16 | bwd_microstep: 1284.59 | bwd_inner_microstep: 1284.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3610 [2024-06-10 08:08:43,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.26 | bwd_microstep: 1212.95 | bwd_inner_microstep: 1212.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 08:08:45,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1376.89 | bwd_inner_microstep: 1376.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3811 [2024-06-10 08:08:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.95 | bwd_microstep: 1415.30 | bwd_inner_microstep: 1415.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 2358 [2024-06-10 08:08:48,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.26 | bwd_microstep: 896.22 | bwd_inner_microstep: 896.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2276 [2024-06-10 08:08:49,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.33 | bwd_microstep: 907.64 | bwd_inner_microstep: 907.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2642 [2024-06-10 08:08:51,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.04 | bwd_microstep: 1166.99 | bwd_inner_microstep: 1166.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588 [2024-06-10 08:08:53,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.75 | bwd_microstep: 1566.55 | bwd_inner_microstep: 1566.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3573 [2024-06-10 08:08:55,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.83 | bwd_microstep: 1568.46 | bwd_inner_microstep: 1568.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 08:08:57,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.93 | bwd_microstep: 1599.53 | bwd_inner_microstep: 1599.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3784 [2024-06-10 08:09:03,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 08:09:03,211] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 639.37 | bwd_microstep: 4572.15 | bwd_inner_microstep: 1984.23 | bwd_allreduce_microstep: 2587.87 | step_microstep: 38.01 [2024-06-10 08:09:03,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15719.50 | bwd: 44990.05 | bwd_inner: 42401.17 | bwd_allreduce: 2588.15 | step: 39.68 {'loss': 1.2642, 'learning_rate': 3.5126158494512926e-05, 'epoch': 0.25} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 08:09:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.78 | bwd_microstep: 1597.35 | bwd_inner_microstep: 1597.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 08:09:07,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.93 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933 [2024-06-10 08:09:08,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.21 | bwd_microstep: 819.81 | bwd_inner_microstep: 819.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1875 [2024-06-10 08:09:09,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 706.68 | bwd_inner_microstep: 706.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-10 08:09:11,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.20 | bwd_microstep: 1178.31 | bwd_inner_microstep: 1178.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016 [2024-06-10 08:09:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 303.02 | bwd_microstep: 806.27 | bwd_inner_microstep: 806.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 08:09:13,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 08:09:15,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.90 | bwd_microstep: 1152.29 | bwd_inner_microstep: 1152.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 889 [2024-06-10 08:09:16,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.07 | bwd_microstep: 368.60 | bwd_inner_microstep: 368.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-10 08:09:17,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.65 | bwd_microstep: 1343.82 | bwd_inner_microstep: 1343.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2692 [2024-06-10 08:09:19,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.42 | bwd_microstep: 1225.19 | bwd_inner_microstep: 1225.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712 [2024-06-10 08:09:21,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.97 | bwd_microstep: 1426.82 | bwd_inner_microstep: 1426.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 08:09:23,353] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 08:09:25,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1288.50 | bwd_inner_microstep: 1288.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 08:09:26,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.07 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3507 [2024-06-10 08:09:28,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 1353.03 | bwd_inner_microstep: 1353.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 08:09:30,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1400.76 | bwd_inner_microstep: 1400.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 08:09:32,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1397.84 | bwd_inner_microstep: 1397.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 08:09:34,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1515.60 | bwd_inner_microstep: 1515.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3526 [2024-06-10 08:09:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.35 | bwd_microstep: 1417.35 | bwd_inner_microstep: 1417.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 08:09:38,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1558.24 | bwd_inner_microstep: 1558.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 08:09:40,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1283.92 | bwd_inner_microstep: 1283.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141 [2024-06-10 08:09:41,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.97 | bwd_microstep: 835.41 | bwd_inner_microstep: 835.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3600 [2024-06-10 08:09:43,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.71 | bwd_microstep: 1539.42 | bwd_inner_microstep: 1539.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535 [2024-06-10 08:09:45,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.34 | bwd_microstep: 1456.31 | bwd_inner_microstep: 1456.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 08:09:47,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1393.07 | bwd_inner_microstep: 1393.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per 
sample: 10.5, dynamic token length: 3816 [2024-06-10 08:09:50,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.96 | bwd_microstep: 1690.47 | bwd_inner_microstep: 1690.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281 [2024-06-10 08:09:51,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.29 | bwd_microstep: 909.14 | bwd_inner_microstep: 909.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922 [2024-06-10 08:09:52,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.43 | bwd_microstep: 822.18 | bwd_inner_microstep: 822.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 08:09:54,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1345.31 | bwd_inner_microstep: 1345.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 08:09:56,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.88 | bwd_microstep: 1431.01 | bwd_inner_microstep: 1430.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3774 [2024-06-10 08:10:03,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.22 | optimizer_step: 6.63 [2024-06-10 08:10:03,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.04 | bwd_microstep: 6260.86 | bwd_inner_microstep: 1777.96 | bwd_allreduce_microstep: 4482.85 | step_microstep: 38.06 [2024-06-10 08:10:03,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15036.57 | bwd: 44720.21 | bwd_inner: 40236.44 | 
bwd_allreduce: 4483.08 | step: 39.64 {'loss': 1.3244, 'learning_rate': 3.5101576560269195e-05, 'epoch': 0.25} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 08:10:05,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.98 | bwd_microstep: 1467.15 | bwd_inner_microstep: 1467.05 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3852 [2024-06-10 08:10:07,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1360.79 | bwd_inner_microstep: 1360.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 08:10:09,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1383.62 | bwd_inner_microstep: 1383.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774 [2024-06-10 08:10:11,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.20 | bwd_microstep: 1539.53 | bwd_inner_microstep: 1539.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 08:10:13,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.17 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 08:10:14,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1244.59 | bwd_inner_microstep: 1244.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 08:10:16,636] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1245.84 | bwd_inner_microstep: 1245.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3475 [2024-06-10 08:10:18,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1229.43 | bwd_inner_microstep: 1229.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 08:10:20,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.60 | bwd_microstep: 1386.44 | bwd_inner_microstep: 1386.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949 [2024-06-10 08:10:21,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.91 | bwd_microstep: 728.64 | bwd_inner_microstep: 728.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3506 [2024-06-10 08:10:23,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.28 | bwd_microstep: 1437.15 | bwd_inner_microstep: 1437.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3953 [2024-06-10 08:10:25,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.82 | bwd_microstep: 1845.19 | bwd_inner_microstep: 1845.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3671 [2024-06-10 08:10:28,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.99 | bwd_microstep: 1657.60 | bwd_inner_microstep: 1657.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3428 [2024-06-10 
08:10:29,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.47 | bwd_microstep: 1216.59 | bwd_inner_microstep: 1216.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508 [2024-06-10 08:10:31,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1320.07 | bwd_inner_microstep: 1320.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-10 08:10:33,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1512.50 | bwd_inner_microstep: 1512.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 08:10:35,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 08:10:37,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.76 | bwd_microstep: 1391.51 | bwd_inner_microstep: 1391.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 08:10:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.17 | bwd_microstep: 1383.64 | bwd_inner_microstep: 1383.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 08:10:41,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1555.49 | bwd_inner_microstep: 1555.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3475 [2024-06-10 08:10:43,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1282.91 | bwd_inner_microstep: 1282.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 08:10:44,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1279.63 | bwd_inner_microstep: 1279.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782 [2024-06-10 08:10:46,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.60 | bwd_microstep: 1447.23 | bwd_inner_microstep: 1447.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 08:10:49,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.77 | bwd_microstep: 1486.83 | bwd_inner_microstep: 1486.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 08:10:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.39 | bwd_microstep: 1403.84 | bwd_inner_microstep: 1403.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548 [2024-06-10 08:10:52,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1329.53 | bwd_inner_microstep: 1329.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3607 [2024-06-10 08:10:54,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.69 | bwd_microstep: 1572.66 | bwd_inner_microstep: 1572.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3609 [2024-06-10 08:10:57,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 1610.89 | bwd_inner_microstep: 1610.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-10 08:10:59,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1514.25 | bwd_inner_microstep: 1514.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2280 [2024-06-10 08:11:00,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.94 | bwd_microstep: 1022.93 | bwd_inner_microstep: 1022.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583 [2024-06-10 08:11:02,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.21 | bwd_microstep: 1239.02 | bwd_inner_microstep: 1238.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3455 [2024-06-10 08:11:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.15 | optimizer_step: 6.63 [2024-06-10 08:11:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.65 | bwd_microstep: 2165.68 | bwd_inner_microstep: 1570.41 | bwd_allreduce_microstep: 595.23 | step_microstep: 37.81 [2024-06-10 08:11:05,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16552.34 | bwd: 44928.67 | bwd_inner: 44332.45 | bwd_allreduce: 595.50 | step: 39.41 {'loss': 1.3248, 'learning_rate': 3.507694143835132e-05, 'epoch': 0.25}
25%|██▍ | 429/1726 [7:28:38<22:05:41, 61.33s/it] 25%|██▍ | 430/1726 [7:29:38<21:56:29, 60.95s/it] 25%|██▍ | 431/1726 [7:30:38<21:53:23, 60.85s/it] 25%|██▌ | 432/1726 [7:31:39<21:53:42, 60.91s/it] 25%|██▌ | 433/1726 [7:32:40<21:47:27, 60.67s/it] 25%|██▌ | 434/1726 [7:33:41<21:53:53, 61.02s/it] dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401 [2024-06-10 08:11:06,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.45 | bwd_microstep: 1270.85 | bwd_inner_microstep: 1270.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3392 [2024-06-10 08:11:08,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.19 | bwd_microstep: 1146.54 | bwd_inner_microstep: 1146.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-10 08:11:10,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1548.40 | bwd_inner_microstep: 1548.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844 [2024-06-10 08:11:12,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.03 | bwd_microstep: 1562.38 | bwd_inner_microstep: 1562.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770 [2024-06-10 08:11:14,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1437.87 | bwd_inner_microstep: 1437.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 08:11:16,497] [INFO] [logging.py:96:log_dist] [Rank 0] time
(ms) | fwd_microstep: 472.15 | bwd_microstep: 1246.52 | bwd_inner_microstep: 1246.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1895 [2024-06-10 08:11:17,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.56 | bwd_microstep: 777.02 | bwd_inner_microstep: 776.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 08:11:19,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1286.35 | bwd_inner_microstep: 1286.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088 [2024-06-10 08:11:20,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.77 | bwd_microstep: 729.78 | bwd_inner_microstep: 729.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1991 [2024-06-10 08:11:21,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.07 | bwd_microstep: 862.68 | bwd_inner_microstep: 862.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 08:11:23,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1286.65 | bwd_inner_microstep: 1286.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 08:11:25,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.17 | bwd_microstep: 1378.42 | bwd_inner_microstep: 1378.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 08:11:27,331] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1513.74 | bwd_inner_microstep: 1513.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2815 [2024-06-10 08:11:28,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.80 | bwd_microstep: 1078.21 | bwd_inner_microstep: 1078.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883 [2024-06-10 08:11:29,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.11 | bwd_microstep: 709.74 | bwd_inner_microstep: 709.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 08:11:30,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.51 | bwd_microstep: 791.62 | bwd_inner_microstep: 791.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433 [2024-06-10 08:11:32,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.40 | bwd_microstep: 1156.03 | bwd_inner_microstep: 1156.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3833 [2024-06-10 08:11:34,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1358.45 | bwd_inner_microstep: 1358.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 634 [2024-06-10 08:11:34,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.65 | bwd_microstep: 264.78 | bwd_inner_microstep: 264.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3439 
[2024-06-10 08:11:36,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1298.98 | bwd_inner_microstep: 1298.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 08:11:38,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.10 | bwd_microstep: 1460.75 | bwd_inner_microstep: 1460.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 08:11:40,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1557.27 | bwd_inner_microstep: 1557.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 08:11:42,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3845 [2024-06-10 08:11:45,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.32 | bwd_microstep: 1698.75 | bwd_inner_microstep: 1698.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2286 [2024-06-10 08:11:46,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.34 | bwd_microstep: 783.80 | bwd_inner_microstep: 783.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3806 [2024-06-10 08:11:48,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.85 | bwd_microstep: 1413.52 | bwd_inner_microstep: 1413.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3451 [2024-06-10 08:11:50,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1454.58 | bwd_inner_microstep: 1454.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3682 [2024-06-10 08:11:52,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.97 | bwd_microstep: 1458.72 | bwd_inner_microstep: 1458.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 08:11:54,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1500.22 | bwd_inner_microstep: 1500.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595 [2024-06-10 08:11:56,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1607.64 | bwd_inner_microstep: 1607.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 08:11:58,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1375.80 | bwd_inner_microstep: 1375.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767 [2024-06-10 08:12:05,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.31 | optimizer_step: 6.59 [2024-06-10 08:12:05,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.88 | bwd_microstep: 6290.94 | bwd_inner_microstep: 1752.60 | bwd_allreduce_microstep: 4538.29 | step_microstep: 38.26 [2024-06-10 08:12:05,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15072.64 | bwd: 44953.67 | bwd_inner: 40414.46 
| bwd_allreduce: 4538.53 | step: 39.83 {'loss': 1.2968, 'learning_rate': 3.5052253215524086e-05, 'epoch': 0.25} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 08:12:07,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1347.18 | bwd_inner_microstep: 1346.97 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 08:12:09,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 1246.68 | bwd_inner_microstep: 1246.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866 [2024-06-10 08:12:11,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.15 | bwd_microstep: 1564.97 | bwd_inner_microstep: 1564.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 08:12:12,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1252.20 | bwd_inner_microstep: 1252.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 08:12:15,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.07 | bwd_microstep: 1479.55 | bwd_inner_microstep: 1479.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 08:12:16,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.88 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1958 [2024-06-10 08:12:17,812] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 290.16 | bwd_microstep: 764.34 | bwd_inner_microstep: 764.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113 [2024-06-10 08:12:18,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.42 | bwd_microstep: 828.44 | bwd_inner_microstep: 828.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 08:12:20,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.94 | bwd_microstep: 1286.48 | bwd_inner_microstep: 1286.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 08:12:22,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.93 | bwd_microstep: 1384.71 | bwd_inner_microstep: 1384.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684 [2024-06-10 08:12:24,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.32 | bwd_microstep: 1528.19 | bwd_inner_microstep: 1528.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3510 [2024-06-10 08:12:26,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.22 | bwd_microstep: 1445.35 | bwd_inner_microstep: 1445.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665 [2024-06-10 08:12:28,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.23 | bwd_microstep: 1612.64 | bwd_inner_microstep: 1612.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646 [2024-06-10 
08:12:31,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.20 | bwd_microstep: 1611.63 | bwd_inner_microstep: 1611.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3567 [2024-06-10 08:12:33,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681 [2024-06-10 08:12:35,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.90 | bwd_microstep: 1617.46 | bwd_inner_microstep: 1617.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3528 [2024-06-10 08:12:37,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.63 | bwd_microstep: 1619.33 | bwd_inner_microstep: 1619.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.32 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 08:12:39,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1376.24 | bwd_inner_microstep: 1376.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 08:12:41,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1508.26 | bwd_inner_microstep: 1508.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 08:12:43,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.79 | bwd_microstep: 1490.56 | bwd_inner_microstep: 1490.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3596 [2024-06-10 08:12:45,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.20 | bwd_microstep: 1402.16 | bwd_inner_microstep: 1402.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2116 [2024-06-10 08:12:46,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.02 | bwd_microstep: 892.41 | bwd_inner_microstep: 892.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619 [2024-06-10 08:12:49,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.11 | bwd_microstep: 1613.00 | bwd_inner_microstep: 1612.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819 [2024-06-10 08:12:51,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 1517.86 | bwd_inner_microstep: 1517.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3713 [2024-06-10 08:12:53,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 1495.74 | bwd_inner_microstep: 1495.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 08:12:55,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1490.78 | bwd_inner_microstep: 1490.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 08:12:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1258.55 | bwd_inner_microstep: 1258.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754 [2024-06-10 08:12:59,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.63 | bwd_microstep: 1644.17 | bwd_inner_microstep: 1644.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-10 08:13:01,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1412.01 | bwd_inner_microstep: 1411.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3638 [2024-06-10 08:13:03,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.16 | bwd_microstep: 1650.33 | bwd_inner_microstep: 1650.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3608 [2024-06-10 08:13:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.49 | bwd_microstep: 1534.60 | bwd_inner_microstep: 1534.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 08:13:07,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 08:13:07,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.15 | bwd_microstep: 1472.91 | bwd_inner_microstep: 1464.84 | bwd_allreduce_microstep: 8.03 | step_microstep: 38.63 [2024-06-10 08:13:07,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16791.16 | bwd: 45046.66 | bwd_inner: 45037.56 | bwd_allreduce: 8.34 | step: 42.55 {'loss': 1.2959, 'learning_rate': 3.502751197873927e-05, 'epoch': 0.25} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 08:13:09,419] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 472.37 | bwd_microstep: 1243.64 | bwd_inner_microstep: 1243.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 08:13:11,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1383.97 | bwd_inner_microstep: 1383.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 08:13:12,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 790.31 | bwd_inner_microstep: 790.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 08:13:14,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 1552.91 | bwd_inner_microstep: 1552.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 08:13:16,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1386.02 | bwd_inner_microstep: 1386.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 08:13:18,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.15 | bwd_microstep: 1385.69 | bwd_inner_microstep: 1385.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2247 [2024-06-10 08:13:19,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.01 | bwd_microstep: 868.89 | bwd_inner_microstep: 868.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3730 [2024-06-10 08:13:21,596] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1438.56 | bwd_inner_microstep: 1438.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2450 [2024-06-10 08:13:22,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.08 | bwd_microstep: 919.78 | bwd_inner_microstep: 919.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 08:13:24,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.18 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705 [2024-06-10 08:13:26,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.88 | bwd_microstep: 1628.32 | bwd_inner_microstep: 1628.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3412 [2024-06-10 08:13:28,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.89 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 08:13:30,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.56 | bwd_microstep: 1286.25 | bwd_inner_microstep: 1286.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3446 [2024-06-10 08:13:32,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.26 | bwd_microstep: 1301.14 | bwd_inner_microstep: 1301.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3485 [2024-06-10 08:13:33,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.87 | bwd_microstep: 1281.86 | bwd_inner_microstep: 1281.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3526 [2024-06-10 08:13:35,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.36 | bwd_microstep: 1436.41 | bwd_inner_microstep: 1436.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-10 08:13:37,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.00 | bwd_microstep: 1318.37 | bwd_inner_microstep: 1318.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 647 [2024-06-10 08:13:38,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.76 | bwd_microstep: 274.48 | bwd_inner_microstep: 274.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 08:13:40,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1404.48 | bwd_inner_microstep: 1404.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-10 08:13:41,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1297.18 | bwd_inner_microstep: 1297.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457 [2024-06-10 08:13:43,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.77 | bwd_microstep: 1404.47 | bwd_inner_microstep: 1404.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3619 [2024-06-10 08:13:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1406.49 | bwd_inner_microstep: 1406.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 08:13:47,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1393.58 | bwd_inner_microstep: 1393.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545 [2024-06-10 08:13:49,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1327.71 | bwd_inner_microstep: 1327.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1914 [2024-06-10 08:13:50,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.31 | bwd_microstep: 749.81 | bwd_inner_microstep: 749.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626 [2024-06-10 08:13:52,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1604.87 | bwd_inner_microstep: 1604.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572 [2024-06-10 08:13:54,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.98 | bwd_microstep: 1565.67 | bwd_inner_microstep: 1565.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3769 [2024-06-10 08:13:57,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1604.82 | bwd_inner_microstep: 1604.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3813 [2024-06-10 08:13:59,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.26 | bwd_microstep: 1817.20 | bwd_inner_microstep: 1817.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 08:14:01,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1560.01 | bwd_inner_microstep: 1559.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775 [2024-06-10 08:14:03,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1638.06 | bwd_inner_microstep: 1638.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2264 [2024-06-10 08:14:08,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 08:14:08,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.99 | bwd_microstep: 3869.61 | bwd_inner_microstep: 1174.72 | bwd_allreduce_microstep: 2694.84 | step_microstep: 38.16 [2024-06-10 08:14:08,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15658.13 | bwd: 44612.07 | bwd_inner: 41916.32 | bwd_allreduce: 2695.07 | step: 39.72 {'loss': 1.2882, 'learning_rate': 3.500271781513539e-05, 'epoch': 0.25} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 08:14:10,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.60 | bwd_microstep: 1280.82 | bwd_inner_microstep: 1280.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-10 08:14:11,690] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 439.69 | bwd_microstep: 1152.48 | bwd_inner_microstep: 1152.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842 [2024-06-10 08:14:13,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.16 | bwd_microstep: 1557.91 | bwd_inner_microstep: 1557.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407 [2024-06-10 08:14:15,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.62 | bwd_microstep: 1277.46 | bwd_inner_microstep: 1277.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 08:14:17,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.55 | bwd_microstep: 1383.96 | bwd_inner_microstep: 1383.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4128 [2024-06-10 08:14:19,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.52 | bwd_microstep: 1601.16 | bwd_inner_microstep: 1601.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 08:14:21,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1483.63 | bwd_inner_microstep: 1483.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 08:14:22,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.14 | bwd_microstep: 794.02 | bwd_inner_microstep: 793.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-10 
08:14:23,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.19 | bwd_microstep: 796.25 | bwd_inner_microstep: 796.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3797 [2024-06-10 08:14:26,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.57 | bwd_microstep: 1746.16 | bwd_inner_microstep: 1746.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3412 [2024-06-10 08:14:28,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1444.40 | bwd_inner_microstep: 1444.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 08:14:30,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.99 | bwd_microstep: 1485.76 | bwd_inner_microstep: 1485.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3434 [2024-06-10 08:14:32,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1542.04 | bwd_inner_microstep: 1542.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2871 [2024-06-10 08:14:34,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.64 | bwd_microstep: 1080.07 | bwd_inner_microstep: 1080.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487 [2024-06-10 08:14:35,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1315.56 | bwd_inner_microstep: 1315.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, 
dynamic token length: 3411 [2024-06-10 08:14:37,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.55 | bwd_microstep: 1299.89 | bwd_inner_microstep: 1299.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3497 [2024-06-10 08:14:39,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.47 | bwd_microstep: 1367.49 | bwd_inner_microstep: 1367.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682 [2024-06-10 08:14:41,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.15 | bwd_microstep: 1617.08 | bwd_inner_microstep: 1617.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-10 08:14:43,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.48 | bwd_microstep: 1312.80 | bwd_inner_microstep: 1312.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-10 08:14:45,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.73 | bwd_microstep: 1165.18 | bwd_inner_microstep: 1165.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2083 [2024-06-10 08:14:46,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.15 | bwd_microstep: 759.49 | bwd_inner_microstep: 759.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103 [2024-06-10 08:14:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.58 | bwd_microstep: 827.83 | bwd_inner_microstep: 827.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT 
batch size: 12, images per sample: 3.0, dynamic token length: 1918 [2024-06-10 08:14:48,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.97 | bwd_microstep: 720.97 | bwd_inner_microstep: 720.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 08:14:50,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1395.98 | bwd_inner_microstep: 1395.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814 [2024-06-10 08:14:52,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.92 | bwd_microstep: 1355.63 | bwd_inner_microstep: 1355.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679 [2024-06-10 08:14:54,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1327.82 | bwd_inner_microstep: 1327.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 08:14:56,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.03 | bwd_microstep: 1461.70 | bwd_inner_microstep: 1461.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3726 [2024-06-10 08:14:58,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1469.73 | bwd_inner_microstep: 1469.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736 [2024-06-10 08:15:00,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.78 | bwd_microstep: 1637.98 | bwd_inner_microstep: 1637.95 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291 [2024-06-10 08:15:01,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.16 | bwd_microstep: 913.85 | bwd_inner_microstep: 913.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3816 [2024-06-10 08:15:04,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.65 | bwd_microstep: 1725.21 | bwd_inner_microstep: 1725.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3819 [2024-06-10 08:15:47,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.36 | optimizer_step: 6.59 [2024-06-10 08:15:47,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.81 | bwd_microstep: 43331.91 | bwd_inner_microstep: 1617.81 | bwd_allreduce_microstep: 41714.03 | step_microstep: 38.95 [2024-06-10 08:15:47,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15635.64 | bwd: 83632.24 | bwd_inner: 41917.25 | bwd_allreduce: 41714.26 | step: 40.71 {'loss': 1.2628, 'learning_rate': 3.4977870812037355e-05, 'epoch': 0.25} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 08:15:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1473.23 | bwd_inner_microstep: 1473.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 08:15:51,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.67 | bwd_microstep: 1238.66 | bwd_inner_microstep: 1238.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3909 [2024-06-10 08:15:53,778] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.90 | bwd_microstep: 1514.82 | bwd_inner_microstep: 1514.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 08:15:55,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.98 | bwd_microstep: 1478.80 | bwd_inner_microstep: 1478.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 08:15:57,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.41 | bwd_microstep: 1274.29 | bwd_inner_microstep: 1274.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 08:15:59,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.71 | bwd_microstep: 1276.62 | bwd_inner_microstep: 1276.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 08:16:01,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1338.56 | bwd_inner_microstep: 1338.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 08:16:02,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075 [2024-06-10 08:16:04,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.59 | bwd_microstep: 817.68 | bwd_inner_microstep: 817.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 4119 [2024-06-10 08:16:06,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.96 | bwd_microstep: 1639.69 | bwd_inner_microstep: 1639.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3717 [2024-06-10 08:16:08,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.96 | bwd_microstep: 1590.00 | bwd_inner_microstep: 1589.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3985 [2024-06-10 08:16:10,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.26 | bwd_microstep: 1594.73 | bwd_inner_microstep: 1594.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3698 [2024-06-10 08:16:13,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.41 | bwd_microstep: 1720.81 | bwd_inner_microstep: 1720.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661 [2024-06-10 08:16:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.51 | bwd_microstep: 1516.51 | bwd_inner_microstep: 1516.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3653 [2024-06-10 08:16:17,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.14 | bwd_microstep: 1710.03 | bwd_inner_microstep: 1710.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 08:16:19,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1447.78 | bwd_inner_microstep: 1447.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
21, images per sample: 5.25, dynamic token length: 2617 [2024-06-10 08:16:21,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.96 | bwd_microstep: 1044.47 | bwd_inner_microstep: 1044.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-10 08:16:23,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1506.17 | bwd_inner_microstep: 1506.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978 [2024-06-10 08:16:24,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.99 | bwd_microstep: 734.79 | bwd_inner_microstep: 734.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3541 [2024-06-10 08:16:25,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.85 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615 [2024-06-10 08:16:28,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.53 | bwd_microstep: 1596.44 | bwd_inner_microstep: 1596.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824 [2024-06-10 08:16:30,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.19 | bwd_microstep: 1355.27 | bwd_inner_microstep: 1355.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288 [2024-06-10 08:16:31,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.02 | bwd_microstep: 878.84 | bwd_inner_microstep: 878.81 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572 [2024-06-10 08:16:33,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.14 | bwd_microstep: 1304.02 | bwd_inner_microstep: 1303.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 08:16:34,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1388.25 | bwd_inner_microstep: 1388.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474 [2024-06-10 08:16:36,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.18 | bwd_microstep: 1189.67 | bwd_inner_microstep: 1189.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 08:16:38,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1380.12 | bwd_inner_microstep: 1380.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728 [2024-06-10 08:16:40,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.77 | bwd_microstep: 1462.51 | bwd_inner_microstep: 1462.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3401 [2024-06-10 08:16:42,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1388.52 | bwd_inner_microstep: 1388.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754 [2024-06-10 08:16:44,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.98 | bwd_microstep: 1630.18 | bwd_inner_microstep: 
1630.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 08:16:46,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.70 | bwd_microstep: 1348.44 | bwd_inner_microstep: 1348.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3793 [2024-06-10 08:16:48,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.66 [2024-06-10 08:16:48,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1535.26 | bwd_inner_microstep: 1527.33 | bwd_allreduce_microstep: 7.88 | step_microstep: 37.75 [2024-06-10 08:16:48,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16445.52 | bwd: 44010.89 | bwd_inner: 44002.08 | bwd_allreduce: 8.11 | step: 39.36 {'loss': 1.2565, 'learning_rate': 3.4952971056956186e-05, 'epoch': 0.25} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3538 [2024-06-10 08:16:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1586.83 | bwd_inner_microstep: 1586.75 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3983 [2024-06-10 08:16:53,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.55 | bwd_microstep: 1605.15 | bwd_inner_microstep: 1605.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 08:16:55,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1479.57 | bwd_inner_microstep: 1479.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 
[2024-06-10 08:16:57,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1556.45 | bwd_inner_microstep: 1556.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 782 [2024-06-10 08:16:57,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 127.75 | bwd_microstep: 311.82 | bwd_inner_microstep: 311.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758 [2024-06-10 08:17:00,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.01 | bwd_microstep: 1634.99 | bwd_inner_microstep: 1634.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494 [2024-06-10 08:17:01,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.42 | bwd_microstep: 1187.42 | bwd_inner_microstep: 1187.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3525 [2024-06-10 08:17:03,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1438.94 | bwd_inner_microstep: 1438.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695 [2024-06-10 08:17:05,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.82 | bwd_microstep: 1614.00 | bwd_inner_microstep: 1613.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663 [2024-06-10 08:17:08,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.92 | bwd_microstep: 1615.38 | bwd_inner_microstep: 1615.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3680 [2024-06-10 08:17:10,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.65 | bwd_microstep: 1623.54 | bwd_inner_microstep: 1623.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3386 [2024-06-10 08:17:12,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.34 | bwd_microstep: 1272.63 | bwd_inner_microstep: 1272.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709 [2024-06-10 08:17:14,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.52 | bwd_microstep: 1619.43 | bwd_inner_microstep: 1619.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 08:17:16,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.48 | bwd_microstep: 1348.72 | bwd_inner_microstep: 1348.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466 [2024-06-10 08:17:18,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.10 | bwd_microstep: 1573.03 | bwd_inner_microstep: 1573.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3651 [2024-06-10 08:17:20,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1545.58 | bwd_inner_microstep: 1545.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3468 [2024-06-10 08:17:22,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.61 | bwd_microstep: 1244.50 | bwd_inner_microstep: 1244.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526 [2024-06-10 08:17:23,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.05 | bwd_microstep: 1196.96 | bwd_inner_microstep: 1196.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3442 [2024-06-10 08:17:25,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.68 | bwd_microstep: 1379.80 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-10 08:17:26,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.88 | bwd_microstep: 695.92 | bwd_inner_microstep: 695.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 08:17:28,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1374.16 | bwd_inner_microstep: 1374.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 08:17:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.36 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 08:17:32,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1400.07 | bwd_inner_microstep: 1400.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 08:17:34,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1554.71 | bwd_inner_microstep: 1554.68 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 08:17:36,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.52 | bwd_microstep: 1381.02 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2170 [2024-06-10 08:17:37,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.80 | bwd_microstep: 886.32 | bwd_inner_microstep: 886.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508 [2024-06-10 08:17:39,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1415.02 | bwd_inner_microstep: 1415.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 08:17:41,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1379.89 | bwd_inner_microstep: 1379.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014 [2024-06-10 08:17:42,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.69 | bwd_microstep: 898.38 | bwd_inner_microstep: 898.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3815 [2024-06-10 08:17:45,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.37 | bwd_microstep: 1854.71 | bwd_inner_microstep: 1854.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 08:17:47,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 
1549.94 | bwd_inner_microstep: 1549.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3435 [2024-06-10 08:17:50,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-10 08:17:50,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.43 | bwd_microstep: 2736.55 | bwd_inner_microstep: 1340.26 | bwd_allreduce_microstep: 1396.24 | step_microstep: 37.88 [2024-06-10 08:17:50,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16343.49 | bwd: 45245.94 | bwd_inner: 43848.72 | bwd_allreduce: 1396.52 | step: 39.58 61.02s/it] 25%|██▌ | 435/1726 [7:34:42<21:48:34, 60.82s/it] 25%|██▌ | 436/1726 [7:35:44<21:56:27, 61.23s/it] 25%|██▌ | 437/1726 [7:36:45<21:51:28, 61.05s/it] 25%|██▌ | 438/1726 [7:38:24<25:58:55, 72.62s/it] 25%|██▌ | 439/1726 [7:39:25<24:41:41, 69.08s/it] 25%|██▌ | 440/1726 [7:40:27<23:54:45, 66.94s/it] {'loss': 1.3234, 'learning_rate': 3.492801863758868e-05, 'epoch': 0.25} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 08:17:52,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.65 | bwd_microstep: 1472.35 | bwd_inner_microstep: 1472.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3938 [2024-06-10 08:17:54,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1592.54 | bwd_inner_microstep: 1592.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 
[2024-06-10 08:17:56,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1249.81 | bwd_inner_microstep: 1249.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 08:17:58,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1483.68 | bwd_inner_microstep: 1483.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 08:18:00,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1284.17 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 08:18:02,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1541.04 | bwd_inner_microstep: 1541.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783 [2024-06-10 08:18:04,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.23 | bwd_microstep: 1446.83 | bwd_inner_microstep: 1446.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493 [2024-06-10 08:18:06,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.93 | bwd_microstep: 1192.20 | bwd_inner_microstep: 1192.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896 [2024-06-10 08:18:08,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.62 | bwd_microstep: 1583.72 | bwd_inner_microstep: 1583.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3506 [2024-06-10 08:18:10,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1389.13 | bwd_inner_microstep: 1389.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1886 [2024-06-10 08:18:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.22 | bwd_microstep: 773.30 | bwd_inner_microstep: 773.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693 [2024-06-10 08:18:13,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.84 | bwd_microstep: 1521.15 | bwd_inner_microstep: 1521.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3639 [2024-06-10 08:18:15,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1314.32 | bwd_inner_microstep: 1314.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 08:18:17,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1449.39 | bwd_inner_microstep: 1449.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593 [2024-06-10 08:18:19,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1496.52 | bwd_inner_microstep: 1496.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3871 [2024-06-10 08:18:21,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.34 | bwd_microstep: 1558.84 | bwd_inner_microstep: 1558.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 08:18:23,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.41 | bwd_microstep: 1491.14 | bwd_inner_microstep: 1491.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631 [2024-06-10 08:18:25,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1406.63 | bwd_inner_microstep: 1406.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1272 [2024-06-10 08:18:26,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 187.84 | bwd_microstep: 487.58 | bwd_inner_microstep: 487.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 08:18:28,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 08:18:30,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1557.01 | bwd_inner_microstep: 1556.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-10 08:18:32,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.91 | bwd_microstep: 1293.88 | bwd_inner_microstep: 1293.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 08:18:34,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1498.40 | bwd_inner_microstep: 1498.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3730 [2024-06-10 08:18:35,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1304.06 | bwd_inner_microstep: 1304.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510 [2024-06-10 08:18:37,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1347.90 | bwd_inner_microstep: 1347.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2724 [2024-06-10 08:18:39,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.21 | bwd_microstep: 1134.35 | bwd_inner_microstep: 1134.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 08:18:41,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1248.80 | bwd_inner_microstep: 1248.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1110 [2024-06-10 08:18:41,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.73 | bwd_microstep: 442.42 | bwd_inner_microstep: 442.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3577 [2024-06-10 08:18:43,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.77 | bwd_microstep: 1558.70 | bwd_inner_microstep: 1558.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 08:18:46,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.95 | bwd_microstep: 
1655.29 | bwd_inner_microstep: 1655.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3582 [2024-06-10 08:18:48,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.90 | bwd_microstep: 1661.36 | bwd_inner_microstep: 1661.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-10 08:18:50,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.19 | optimizer_step: 6.63 [2024-06-10 08:18:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.55 | bwd_microstep: 1698.17 | bwd_inner_microstep: 1690.36 | bwd_allreduce_microstep: 7.76 | step_microstep: 37.60 [2024-06-10 08:18:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16251.33 | bwd: 43520.92 | bwd_inner: 43512.26 | bwd_allreduce: 7.99 | step: 39.19 {'loss': 1.2654, 'learning_rate': 3.490301364181714e-05, 'epoch': 0.26} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 08:18:52,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1276.48 | bwd_inner_microstep: 1276.39 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3398 [2024-06-10 08:18:54,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.04 | bwd_microstep: 1151.03 | bwd_inner_microstep: 1151.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904 [2024-06-10 08:18:56,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1585.32 | bwd_inner_microstep: 1585.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3861 [2024-06-10 08:18:58,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.23 | bwd_microstep: 1560.48 | bwd_inner_microstep: 1560.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297 [2024-06-10 08:18:59,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.97 | bwd_microstep: 972.81 | bwd_inner_microstep: 972.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 08:19:01,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.38 | bwd_microstep: 1376.34 | bwd_inner_microstep: 1376.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869 [2024-06-10 08:19:02,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.27 | bwd_microstep: 707.64 | bwd_inner_microstep: 707.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-10 08:19:03,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.97 | bwd_microstep: 799.24 | bwd_inner_microstep: 799.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3696 [2024-06-10 08:19:06,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.94 | bwd_microstep: 1656.98 | bwd_inner_microstep: 1656.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3442 [2024-06-10 08:19:07,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.28 | bwd_microstep: 1317.14 | bwd_inner_microstep: 1317.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 08:19:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.10 | bwd_microstep: 1278.39 | bwd_inner_microstep: 1278.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 08:19:11,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.87 | bwd_microstep: 1445.46 | bwd_inner_microstep: 1445.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3491 [2024-06-10 08:19:13,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1506.94 | bwd_inner_microstep: 1506.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1966 [2024-06-10 08:19:14,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.53 | bwd_microstep: 762.58 | bwd_inner_microstep: 762.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1975 [2024-06-10 08:19:15,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 705.10 | bwd_inner_microstep: 705.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3843 [2024-06-10 08:19:17,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1365.22 | bwd_inner_microstep: 1365.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2735 [2024-06-10 08:19:19,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.74 | bwd_microstep: 944.54 | bwd_inner_microstep: 944.51 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2082 [2024-06-10 08:19:20,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.23 | bwd_microstep: 916.99 | bwd_inner_microstep: 916.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 08:19:22,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1287.10 | bwd_inner_microstep: 1287.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-10 08:19:24,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1516.95 | bwd_inner_microstep: 1516.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 537 [2024-06-10 08:19:24,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.49 | bwd_microstep: 245.33 | bwd_inner_microstep: 245.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2092 [2024-06-10 08:19:25,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.56 | bwd_microstep: 790.75 | bwd_inner_microstep: 790.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 08:19:26,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.59 | bwd_microstep: 700.00 | bwd_inner_microstep: 699.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905 [2024-06-10 08:19:27,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.45 | bwd_microstep: 686.59 | bwd_inner_microstep: 686.57 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2183 [2024-06-10 08:19:28,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.25 | bwd_microstep: 808.76 | bwd_inner_microstep: 808.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2011 [2024-06-10 08:19:29,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.94 | bwd_microstep: 841.00 | bwd_inner_microstep: 840.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 08:19:31,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1383.40 | bwd_inner_microstep: 1383.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3769 [2024-06-10 08:19:33,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.80 | bwd_microstep: 1570.30 | bwd_inner_microstep: 1570.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 [2024-06-10 08:19:36,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1498.01 | bwd_inner_microstep: 1497.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-10 08:19:37,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.23 | bwd_microstep: 973.40 | bwd_inner_microstep: 973.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570 [2024-06-10 08:19:39,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1524.23 
| bwd_inner_microstep: 1524.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-10 08:19:51,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.37 | optimizer_step: 6.63 [2024-06-10 08:19:51,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 11121.23 | bwd_inner_microstep: 1749.39 | bwd_allreduce_microstep: 9371.77 | step_microstep: 39.03 [2024-06-10 08:19:51,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 13809.40 | bwd: 46275.79 | bwd_inner: 36903.02 | bwd_allreduce: 9372.06 | step: 40.86 {'loss': 1.2614, 'learning_rate': 3.4877956157709024e-05, 'epoch': 0.26} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 08:19:53,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.19 | bwd_microstep: 1375.62 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 08:19:55,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.93 | bwd_microstep: 1474.23 | bwd_inner_microstep: 1474.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3818 [2024-06-10 08:19:57,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.75 | bwd_microstep: 1511.21 | bwd_inner_microstep: 1511.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 08:19:59,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.88 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3752 [2024-06-10 08:20:01,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.20 | bwd_microstep: 1535.03 | bwd_inner_microstep: 1535.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 08:20:03,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.30 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733 [2024-06-10 08:20:05,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1428.66 | bwd_inner_microstep: 1428.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490 [2024-06-10 08:20:06,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.27 | bwd_microstep: 1190.39 | bwd_inner_microstep: 1190.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2373 [2024-06-10 08:20:08,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.42 | bwd_microstep: 998.52 | bwd_inner_microstep: 998.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3589 [2024-06-10 08:20:10,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.23 | bwd_microstep: 1366.16 | bwd_inner_microstep: 1366.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2184 [2024-06-10 08:20:11,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.19 | bwd_microstep: 951.13 | bwd_inner_microstep: 951.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 08:20:13,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.68 | bwd_microstep: 1480.99 | bwd_inner_microstep: 1480.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662 [2024-06-10 08:20:15,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.90 | bwd_microstep: 1610.85 | bwd_inner_microstep: 1610.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3656 [2024-06-10 08:20:18,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.56 | bwd_microstep: 1819.89 | bwd_inner_microstep: 1819.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 08:20:20,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1471.94 | bwd_inner_microstep: 1471.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 08:20:22,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1294.41 | bwd_inner_microstep: 1294.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2032 [2024-06-10 08:20:23,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.48 | bwd_microstep: 838.39 | bwd_inner_microstep: 838.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3611 [2024-06-10 08:20:25,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 1610.37 | bwd_inner_microstep: 1610.34 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699 [2024-06-10 08:20:27,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1332.45 | bwd_inner_microstep: 1332.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3522 [2024-06-10 08:20:29,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1323.37 | bwd_inner_microstep: 1323.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 08:20:31,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1509.90 | bwd_inner_microstep: 1509.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555 [2024-06-10 08:20:32,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.25 | bwd_microstep: 1202.26 | bwd_inner_microstep: 1202.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-10 08:20:34,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.89 | bwd_microstep: 879.31 | bwd_inner_microstep: 879.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450 [2024-06-10 08:20:35,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.07 | bwd_microstep: 1284.52 | bwd_inner_microstep: 1284.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 08:20:37,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1396.09 | bwd_inner_microstep: 
1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 08:20:39,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1351.11 | bwd_inner_microstep: 1351.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812 [2024-06-10 08:20:41,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.11 | bwd_microstep: 1357.55 | bwd_inner_microstep: 1357.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 08:20:43,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1560.66 | bwd_inner_microstep: 1560.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2024 [2024-06-10 08:20:44,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.31 | bwd_microstep: 846.60 | bwd_inner_microstep: 846.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-10 08:20:46,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1404.26 | bwd_inner_microstep: 1404.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3592 [2024-06-10 08:20:48,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.21 | bwd_microstep: 1439.07 | bwd_inner_microstep: 1439.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 08:20:50,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | 
optimizer_gradients: 4.26 | optimizer_step: 6.60 [2024-06-10 08:20:50,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.14 | bwd_microstep: 990.03 | bwd_inner_microstep: 816.64 | bwd_allreduce_microstep: 173.32 | step_microstep: 39.08 [2024-06-10 08:20:50,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15880.34 | bwd: 42599.71 | bwd_inner: 42425.32 | bwd_allreduce: 173.63 | step: 40.77 {'loss': 1.3337, 'learning_rate': 3.485284627351667e-05, 'epoch': 0.26} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-10 08:20:51,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.73 | bwd_microstep: 1142.21 | bwd_inner_microstep: 1142.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4055 [2024-06-10 08:20:53,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.35 | bwd_microstep: 1618.26 | bwd_inner_microstep: 1618.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 08:20:55,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1245.21 | bwd_inner_microstep: 1245.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 08:20:57,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.49 | bwd_microstep: 1491.66 | bwd_inner_microstep: 1491.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844 [2024-06-10 08:20:59,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.75 | bwd_microstep: 1662.38 | bwd_inner_microstep: 1662.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3409 [2024-06-10 08:21:01,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.09 | bwd_microstep: 1151.16 | bwd_inner_microstep: 1151.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-10 08:21:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1532.61 | bwd_inner_microstep: 1532.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443 [2024-06-10 08:21:05,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.53 | bwd_microstep: 1284.31 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 08:21:07,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 08:21:09,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1388.33 | bwd_inner_microstep: 1388.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958 [2024-06-10 08:21:10,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.37 | bwd_microstep: 797.34 | bwd_inner_microstep: 797.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 08:21:11,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1258.21 | bwd_inner_microstep: 1258.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441 [2024-06-10 08:21:13,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1287.18 | bwd_inner_microstep: 1287.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514 [2024-06-10 08:21:15,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.57 | bwd_microstep: 1449.77 | bwd_inner_microstep: 1449.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 08:21:17,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.90 | bwd_microstep: 1351.61 | bwd_inner_microstep: 1351.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2938 [2024-06-10 08:21:19,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.32 | bwd_microstep: 1181.09 | bwd_inner_microstep: 1181.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3481 [2024-06-10 08:21:21,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.36 | bwd_microstep: 1427.77 | bwd_inner_microstep: 1427.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 08:21:23,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.55 | bwd_microstep: 1500.53 | bwd_inner_microstep: 1500.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827 [2024-06-10 08:21:25,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.01 | bwd_microstep: 1754.69 | bwd_inner_microstep: 1754.66 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 08:21:27,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1382.20 | bwd_inner_microstep: 1382.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456 [2024-06-10 08:21:29,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.90 | bwd_microstep: 1288.30 | bwd_inner_microstep: 1288.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3737 [2024-06-10 08:21:31,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3565 [2024-06-10 08:21:33,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.18 | bwd_microstep: 1237.39 | bwd_inner_microstep: 1237.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3734 [2024-06-10 08:21:34,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1242.15 | bwd_inner_microstep: 1242.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2409 [2024-06-10 08:21:36,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.58 | bwd_microstep: 1135.69 | bwd_inner_microstep: 1135.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3611 [2024-06-10 08:21:38,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.39 | bwd_microstep: 
1539.85 | bwd_inner_microstep: 1539.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2201 [2024-06-10 08:21:39,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.90 | bwd_microstep: 955.68 | bwd_inner_microstep: 955.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411 [2024-06-10 08:21:41,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.26 | bwd_microstep: 1370.23 | bwd_inner_microstep: 1370.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 08:21:43,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.04 | bwd_microstep: 1648.43 | bwd_inner_microstep: 1648.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 08:21:45,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.50 | bwd_microstep: 1399.38 | bwd_inner_microstep: 1399.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 08:21:47,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1508.64 | bwd_inner_microstep: 1508.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678 [2024-06-10 08:21:53,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 08:21:53,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.24 | bwd_microstep: 4599.69 | bwd_inner_microstep: 1718.78 | bwd_allreduce_microstep: 2880.86 | step_microstep: 38.09 
[2024-06-10 08:21:53,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16255.97 | bwd: 46449.91 | bwd_inner: 43568.14 | bwd_allreduce: 2881.09 | step: 39.66 {'loss': 1.2171, 'learning_rate': 3.482768407767695e-05, 'epoch': 0.26} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461 [2024-06-10 08:21:55,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.42 | bwd_microstep: 1564.12 | bwd_inner_microstep: 1563.92 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 08:21:57,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1348.59 | bwd_inner_microstep: 1348.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875 [2024-06-10 08:21:59,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.17 | bwd_microstep: 1577.32 | bwd_inner_microstep: 1577.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 08:22:01,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.29 | bwd_microstep: 1240.34 | bwd_inner_microstep: 1240.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2452 [2024-06-10 08:22:02,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.99 | bwd_microstep: 1042.02 | bwd_inner_microstep: 1041.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 08:22:04,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.13 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998 [2024-06-10 08:22:05,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.70 | bwd_microstep: 707.33 | bwd_inner_microstep: 707.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2199 [2024-06-10 08:22:06,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.88 | bwd_microstep: 764.86 | bwd_inner_microstep: 764.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 08:22:08,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1246.71 | bwd_inner_microstep: 1246.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 08:22:09,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1248.86 | bwd_inner_microstep: 1248.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663 [2024-06-10 08:22:11,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.53 | bwd_microstep: 1419.10 | bwd_inner_microstep: 1419.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 08:22:12,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.55 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 08:22:14,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1478.18 | bwd_inner_microstep: 1478.15 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3933 [2024-06-10 08:22:17,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.58 | bwd_microstep: 1557.40 | bwd_inner_microstep: 1557.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2305 [2024-06-10 08:22:18,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.10 | bwd_microstep: 1077.73 | bwd_inner_microstep: 1077.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 08:22:20,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1347.89 | bwd_inner_microstep: 1347.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 08:22:22,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1509.38 | bwd_inner_microstep: 1509.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3613 [2024-06-10 08:22:24,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.13 | bwd_microstep: 1466.53 | bwd_inner_microstep: 1466.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510 [2024-06-10 08:22:26,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.62 | bwd_microstep: 1319.15 | bwd_inner_microstep: 1319.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 08:22:28,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.24 | bwd_microstep: 1451.67 | bwd_inner_microstep: 
1451.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 08:22:29,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.00 | bwd_microstep: 977.11 | bwd_inner_microstep: 977.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1980 [2024-06-10 08:22:30,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.72 | bwd_microstep: 704.76 | bwd_inner_microstep: 704.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 08:22:32,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1390.22 | bwd_inner_microstep: 1390.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 08:22:34,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.98 | bwd_microstep: 1659.25 | bwd_inner_microstep: 1659.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584 [2024-06-10 08:22:37,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.86 | bwd_microstep: 1595.88 | bwd_inner_microstep: 1595.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-10 08:22:39,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.39 | bwd_microstep: 1645.35 | bwd_inner_microstep: 1645.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2267 [2024-06-10 08:22:40,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.70 | 
bwd_microstep: 810.71 | bwd_inner_microstep: 810.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 08:22:42,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.80 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2916 [2024-06-10 08:22:44,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.45 | bwd_microstep: 1190.28 | bwd_inner_microstep: 1190.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2236 [2024-06-10 08:22:45,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.44 | bwd_microstep: 864.84 | bwd_inner_microstep: 864.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 08:22:47,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.89 | bwd_microstep: 1294.17 | bwd_inner_microstep: 1294.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931 [2024-06-10 08:22:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 08:22:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.27 | bwd_microstep: 5682.22 | bwd_inner_microstep: 786.28 | bwd_allreduce_microstep: 4895.89 | step_microstep: 37.97 [2024-06-10 08:22:53,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14934.13 | bwd: 44745.18 | bwd_inner: 39848.24 | bwd_allreduce: 4896.19 | step: 39.79 {'loss': 1.3222, 'learning_rate': 3.4802469658810984e-05, 'epoch': 0.26} dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3378 [2024-06-10 08:22:54,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.77 | bwd_microstep: 1232.88 | bwd_inner_microstep: 1232.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 08:22:56,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1377.12 | bwd_inner_microstep: 1377.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907 [2024-06-10 08:22:58,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843 [2024-06-10 08:23:00,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1455.21 | bwd_inner_microstep: 1455.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 08:23:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.13 | bwd_microstep: 1447.76 | bwd_inner_microstep: 1447.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 08:23:04,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1251.03 | bwd_inner_microstep: 1251.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400 [2024-06-10 08:23:06,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.93 | bwd_microstep: 1147.43 | bwd_inner_microstep: 1147.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062 [2024-06-10 08:23:07,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.95 | bwd_microstep: 816.36 | bwd_inner_microstep: 816.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 08:23:09,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.39 | bwd_microstep: 1292.49 | bwd_inner_microstep: 1292.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1978 [2024-06-10 08:23:10,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.79 | bwd_microstep: 704.47 | bwd_inner_microstep: 704.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3432 [2024-06-10 08:23:11,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1281.66 | bwd_inner_microstep: 1281.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 08:23:13,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1338.93 | bwd_inner_microstep: 1338.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3775 [2024-06-10 08:23:15,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.02 | bwd_microstep: 1568.59 | bwd_inner_microstep: 1568.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 08:23:17,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.56 | bwd_microstep: 1385.04 | bwd_inner_microstep: 1385.01 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-10 08:23:19,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1520.16 | bwd_inner_microstep: 1520.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 08:23:21,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.72 | bwd_microstep: 1301.74 | bwd_inner_microstep: 1301.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3635 [2024-06-10 08:23:24,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.38 | bwd_microstep: 1711.45 | bwd_inner_microstep: 1711.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 08:23:25,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.79 | bwd_microstep: 1277.83 | bwd_inner_microstep: 1277.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 08:23:27,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.94 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 08:23:30,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.65 | bwd_microstep: 1659.72 | bwd_inner_microstep: 1659.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931 [2024-06-10 08:23:32,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 
1495.42 | bwd_inner_microstep: 1495.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-10 08:23:34,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.06 | bwd_microstep: 1528.32 | bwd_inner_microstep: 1528.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3810 [2024-06-10 08:23:36,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.09 | bwd_microstep: 1582.20 | bwd_inner_microstep: 1582.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972 [2024-06-10 08:23:37,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.69 | bwd_microstep: 704.85 | bwd_inner_microstep: 704.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 08:23:39,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1495.93 | bwd_inner_microstep: 1495.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 08:23:41,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.13 | bwd_microstep: 1397.58 | bwd_inner_microstep: 1397.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774 [2024-06-10 08:23:43,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.36 | bwd_microstep: 1453.07 | bwd_inner_microstep: 1453.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3422 [2024-06-10 08:23:45,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 455.59 | bwd_microstep: 1198.71 | bwd_inner_microstep: 1198.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-10 08:23:47,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.62 | bwd_microstep: 1632.93 | bwd_inner_microstep: 1632.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819 [2024-06-10 08:23:49,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.00 | bwd_microstep: 1487.29 | bwd_inner_microstep: 1487.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 08:23:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.84 | bwd_microstep: 1547.17 | bwd_inner_microstep: 1547.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 08:23:55,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.46 | optimizer_step: 6.60 [2024-06-10 08:23:55,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.38 | bwd_microstep: 3025.75 | bwd_inner_microstep: 1574.59 | bwd_allreduce_microstep: 1451.09 | step_microstep: 43.03 [2024-06-10 08:23:55,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16361.87 | bwd: 45308.33 | bwd_inner: 43856.30 | bwd_allreduce: 1451.33 | step: 44.79 25%|██▌ | 440/1726 [7:40:27<23:54:45, 66.94s/it] 26%|██▌ | 441/1726 [7:41:27<23:09:52, 64.90s/it] 26%|██▌ | 442/1726 [7:42:27<22:40:05, 63.56s/it] 26%|██▌ | 443/1726 [7:43:26<22:08:39, 62.14s/it] 26%|██▌ | 444/1726 
[7:44:29<22:13:28, 62.41s/it] 26%|██▌ | 445/1726 [7:45:29<21:57:09, 61.69s/it] 26%|██▌ | 446/1726 [7:46:31<21:58{'loss': 1.2795, 'learning_rate': 3.477720310572383e-05, 'epoch': 0.26} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 08:23:57,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.22 | bwd_microstep: 1489.72 | bwd_inner_microstep: 1489.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2338 [2024-06-10 08:23:58,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.57 | bwd_microstep: 984.11 | bwd_inner_microstep: 984.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 08:24:00,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.72 | bwd_microstep: 1479.38 | bwd_inner_microstep: 1479.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 08:24:02,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.88 | bwd_microstep: 1475.58 | bwd_inner_microstep: 1475.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 08:24:03,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.03 | bwd_microstep: 787.21 | bwd_inner_microstep: 787.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2705 [2024-06-10 08:24:05,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.71 | bwd_microstep: 1032.34 | bwd_inner_microstep: 1032.31 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 08:24:07,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1296.70 | bwd_inner_microstep: 1296.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 08:24:08,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1313.01 | bwd_inner_microstep: 1312.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1454 [2024-06-10 08:24:09,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 210.12 | bwd_microstep: 540.34 | bwd_inner_microstep: 540.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869 [2024-06-10 08:24:10,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.58 | bwd_microstep: 711.89 | bwd_inner_microstep: 711.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3668 [2024-06-10 08:24:13,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.06 | bwd_microstep: 1772.85 | bwd_inner_microstep: 1772.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480 [2024-06-10 08:24:14,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.75 | bwd_microstep: 1408.37 | bwd_inner_microstep: 1408.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3712 [2024-06-10 08:24:17,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.21 | bwd_microstep: 1779.04 | 
bwd_inner_microstep: 1779.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 08:24:19,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1379.98 | bwd_inner_microstep: 1379.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 08:24:21,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1556.18 | bwd_inner_microstep: 1556.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 08:24:23,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.57 | bwd_microstep: 1296.51 | bwd_inner_microstep: 1296.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 08:24:25,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.76 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-10 08:24:26,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.25 | bwd_microstep: 803.76 | bwd_inner_microstep: 803.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967 [2024-06-10 08:24:27,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.93 | bwd_microstep: 734.08 | bwd_inner_microstep: 734.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816 [2024-06-10 08:24:29,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
564.62 | bwd_microstep: 1515.08 | bwd_inner_microstep: 1515.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 08:24:31,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1396.56 | bwd_inner_microstep: 1396.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3723 [2024-06-10 08:24:33,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1466.20 | bwd_inner_microstep: 1466.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 08:24:35,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.53 | bwd_microstep: 1657.98 | bwd_inner_microstep: 1657.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 08:24:37,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1300.79 | bwd_inner_microstep: 1300.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 08:24:39,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.07 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3623 [2024-06-10 08:24:41,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.19 | bwd_microstep: 1579.36 | bwd_inner_microstep: 1579.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3709 [2024-06-10 08:24:43,640] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.11 | bwd_microstep: 1730.28 | bwd_inner_microstep: 1730.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672 [2024-06-10 08:24:45,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.28 | bwd_microstep: 1415.91 | bwd_inner_microstep: 1415.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 08:24:47,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.76 | bwd_microstep: 1555.19 | bwd_inner_microstep: 1555.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590 [2024-06-10 08:24:50,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.40 | bwd_microstep: 1706.48 | bwd_inner_microstep: 1706.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769 [2024-06-10 08:24:52,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.08 | bwd_microstep: 1635.37 | bwd_inner_microstep: 1635.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-10 08:24:56,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.18 | optimizer_step: 6.58 [2024-06-10 08:24:56,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.86 | bwd_microstep: 3075.82 | bwd_inner_microstep: 1805.79 | bwd_allreduce_microstep: 1269.99 | step_microstep: 37.89 [2024-06-10 08:24:56,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16049.81 | bwd: 44449.92 | bwd_inner: 43178.97 | bwd_allreduce: 1270.23 | step: 39.50 {'loss': 1.2685, 
'learning_rate': 3.475188450740417e-05, 'epoch': 0.26} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 08:24:57,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1336.19 | bwd_inner_microstep: 1336.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3600 [2024-06-10 08:25:00,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.06 | bwd_microstep: 1539.10 | bwd_inner_microstep: 1539.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3880 [2024-06-10 08:25:02,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.48 | bwd_microstep: 1582.05 | bwd_inner_microstep: 1582.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 08:25:04,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.11 | bwd_microstep: 1651.66 | bwd_inner_microstep: 1651.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 08:25:06,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 08:25:08,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.63 | bwd_microstep: 1347.83 | bwd_inner_microstep: 1347.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-10 08:25:10,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 
1531.80 | bwd_inner_microstep: 1531.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 08:25:12,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1384.54 | bwd_inner_microstep: 1384.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412 [2024-06-10 08:25:13,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.72 | bwd_microstep: 1154.13 | bwd_inner_microstep: 1154.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1908 [2024-06-10 08:25:14,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.12 | bwd_microstep: 718.31 | bwd_inner_microstep: 718.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3693 [2024-06-10 08:25:16,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.39 | bwd_microstep: 1488.15 | bwd_inner_microstep: 1488.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 08:25:18,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3629 [2024-06-10 08:25:20,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 08:25:22,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.52 | bwd_microstep: 1379.19 | bwd_inner_microstep: 1379.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 08:25:23,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.39 | bwd_microstep: 781.10 | bwd_inner_microstep: 781.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693 [2024-06-10 08:25:25,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.93 | bwd_microstep: 1520.40 | bwd_inner_microstep: 1520.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478 [2024-06-10 08:25:27,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.27 | bwd_microstep: 1313.45 | bwd_inner_microstep: 1313.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 08:25:29,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.67 | bwd_microstep: 1546.91 | bwd_inner_microstep: 1546.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 08:25:31,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1515.45 | bwd_inner_microstep: 1515.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 08:25:33,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1393.28 | bwd_inner_microstep: 1393.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-10 08:25:35,535] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.83 | bwd_microstep: 1511.69 | bwd_inner_microstep: 1511.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2037 [2024-06-10 08:25:36,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.17 | bwd_microstep: 745.73 | bwd_inner_microstep: 745.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 08:25:38,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.58 | bwd_microstep: 1454.25 | bwd_inner_microstep: 1454.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603 [2024-06-10 08:25:40,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.09 | bwd_microstep: 1309.35 | bwd_inner_microstep: 1309.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2100 [2024-06-10 08:25:41,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.73 | bwd_microstep: 823.95 | bwd_inner_microstep: 823.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 08:25:43,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1281.27 | bwd_inner_microstep: 1281.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634 [2024-06-10 08:25:45,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.27 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3568 
[2024-06-10 08:25:46,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1240.85 | bwd_inner_microstep: 1240.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3538 [2024-06-10 08:25:48,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.26 | bwd_microstep: 1536.22 | bwd_inner_microstep: 1536.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567 [2024-06-10 08:25:51,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.97 | bwd_microstep: 1594.36 | bwd_inner_microstep: 1594.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807 [2024-06-10 08:25:53,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.31 | bwd_microstep: 1608.50 | bwd_inner_microstep: 1608.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-10 08:25:57,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.20 | optimizer_step: 6.61 [2024-06-10 08:25:57,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 3481.04 | bwd_inner_microstep: 1904.83 | bwd_allreduce_microstep: 1576.16 | step_microstep: 38.02 [2024-06-10 08:25:57,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16085.36 | bwd: 44999.63 | bwd_inner: 43422.57 | bwd_allreduce: 1576.39 | step: 39.56 {'loss': 1.2854, 'learning_rate': 3.4726513953023944e-05, 'epoch': 0.26} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435 [2024-06-10 08:25:59,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.29 | bwd_microstep: 1442.81 | 
bwd_inner_microstep: 1442.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 08:26:01,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.50 | bwd_microstep: 1249.07 | bwd_inner_microstep: 1249.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 08:26:02,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.94 | bwd_microstep: 1284.93 | bwd_inner_microstep: 1284.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 08:26:04,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.04 | bwd_microstep: 1346.99 | bwd_inner_microstep: 1346.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413 [2024-06-10 08:26:06,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.06 | bwd_microstep: 1374.61 | bwd_inner_microstep: 1374.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 08:26:08,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1383.60 | bwd_inner_microstep: 1383.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2023 [2024-06-10 08:26:09,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.29 | bwd_microstep: 715.07 | bwd_inner_microstep: 715.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 963 [2024-06-10 08:26:10,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
149.69 | bwd_microstep: 385.81 | bwd_inner_microstep: 385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 08:26:11,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.23 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710 [2024-06-10 08:26:13,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1426.30 | bwd_inner_microstep: 1426.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3670 [2024-06-10 08:26:16,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.92 | bwd_microstep: 1716.43 | bwd_inner_microstep: 1716.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726 [2024-06-10 08:26:18,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.23 | bwd_microstep: 1621.45 | bwd_inner_microstep: 1621.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659 [2024-06-10 08:26:20,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.21 | bwd_microstep: 1611.99 | bwd_inner_microstep: 1611.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3682 [2024-06-10 08:26:22,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1481.03 | bwd_inner_microstep: 1481.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465 [2024-06-10 08:26:24,704] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.10 | bwd_microstep: 1407.33 | bwd_inner_microstep: 1407.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3417 [2024-06-10 08:26:26,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.22 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-10 08:26:28,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.56 | bwd_microstep: 1500.94 | bwd_inner_microstep: 1500.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2561 [2024-06-10 08:26:30,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.04 | bwd_microstep: 1068.31 | bwd_inner_microstep: 1068.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2719 [2024-06-10 08:26:31,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.25 | bwd_microstep: 1034.72 | bwd_inner_microstep: 1034.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959 [2024-06-10 08:26:32,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.81 | bwd_microstep: 702.62 | bwd_inner_microstep: 702.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3672 [2024-06-10 08:26:34,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.22 | bwd_microstep: 1554.85 | bwd_inner_microstep: 1554.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2191 [2024-06-10 08:26:36,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.96 | bwd_microstep: 957.97 | bwd_inner_microstep: 957.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547 [2024-06-10 08:26:37,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1327.71 | bwd_inner_microstep: 1327.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716 [2024-06-10 08:26:40,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.71 | bwd_microstep: 1638.88 | bwd_inner_microstep: 1638.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 08:26:42,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-10 08:26:44,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1512.10 | bwd_inner_microstep: 1512.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 08:26:46,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.93 | bwd_microstep: 1407.48 | bwd_inner_microstep: 1407.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3745 [2024-06-10 08:26:48,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.52 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3461 [2024-06-10 08:26:50,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1376.92 | bwd_inner_microstep: 1376.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 08:26:52,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.93 | bwd_microstep: 1501.54 | bwd_inner_microstep: 1501.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3772 [2024-06-10 08:26:54,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.58 | bwd_microstep: 1465.80 | bwd_inner_microstep: 1465.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2274 [2024-06-10 08:26:56,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 08:26:56,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.76 | bwd_microstep: 2107.11 | bwd_inner_microstep: 1022.95 | bwd_allreduce_microstep: 1084.12 | step_microstep: 37.81 [2024-06-10 08:26:56,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15670.01 | bwd: 43151.21 | bwd_inner: 42066.19 | bwd_allreduce: 1084.34 | step: 39.37 {'loss': 1.2651, 'learning_rate': 3.470109153193815e-05, 'epoch': 0.26} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 08:26:58,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.56 | bwd_microstep: 1339.17 | bwd_inner_microstep: 1339.10 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 4668 [2024-06-10 08:27:00,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
577.72 | bwd_microstep: 1482.82 | bwd_inner_microstep: 1482.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925 [2024-06-10 08:27:02,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.85 | bwd_microstep: 1589.74 | bwd_inner_microstep: 1589.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 08:27:04,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1246.97 | bwd_inner_microstep: 1246.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705 [2024-06-10 08:27:06,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.47 | bwd_microstep: 1432.83 | bwd_inner_microstep: 1432.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 08:27:08,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1344.77 | bwd_inner_microstep: 1344.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 08:27:10,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.35 | bwd_microstep: 1280.65 | bwd_inner_microstep: 1280.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 08:27:12,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1481.48 | bwd_inner_microstep: 1481.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 08:27:14,043] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910 [2024-06-10 08:27:15,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.11 | bwd_microstep: 753.87 | bwd_inner_microstep: 753.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3580 [2024-06-10 08:27:16,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1366.29 | bwd_inner_microstep: 1366.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-10 08:27:18,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1279.05 | bwd_inner_microstep: 1279.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3416 [2024-06-10 08:27:20,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1374.88 | bwd_inner_microstep: 1374.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 08:27:22,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2132 [2024-06-10 08:27:23,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.75 | bwd_microstep: 927.96 | bwd_inner_microstep: 927.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831 
[2024-06-10 08:27:25,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.82 | bwd_microstep: 1453.08 | bwd_inner_microstep: 1453.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 08:27:27,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1384.75 | bwd_inner_microstep: 1384.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 08:27:29,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.88 | bwd_microstep: 1492.36 | bwd_inner_microstep: 1492.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 08:27:31,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.88 | bwd_microstep: 1462.71 | bwd_inner_microstep: 1462.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 08:27:32,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.64 | bwd_microstep: 801.53 | bwd_inner_microstep: 801.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-10 08:27:34,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1185.37 | bwd_inner_microstep: 1185.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-10 08:27:36,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1510.85 | bwd_inner_microstep: 1510.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3543 [2024-06-10 08:27:38,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.18 | bwd_microstep: 1296.64 | bwd_inner_microstep: 1296.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 08:27:40,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1376.67 | bwd_inner_microstep: 1376.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3603 [2024-06-10 08:27:42,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.15 | bwd_microstep: 1215.06 | bwd_inner_microstep: 1215.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3549 [2024-06-10 08:27:44,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.53 | bwd_microstep: 1540.60 | bwd_inner_microstep: 1540.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475 [2024-06-10 08:27:46,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.33 | bwd_microstep: 1318.81 | bwd_inner_microstep: 1318.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593 [2024-06-10 08:27:48,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1502.48 | bwd_inner_microstep: 1502.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3810 [2024-06-10 08:27:50,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1418.25 | bwd_inner_microstep: 1418.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3822 [2024-06-10 08:27:52,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.56 | bwd_microstep: 1722.96 | bwd_inner_microstep: 1722.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 08:27:54,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1486.58 | bwd_inner_microstep: 1486.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-10 08:27:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-10 08:27:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.00 | bwd_microstep: 3992.35 | bwd_inner_microstep: 1688.97 | bwd_allreduce_microstep: 2303.33 | step_microstep: 38.34 [2024-06-10 08:27:59,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16300.00 | bwd: 45837.05 | bwd_inner: 43532.75 | bwd_allreduce: 2303.59 | step: 40.14 {'loss': 1.2745, 'learning_rate': 3.467561733368439e-05, 'epoch': 0.26} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3487 [2024-06-10 08:28:00,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.22 | bwd_microstep: 1340.14 | bwd_inner_microstep: 1340.06 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 08:28:03,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1490.82 | bwd_inner_microstep: 1490.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 08:28:04,932] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 515.49 | bwd_microstep: 1381.19 | bwd_inner_microstep: 1381.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 08:28:07,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.87 | bwd_microstep: 1539.82 | bwd_inner_microstep: 1539.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487 [2024-06-10 08:28:08,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.69 | bwd_microstep: 1187.03 | bwd_inner_microstep: 1187.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-10 08:28:10,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1415.61 | bwd_inner_microstep: 1415.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 08:28:12,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1283.35 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3448 [2024-06-10 08:28:14,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1299.04 | bwd_inner_microstep: 1299.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485 [2024-06-10 08:28:16,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.37 | bwd_microstep: 1345.38 | bwd_inner_microstep: 1345.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 
08:28:18,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1470.02 | bwd_inner_microstep: 1470.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-10 08:28:20,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.25 | bwd_microstep: 1522.73 | bwd_inner_microstep: 1522.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3656 [2024-06-10 08:28:22,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.05 | bwd_microstep: 1560.79 | bwd_inner_microstep: 1560.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672 [2024-06-10 08:28:24,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.35 | bwd_microstep: 1524.21 | bwd_inner_microstep: 1524.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 08:28:26,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1280.41 | bwd_inner_microstep: 1280.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694 [2024-06-10 08:28:28,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.86 | bwd_microstep: 1523.25 | bwd_inner_microstep: 1523.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3464 [2024-06-10 08:28:30,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.57 | bwd_microstep: 1606.10 | bwd_inner_microstep: 1606.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3643 [2024-06-10 08:28:32,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.89 | bwd_microstep: 1318.22 | bwd_inner_microstep: 1318.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685 [2024-06-10 08:28:34,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.45 | bwd_microstep: 1628.19 | bwd_inner_microstep: 1628.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 08:28:36,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.27 | bwd_microstep: 1500.75 | bwd_inner_microstep: 1500.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-10 08:28:38,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1410.33 | bwd_inner_microstep: 1410.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831 [2024-06-10 08:28:40,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.48 | bwd_microstep: 1453.71 | bwd_inner_microstep: 1453.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 08:28:42,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.42 | bwd_microstep: 1416.67 | bwd_inner_microstep: 1416.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909 [2024-06-10 08:28:43,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.23 | bwd_microstep: 686.38 | bwd_inner_microstep: 686.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 10, images per sample: 2.5, dynamic token length: 2080 [2024-06-10 08:28:44,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.94 | bwd_microstep: 727.18 | bwd_inner_microstep: 727.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674 [2024-06-10 08:28:46,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.28 | bwd_microstep: 1423.51 | bwd_inner_microstep: 1423.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 08:28:48,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.70 | bwd_microstep: 1346.23 | bwd_inner_microstep: 1346.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3568 [2024-06-10 08:28:50,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.69 | bwd_microstep: 1350.50 | bwd_inner_microstep: 1350.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794 [2024-06-10 08:28:52,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.17 | bwd_microstep: 1446.58 | bwd_inner_microstep: 1446.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-10 08:28:54,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.81 | bwd_microstep: 1645.79 | bwd_inner_microstep: 1645.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-10 08:28:56,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.52 | bwd_microstep: 1547.60 | bwd_inner_microstep: 1547.57 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782 [2024-06-10 08:28:58,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.66 | bwd_microstep: 1449.74 | bwd_inner_microstep: 1449.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624 [2024-06-10 08:29:00,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.19 | optimizer_step: 6.64 [2024-06-10 08:29:00,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.29 | bwd_microstep: 1576.83 | bwd_inner_microstep: 1569.10 | bwd_allreduce_microstep: 7.68 | step_microstep: 37.69 [2024-06-10 08:29:00,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16723.07 | bwd: 44698.12 | bwd_inner: 44689.47 | bwd_allreduce: 7.95 | step: 39.44 {'loss': 1.269, 'learning_rate': 3.465009144798268e-05, 'epoch': 0.26} 26%|██▌ | 446/1726 [7:46:31<21:58:18, 61.80s/it] 26%|██▌ | 447/1726 [7:47:32<21:51:17, 61.51s/it] 26%|██▌ | 448/1726 [7:48:34<21:49:38, 61.49s/it] 26%|██▌ | 449/1726 [7:49:33<21:33:45, 60.79s/it] 26%|██▌ | 450/1726 [7:50:35<21:43:37, 61.30s/it] 26%|██▌ | 451/1726 [7:51:37<21:45:36, 61.44s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 08:29:02,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.82 | bwd_microstep: 1279.72 | bwd_inner_microstep: 1279.58 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2632 [2024-06-10 08:29:04,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 395.17 | bwd_microstep: 1050.46 | bwd_inner_microstep: 1050.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-10 08:29:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1298.03 | bwd_inner_microstep: 1298.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791 [2024-06-10 08:29:08,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 1645.38 | bwd_inner_microstep: 1645.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 08:29:09,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.04 | bwd_microstep: 1273.66 | bwd_inner_microstep: 1273.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-10 08:29:11,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.54 | bwd_microstep: 1443.44 | bwd_inner_microstep: 1443.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 08:29:13,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1243.89 | bwd_inner_microstep: 1243.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4035 [2024-06-10 08:29:15,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.13 | bwd_microstep: 1419.83 | bwd_inner_microstep: 1419.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884 [2024-06-10 08:29:16,583] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.15 | bwd_microstep: 682.44 | bwd_inner_microstep: 682.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 08:29:18,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.49 | bwd_microstep: 1479.86 | bwd_inner_microstep: 1479.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 08:29:20,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1378.35 | bwd_inner_microstep: 1378.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3562 [2024-06-10 08:29:22,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.92 | bwd_microstep: 1447.17 | bwd_inner_microstep: 1447.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744 [2024-06-10 08:29:24,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1538.70 | bwd_inner_microstep: 1538.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 08:29:26,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1389.49 | bwd_inner_microstep: 1389.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499 [2024-06-10 08:29:28,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1408.54 | bwd_inner_microstep: 1408.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3471 [2024-06-10 08:29:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1281.24 | bwd_inner_microstep: 1281.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 08:29:32,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.84 | bwd_microstep: 1556.06 | bwd_inner_microstep: 1556.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101 [2024-06-10 08:29:33,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.25 | bwd_microstep: 826.91 | bwd_inner_microstep: 826.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-10 08:29:35,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.10 | bwd_microstep: 1406.06 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 08:29:37,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.98 | bwd_microstep: 1286.56 | bwd_inner_microstep: 1286.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 08:29:39,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1507.74 | bwd_inner_microstep: 1507.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2254 [2024-06-10 08:29:40,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.84 | bwd_microstep: 871.37 | bwd_inner_microstep: 871.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1988 [2024-06-10 08:29:41,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.47 | bwd_microstep: 801.63 | bwd_inner_microstep: 801.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 08:29:43,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.10 | bwd_microstep: 1297.08 | bwd_inner_microstep: 1297.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 08:29:45,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.02 | bwd_microstep: 1288.78 | bwd_inner_microstep: 1288.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 08:29:47,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.34 | bwd_microstep: 1656.98 | bwd_inner_microstep: 1656.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 08:29:49,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1397.87 | bwd_inner_microstep: 1397.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281 [2024-06-10 08:29:50,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.21 | bwd_microstep: 908.74 | bwd_inner_microstep: 908.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-10 08:29:51,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.21 | bwd_microstep: 802.80 | bwd_inner_microstep: 802.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 08:29:53,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1487.92 | bwd_inner_microstep: 1487.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3949 [2024-06-10 08:29:56,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.23 | bwd_microstep: 1593.15 | bwd_inner_microstep: 1593.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3583 [2024-06-10 08:30:01,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.62 [2024-06-10 08:30:01,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.61 | bwd_microstep: 4271.29 | bwd_inner_microstep: 1782.33 | bwd_allreduce_microstep: 2488.90 | step_microstep: 38.89 [2024-06-10 08:30:01,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15620.81 | bwd: 44221.19 | bwd_inner: 41731.24 | bwd_allreduce: 2489.19 | step: 40.51 {'loss': 1.309, 'learning_rate': 3.462451396473505e-05, 'epoch': 0.26} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-10 08:30:02,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.87 | bwd_microstep: 879.20 | bwd_inner_microstep: 879.06 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3948 [2024-06-10 08:30:04,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.26 | bwd_microstep: 1592.64 | bwd_inner_microstep: 1592.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 08:30:06,393] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1381.32 | bwd_inner_microstep: 1381.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772 [2024-06-10 08:30:08,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.56 | bwd_microstep: 1542.19 | bwd_inner_microstep: 1542.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 08:30:10,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1284.94 | bwd_inner_microstep: 1284.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 08:30:12,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1250.98 | bwd_inner_microstep: 1250.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722 [2024-06-10 08:30:13,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 1335.07 | bwd_inner_microstep: 1335.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 08:30:15,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1385.90 | bwd_inner_microstep: 1385.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2657 [2024-06-10 08:30:17,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.75 | bwd_microstep: 1101.78 | bwd_inner_microstep: 1101.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 08:30:19,240] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1389.17 | bwd_inner_microstep: 1389.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-10 08:30:21,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.22 | bwd_microstep: 1432.87 | bwd_inner_microstep: 1432.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509 [2024-06-10 08:30:23,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1413.43 | bwd_inner_microstep: 1413.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3001 [2024-06-10 08:30:24,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1171.51 | bwd_inner_microstep: 1171.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461 [2024-06-10 08:30:26,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.93 | bwd_microstep: 1437.64 | bwd_inner_microstep: 1437.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687 [2024-06-10 08:30:28,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.58 | bwd_microstep: 1524.80 | bwd_inner_microstep: 1524.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3503 [2024-06-10 08:30:31,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1616.70 | bwd_inner_microstep: 1616.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 2084 [2024-06-10 08:30:32,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 818.29 | bwd_inner_microstep: 818.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429 [2024-06-10 08:30:34,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 08:30:36,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1514.12 | bwd_inner_microstep: 1514.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644 [2024-06-10 08:30:38,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1446.39 | bwd_inner_microstep: 1446.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 08:30:40,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1489.80 | bwd_inner_microstep: 1489.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530 [2024-06-10 08:30:42,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.37 | bwd_microstep: 1325.62 | bwd_inner_microstep: 1325.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 08:30:44,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 1455.42 | bwd_inner_microstep: 1455.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3711 [2024-06-10 08:30:46,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.98 | bwd_microstep: 1526.34 | bwd_inner_microstep: 1526.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-10 08:30:48,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1399.07 | bwd_inner_microstep: 1399.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465 [2024-06-10 08:30:49,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.53 | bwd_microstep: 1184.11 | bwd_inner_microstep: 1184.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709 [2024-06-10 08:30:52,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.95 | bwd_microstep: 1533.99 | bwd_inner_microstep: 1533.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621 [2024-06-10 08:30:54,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.86 | bwd_microstep: 1443.74 | bwd_inner_microstep: 1443.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 08:30:56,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1550.25 | bwd_inner_microstep: 1550.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3593 [2024-06-10 08:30:58,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.82 | bwd_microstep: 1704.49 | bwd_inner_microstep: 1704.47 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769 [2024-06-10 08:31:00,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.94 | bwd_microstep: 1365.91 | bwd_inner_microstep: 1365.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424 [2024-06-10 08:31:02,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.17 | optimizer_step: 6.65 [2024-06-10 08:31:02,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.91 | bwd_microstep: 1481.34 | bwd_inner_microstep: 1473.67 | bwd_allreduce_microstep: 7.62 | step_microstep: 37.65 [2024-06-10 08:31:02,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16595.33 | bwd: 44427.71 | bwd_inner: 44419.08 | bwd_allreduce: 7.90 | step: 39.24 {'loss': 1.3053, 'learning_rate': 3.459888497402526e-05, 'epoch': 0.26} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 08:31:04,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.64 | bwd_microstep: 1444.89 | bwd_inner_microstep: 1444.83 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2356 [2024-06-10 08:31:05,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.53 | bwd_microstep: 893.17 | bwd_inner_microstep: 893.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 08:31:07,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1457.59 | bwd_inner_microstep: 1457.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925 [2024-06-10 08:31:09,880] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.68 | bwd_microstep: 1596.35 | bwd_inner_microstep: 1596.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 08:31:11,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.33 | bwd_microstep: 1288.39 | bwd_inner_microstep: 1288.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3830 [2024-06-10 08:31:13,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1322.00 | bwd_inner_microstep: 1321.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-10 08:31:15,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1300.27 | bwd_inner_microstep: 1300.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 08:31:17,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1254.03 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2178 [2024-06-10 08:31:18,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.01 | bwd_microstep: 985.91 | bwd_inner_microstep: 985.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420 [2024-06-10 08:31:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.87 | bwd_microstep: 1315.54 | bwd_inner_microstep: 1315.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3526 [2024-06-10 08:31:22,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 08:31:23,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.34 | bwd_microstep: 791.84 | bwd_inner_microstep: 791.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 08:31:25,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1282.23 | bwd_inner_microstep: 1282.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688 [2024-06-10 08:31:26,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.99 | bwd_microstep: 1424.00 | bwd_inner_microstep: 1423.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3620 [2024-06-10 08:31:29,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.00 | bwd_microstep: 1810.98 | bwd_inner_microstep: 1810.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 08:31:31,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.59 | bwd_microstep: 1283.43 | bwd_inner_microstep: 1283.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 08:31:33,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.16 | bwd_microstep: 1558.54 | bwd_inner_microstep: 1558.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3532 [2024-06-10 08:31:35,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1294.55 | bwd_inner_microstep: 1294.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695 [2024-06-10 08:31:37,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.06 | bwd_microstep: 1424.48 | bwd_inner_microstep: 1424.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3626 [2024-06-10 08:31:39,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.30 | bwd_microstep: 1475.05 | bwd_inner_microstep: 1475.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 08:31:41,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1295.61 | bwd_inner_microstep: 1295.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529 [2024-06-10 08:31:42,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.02 | bwd_microstep: 1329.23 | bwd_inner_microstep: 1329.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3730 [2024-06-10 08:31:44,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.34 | bwd_microstep: 1277.25 | bwd_inner_microstep: 1277.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3613 [2024-06-10 08:31:46,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.74 | bwd_microstep: 1275.64 | bwd_inner_microstep: 1275.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 08:31:47,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.34 | bwd_microstep: 695.94 | bwd_inner_microstep: 695.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718 [2024-06-10 08:31:49,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1439.73 | bwd_inner_microstep: 1439.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3766 [2024-06-10 08:31:51,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1375.91 | bwd_inner_microstep: 1375.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 08:31:53,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1509.94 | bwd_inner_microstep: 1509.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 08:31:55,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.23 | bwd_microstep: 1502.34 | bwd_inner_microstep: 1502.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801 [2024-06-10 08:31:57,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1451.22 | bwd_inner_microstep: 1451.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-10 08:31:59,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.89 | bwd_microstep: 1648.05 | bwd_inner_microstep: 1648.02 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803 [2024-06-10 08:32:02,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.14 | optimizer_step: 6.58 [2024-06-10 08:32:02,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 2310.14 | bwd_inner_microstep: 1819.88 | bwd_allreduce_microstep: 490.22 | step_microstep: 37.54 [2024-06-10 08:32:02,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16162.79 | bwd: 43703.07 | bwd_inner: 43211.90 | bwd_allreduce: 490.46 | step: 39.18 {'loss': 1.2901, 'learning_rate': 3.4573204566118476e-05, 'epoch': 0.26} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2008 [2024-06-10 08:32:03,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.92 | bwd_microstep: 890.52 | bwd_inner_microstep: 890.43 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547 [2024-06-10 08:32:05,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.37 | bwd_microstep: 1292.59 | bwd_inner_microstep: 1292.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3854 [2024-06-10 08:32:07,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.55 | bwd_microstep: 1363.71 | bwd_inner_microstep: 1363.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4317 [2024-06-10 08:32:09,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.56 | bwd_microstep: 1585.79 | bwd_inner_microstep: 1585.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 
08:32:10,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.13 | bwd_microstep: 791.36 | bwd_inner_microstep: 791.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465 [2024-06-10 08:32:12,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1310.04 | bwd_inner_microstep: 1310.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 08:32:14,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.54 | bwd_microstep: 1254.24 | bwd_inner_microstep: 1254.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 718 [2024-06-10 08:32:14,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 115.84 | bwd_microstep: 290.34 | bwd_inner_microstep: 290.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722 [2024-06-10 08:32:16,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1431.40 | bwd_inner_microstep: 1431.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-10 08:32:18,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.07 | bwd_microstep: 1161.33 | bwd_inner_microstep: 1161.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1876 [2024-06-10 08:32:19,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.11 | bwd_microstep: 742.79 | bwd_inner_microstep: 742.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic 
token length: 1966 [2024-06-10 08:32:20,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.10 | bwd_microstep: 853.57 | bwd_inner_microstep: 853.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681 [2024-06-10 08:32:22,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.07 | bwd_microstep: 1514.93 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 08:32:24,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1347.71 | bwd_inner_microstep: 1347.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 08:32:26,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.45 | bwd_microstep: 1285.21 | bwd_inner_microstep: 1285.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643 [2024-06-10 08:32:28,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1511.62 | bwd_inner_microstep: 1511.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3871 [2024-06-10 08:32:30,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.35 | bwd_microstep: 1498.05 | bwd_inner_microstep: 1498.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 08:32:32,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1553.98 | bwd_inner_microstep: 1553.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
12, images per sample: 3.0, dynamic token length: 2448 [2024-06-10 08:32:33,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.54 | bwd_microstep: 853.67 | bwd_inner_microstep: 853.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2421 [2024-06-10 08:32:35,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.66 | bwd_microstep: 1036.77 | bwd_inner_microstep: 1036.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927 [2024-06-10 08:32:36,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.79 | bwd_microstep: 697.47 | bwd_inner_microstep: 697.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 08:32:38,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.60 | bwd_microstep: 1294.52 | bwd_inner_microstep: 1294.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-10 08:32:39,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.01 | bwd_microstep: 802.70 | bwd_inner_microstep: 802.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4148 [2024-06-10 08:32:41,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.79 | bwd_microstep: 1647.65 | bwd_inner_microstep: 1647.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 08:32:43,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.78 | bwd_microstep: 1389.44 | bwd_inner_microstep: 1389.41 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2017 [2024-06-10 08:32:44,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.69 | bwd_microstep: 839.44 | bwd_inner_microstep: 839.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2841 [2024-06-10 08:32:46,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.55 | bwd_microstep: 1159.55 | bwd_inner_microstep: 1159.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3564 [2024-06-10 08:32:47,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.65 | bwd_microstep: 1344.72 | bwd_inner_microstep: 1344.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827 [2024-06-10 08:32:50,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.96 | bwd_microstep: 1750.64 | bwd_inner_microstep: 1750.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2562 [2024-06-10 08:32:51,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 409.92 | bwd_microstep: 1101.34 | bwd_inner_microstep: 1101.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816 [2024-06-10 08:32:53,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.54 | bwd_microstep: 1514.25 | bwd_inner_microstep: 1514.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787 [2024-06-10 08:33:02,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.21 | optimizer_step: 
6.59 [2024-06-10 08:33:02,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.43 | bwd_microstep: 7800.95 | bwd_inner_microstep: 1628.74 | bwd_allreduce_microstep: 6172.15 | step_microstep: 37.95 [2024-06-10 08:33:02,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14498.34 | bwd: 44912.29 | bwd_inner: 38739.16 | bwd_allreduce: 6172.42 | step: 39.49 {'loss': 1.3042, 'learning_rate': 3.4547472831460976e-05, 'epoch': 0.26} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 08:33:04,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1588.63 | bwd_inner_microstep: 1588.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4511 [2024-06-10 08:33:06,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.49 | bwd_microstep: 1635.85 | bwd_inner_microstep: 1635.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-10 08:33:08,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.06 | bwd_microstep: 1286.52 | bwd_inner_microstep: 1286.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 08:33:10,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 1544.78 | bwd_inner_microstep: 1544.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 08:33:12,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1478.98 | bwd_inner_microstep: 1478.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2588 
[2024-06-10 08:33:14,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.74 | bwd_microstep: 1041.07 | bwd_inner_microstep: 1041.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2260 [2024-06-10 08:33:15,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.62 | bwd_microstep: 967.74 | bwd_inner_microstep: 967.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 08:33:17,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3436 [2024-06-10 08:33:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.96 | bwd_microstep: 1184.19 | bwd_inner_microstep: 1184.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 08:33:20,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1384.20 | bwd_inner_microstep: 1384.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-10 08:33:22,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.06 | bwd_microstep: 1300.45 | bwd_inner_microstep: 1300.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3647 [2024-06-10 08:33:24,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1541.41 | bwd_inner_microstep: 1541.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3509 [2024-06-10 08:33:26,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1286.21 | bwd_inner_microstep: 1286.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969 [2024-06-10 08:33:27,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.31 | bwd_microstep: 891.72 | bwd_inner_microstep: 891.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3651 [2024-06-10 08:33:30,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.79 | bwd_microstep: 1642.42 | bwd_inner_microstep: 1642.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 08:33:31,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.57 | bwd_microstep: 1373.39 | bwd_inner_microstep: 1373.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2015 [2024-06-10 08:33:33,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.04 | bwd_microstep: 831.41 | bwd_inner_microstep: 831.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3629 [2024-06-10 08:33:35,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.47 | bwd_microstep: 1807.82 | bwd_inner_microstep: 1807.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-10 08:33:37,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1408.74 | bwd_inner_microstep: 1408.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-10 08:33:39,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.59 | bwd_microstep: 1503.77 | bwd_inner_microstep: 1503.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 08:33:41,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.25 | bwd_microstep: 1554.83 | bwd_inner_microstep: 1554.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-10 08:33:43,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1294.70 | bwd_inner_microstep: 1294.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3584 [2024-06-10 08:33:45,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.41 | bwd_microstep: 1207.23 | bwd_inner_microstep: 1207.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616 [2024-06-10 08:33:47,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.97 | bwd_microstep: 1340.35 | bwd_inner_microstep: 1340.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 08:33:48,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1350.19 | bwd_inner_microstep: 1350.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 08:33:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1388.70 | bwd_inner_microstep: 1388.67 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599 [2024-06-10 08:33:53,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.00 | bwd_microstep: 1702.04 | bwd_inner_microstep: 1702.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738 [2024-06-10 08:33:55,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.20 | bwd_microstep: 1432.74 | bwd_inner_microstep: 1432.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3769 [2024-06-10 08:33:57,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.88 | bwd_microstep: 1507.65 | bwd_inner_microstep: 1507.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571 [2024-06-10 08:33:59,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1429.74 | bwd_inner_microstep: 1429.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2215 [2024-06-10 08:34:00,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.60 | bwd_microstep: 954.90 | bwd_inner_microstep: 954.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3595 [2024-06-10 08:34:02,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.15 | optimizer_step: 6.57 [2024-06-10 08:34:02,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.25 | bwd_microstep: 1842.51 | bwd_inner_microstep: 1455.26 | bwd_allreduce_microstep: 387.21 | step_microstep: 37.67 [2024-06-10 08:34:02,988] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16282.74 | bwd: 43982.67 | bwd_inner: 43594.56 | bwd_allreduce: 387.43 | step: 39.23 {'loss': 1.263, 'learning_rate': 3.452168986067979e-05, 'epoch': 0.26} dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3456 [2024-06-10 08:34:05,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.45 | bwd_microstep: 1494.21 | bwd_inner_microstep: 1494.11 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882 [2024-06-10 08:34:07,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1487.43 | bwd_inner_microstep: 1487.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 08:34:09,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1376.07 | bwd_inner_microstep: 1376.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 08:34:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 1650.51 | bwd_inner_microstep: 1650.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 08:34:13,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.57 | bwd_microstep: 1448.41 | bwd_inner_microstep: 1448.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3425 [2024-06-10 08:34:15,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.99 | bwd_microstep: 1311.20 | bwd_inner_microstep: 1311.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3466 [2024-06-10 08:34:16,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1277.34 | bwd_inner_microstep: 1277.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-10 08:34:18,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1533.56 | bwd_inner_microstep: 1533.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3773 [2024-06-10 08:34:21,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.47 | bwd_microstep: 1505.59 | bwd_inner_microstep: 1505.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522 [2024-06-10 08:34:23,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1423.00 | bwd_inner_microstep: 1422.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 08:34:24,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.18 | bwd_microstep: 1347.71 | bwd_inner_microstep: 1347.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 08:34:26,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1390.04 | bwd_inner_microstep: 1390.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 08:34:28,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.20 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-10 08:34:30,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.47 | bwd_microstep: 1413.75 | bwd_inner_microstep: 1413.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-10 08:34:32,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.91 | bwd_microstep: 1604.27 | bwd_inner_microstep: 1604.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-10 08:34:34,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.25 | bwd_microstep: 1425.54 | bwd_inner_microstep: 1425.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3715 [2024-06-10 08:34:37,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1697.40 | bwd_inner_microstep: 1697.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644 [2024-06-10 08:34:39,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 1605.60 | bwd_inner_microstep: 1605.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543 [2024-06-10 08:34:41,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.88 | bwd_microstep: 1326.70 | bwd_inner_microstep: 1326.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820 [2024-06-10 08:34:43,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1355.88 | bwd_inner_microstep: 1355.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 08:34:45,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1498.75 | bwd_inner_microstep: 1498.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 08:34:46,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.14 | bwd_microstep: 1295.66 | bwd_inner_microstep: 1295.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594 [2024-06-10 08:34:49,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.64 | bwd_microstep: 1510.01 | bwd_inner_microstep: 1509.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 08:34:51,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.20 | bwd_microstep: 1497.22 | bwd_inner_microstep: 1497.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 08:34:53,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.18 | bwd_microstep: 1401.38 | bwd_inner_microstep: 1401.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 08:34:54,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.02 | bwd_microstep: 1351.17 | bwd_inner_microstep: 1351.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452 [2024-06-10 08:34:56,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.46 | bwd_microstep: 
1446.34 | bwd_inner_microstep: 1446.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 08:34:58,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1346.29 | bwd_inner_microstep: 1346.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-10 08:35:01,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.06 | bwd_microstep: 1650.68 | bwd_inner_microstep: 1650.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 08:35:03,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.80 | bwd_microstep: 1458.32 | bwd_inner_microstep: 1458.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3608 [2024-06-10 08:35:05,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1550.29 | bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 08:35:07,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.16 | optimizer_step: 6.64 [2024-06-10 08:35:07,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.49 | bwd_microstep: 1512.34 | bwd_inner_microstep: 1504.61 | bwd_allreduce_microstep: 7.68 | step_microstep: 37.63 [2024-06-10 08:35:07,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17370.28 | bwd: 46577.93 | bwd_inner: 46569.26 | bwd_allreduce: 7.96 | step: 39.20 :36, 61.44s/it] 26%|██▌ | 452/1726 [7:52:37<21:36:35, 61.06s/it] 26%|██▌ | 
453/1726 [7:53:39<21:37:31, 61.16s/it] 26%|██▋ | 454/1726 [7:54:39<21:30:26, 60.87s/it] 26%|██▋ | 455/1726 [7:55:39<21:22:15, 60.53s/it] 26%|██▋ | 456/1726 [7:56:39<21:21:45, 60.56s/it] 26%|██▋ | 457/1726 [7:57:44<21:44:29, 61.68s/it] {'loss': 1.2736, 'learning_rate': 3.44958557445824e-05, 'epoch': 0.26} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 08:35:09,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1338.62 | bwd_inner_microstep: 1338.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3917 [2024-06-10 08:35:11,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.37 | bwd_microstep: 1695.36 | bwd_inner_microstep: 1695.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 08:35:13,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1348.94 | bwd_inner_microstep: 1348.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865 [2024-06-10 08:35:15,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.17 | bwd_microstep: 1462.62 | bwd_inner_microstep: 1462.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 08:35:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 793.89 | bwd_inner_microstep: 793.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3472 [2024-06-10 08:35:18,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1478.52 | bwd_inner_microstep: 1478.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3682 [2024-06-10 08:35:20,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.21 | bwd_microstep: 1386.28 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501 [2024-06-10 08:35:22,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.75 | bwd_microstep: 1222.48 | bwd_inner_microstep: 1222.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3506 [2024-06-10 08:35:23,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.03 | bwd_microstep: 1222.21 | bwd_inner_microstep: 1222.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434 [2024-06-10 08:35:25,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.85 | bwd_microstep: 1154.68 | bwd_inner_microstep: 1154.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 08:35:27,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.59 | bwd_microstep: 1381.14 | bwd_inner_microstep: 1381.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-10 08:35:28,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.00 | bwd_microstep: 804.81 | bwd_inner_microstep: 804.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3378 [2024-06-10 08:35:30,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.49 | bwd_microstep: 1336.02 | bwd_inner_microstep: 1335.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 08:35:32,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.33 | bwd_microstep: 1394.27 | bwd_inner_microstep: 1394.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 08:35:34,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1292.28 | bwd_inner_microstep: 1292.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999 [2024-06-10 08:35:35,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.38 | bwd_microstep: 801.33 | bwd_inner_microstep: 801.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647 [2024-06-10 08:35:37,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1418.03 | bwd_inner_microstep: 1418.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712 [2024-06-10 08:35:39,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1438.91 | bwd_inner_microstep: 1438.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 08:35:41,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1396.78 | bwd_inner_microstep: 1396.75 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 08:35:42,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1402.87 | bwd_inner_microstep: 1402.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624 [2024-06-10 08:35:44,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.93 | bwd_microstep: 1313.77 | bwd_inner_microstep: 1313.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2027 [2024-06-10 08:35:45,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.46 | bwd_microstep: 839.68 | bwd_inner_microstep: 839.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 08:35:47,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.23 | bwd_inner_microstep: 1283.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633 [2024-06-10 08:35:49,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.54 | bwd_microstep: 1318.96 | bwd_inner_microstep: 1318.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-10 08:35:51,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1415.23 | bwd_inner_microstep: 1415.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629 [2024-06-10 08:35:53,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 
1539.95 | bwd_inner_microstep: 1539.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2295 [2024-06-10 08:35:55,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.82 | bwd_microstep: 1076.04 | bwd_inner_microstep: 1076.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-10 08:35:56,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.02 | bwd_microstep: 977.46 | bwd_inner_microstep: 977.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-10 08:35:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.68 | bwd_microstep: 1475.11 | bwd_inner_microstep: 1475.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3011 [2024-06-10 08:36:00,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.66 | bwd_microstep: 1136.37 | bwd_inner_microstep: 1136.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3598 [2024-06-10 08:36:01,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.83 | bwd_microstep: 1347.42 | bwd_inner_microstep: 1347.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3777 [2024-06-10 08:36:07,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 08:36:07,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 5167.74 | bwd_inner_microstep: 1752.97 | bwd_allreduce_microstep: 3414.72 | step_microstep: 38.17 
[2024-06-10 08:36:07,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15440.57 | bwd: 44661.01 | bwd_inner: 41245.39 | bwd_allreduce: 3414.94 | step: 39.88 {'loss': 1.3033, 'learning_rate': 3.4469970574156436e-05, 'epoch': 0.27} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 08:36:09,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.76 | bwd_microstep: 1267.56 | bwd_inner_microstep: 1267.37 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.65 dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1088 [2024-06-10 08:36:10,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 168.50 | bwd_microstep: 434.56 | bwd_inner_microstep: 434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 08:36:11,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.41 | bwd_microstep: 1343.98 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 08:36:14,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1557.70 | bwd_inner_microstep: 1557.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483 [2024-06-10 08:36:15,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.89 | bwd_microstep: 1216.17 | bwd_inner_microstep: 1216.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1413 [2024-06-10 08:36:16,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 225.52 | bwd_microstep: 597.25 | bwd_inner_microstep: 597.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 08:36:18,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1385.87 | bwd_inner_microstep: 1385.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405 [2024-06-10 08:36:20,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.62 | bwd_microstep: 1312.72 | bwd_inner_microstep: 1312.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 08:36:21,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.74 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 08:36:23,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3649 [2024-06-10 08:36:25,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.12 | bwd_microstep: 1324.62 | bwd_inner_microstep: 1324.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3723 [2024-06-10 08:36:27,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.01 | bwd_microstep: 1729.81 | bwd_inner_microstep: 1729.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 08:36:29,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.63 | bwd_microstep: 1288.45 | bwd_inner_microstep: 1288.42 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377 [2024-06-10 08:36:31,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.55 | bwd_microstep: 1332.61 | bwd_inner_microstep: 1332.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 08:36:33,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1384.48 | bwd_inner_microstep: 1384.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687 [2024-06-10 08:36:35,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.42 | bwd_microstep: 1622.35 | bwd_inner_microstep: 1622.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-10 08:36:37,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.27 | bwd_microstep: 1515.26 | bwd_inner_microstep: 1515.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967 [2024-06-10 08:36:38,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.05 | bwd_microstep: 890.93 | bwd_inner_microstep: 890.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-10 08:36:40,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1512.83 | bwd_inner_microstep: 1512.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2308 [2024-06-10 08:36:42,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.37 | bwd_microstep: 883.81 | bwd_inner_microstep: 
883.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673 [2024-06-10 08:36:43,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1429.65 | bwd_inner_microstep: 1429.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469 [2024-06-10 08:36:45,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.52 | bwd_microstep: 1244.55 | bwd_inner_microstep: 1244.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470 [2024-06-10 08:36:47,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1315.01 | bwd_inner_microstep: 1314.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3535 [2024-06-10 08:36:49,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.38 | bwd_microstep: 1588.47 | bwd_inner_microstep: 1588.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 08:36:52,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.12 | bwd_microstep: 1645.76 | bwd_inner_microstep: 1645.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2273 [2024-06-10 08:36:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.59 | bwd_microstep: 813.63 | bwd_inner_microstep: 813.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 08:36:54,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | 
bwd_microstep: 1294.35 | bwd_inner_microstep: 1294.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 08:36:56,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3596 [2024-06-10 08:36:58,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1246.89 | bwd_inner_microstep: 1246.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568 [2024-06-10 08:37:00,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.33 | bwd_microstep: 1593.60 | bwd_inner_microstep: 1593.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2037 [2024-06-10 08:37:02,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.33 | bwd_microstep: 907.00 | bwd_inner_microstep: 906.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2638 [2024-06-10 08:37:08,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.37 | optimizer_step: 6.60 [2024-06-10 08:37:08,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.52 | bwd_microstep: 6064.34 | bwd_inner_microstep: 1148.46 | bwd_allreduce_microstep: 4915.81 | step_microstep: 39.11 [2024-06-10 08:37:08,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15128.62 | bwd: 45320.36 | bwd_inner: 40403.47 | bwd_allreduce: 4916.13 | step: 40.88 {'loss': 1.2928, 'learning_rate': 3.444403444056934e-05, 'epoch': 0.27} dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3406 [2024-06-10 08:37:10,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.06 | bwd_microstep: 1236.55 | bwd_inner_microstep: 1236.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 08:37:12,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896 [2024-06-10 08:37:14,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.02 | bwd_microstep: 1584.35 | bwd_inner_microstep: 1584.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 08:37:16,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.92 | bwd_microstep: 1552.66 | bwd_inner_microstep: 1552.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865 [2024-06-10 08:37:18,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.71 | bwd_microstep: 1460.82 | bwd_inner_microstep: 1460.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3753 [2024-06-10 08:37:20,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.91 | bwd_microstep: 1501.71 | bwd_inner_microstep: 1501.52 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 08:37:21,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 794.51 | bwd_inner_microstep: 794.48 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3716 [2024-06-10 08:37:23,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1239.56 | bwd_inner_microstep: 1239.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 08:37:25,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1282.88 | bwd_inner_microstep: 1282.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-10 08:37:26,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.65 | bwd_microstep: 801.46 | bwd_inner_microstep: 801.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4091 [2024-06-10 08:37:28,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.55 | bwd_microstep: 1532.92 | bwd_inner_microstep: 1532.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 08:37:30,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.40 | bwd_microstep: 1254.31 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910 [2024-06-10 08:37:31,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.85 | bwd_microstep: 749.26 | bwd_inner_microstep: 749.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2154 [2024-06-10 08:37:32,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.78 | bwd_microstep: 887.24 | bwd_inner_microstep: 887.21 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 08:37:34,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.75 | bwd_microstep: 1347.78 | bwd_inner_microstep: 1347.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-10 08:37:36,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.34 | bwd_microstep: 1344.71 | bwd_inner_microstep: 1344.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 08:37:37,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2082 [2024-06-10 08:37:38,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.98 | bwd_microstep: 724.94 | bwd_inner_microstep: 724.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 08:37:40,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.69 | bwd_microstep: 1434.68 | bwd_inner_microstep: 1434.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3698 [2024-06-10 08:37:42,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1335.51 | bwd_inner_microstep: 1335.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 08:37:44,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 
1411.72 | bwd_inner_microstep: 1411.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 08:37:46,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1560.09 | bwd_inner_microstep: 1560.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702 [2024-06-10 08:37:48,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1427.33 | bwd_inner_microstep: 1427.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3825 [2024-06-10 08:37:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1407.39 | bwd_inner_microstep: 1407.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3556 [2024-06-10 08:37:52,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.97 | bwd_microstep: 1327.24 | bwd_inner_microstep: 1327.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469 [2024-06-10 08:37:54,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1249.54 | bwd_inner_microstep: 1249.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727 [2024-06-10 08:37:56,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.11 | bwd_microstep: 1435.19 | bwd_inner_microstep: 1435.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 08:37:58,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 544.98 | bwd_microstep: 1450.60 | bwd_inner_microstep: 1450.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 08:38:00,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.59 | bwd_microstep: 1353.80 | bwd_inner_microstep: 1353.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419 [2024-06-10 08:38:01,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.62 | bwd_microstep: 1184.97 | bwd_inner_microstep: 1184.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2918 [2024-06-10 08:38:03,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.92 | bwd_microstep: 1290.74 | bwd_inner_microstep: 1290.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3047 [2024-06-10 08:38:10,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.30 | optimizer_step: 6.62 [2024-06-10 08:38:10,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.47 | bwd_microstep: 5888.53 | bwd_inner_microstep: 1507.28 | bwd_allreduce_microstep: 4381.19 | step_microstep: 38.68 [2024-06-10 08:38:10,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15503.98 | bwd: 45720.68 | bwd_inner: 41338.44 | bwd_allreduce: 4381.49 | step: 40.38 {'loss': 1.3046, 'learning_rate': 3.4418047435168025e-05, 'epoch': 0.27} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1964 [2024-06-10 08:38:11,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.36 | bwd_microstep: 824.65 | bwd_inner_microstep: 824.52 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic 
ViT batch size: 46, images per sample: 11.5, dynamic token length: 3664 [2024-06-10 08:38:13,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.62 | bwd_microstep: 1717.19 | bwd_inner_microstep: 1717.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 08:38:15,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.38 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3405 [2024-06-10 08:38:17,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1294.47 | bwd_inner_microstep: 1294.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837 [2024-06-10 08:38:19,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.73 | bwd_microstep: 1555.69 | bwd_inner_microstep: 1555.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-10 08:38:21,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1545.12 | bwd_inner_microstep: 1545.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741 [2024-06-10 08:38:23,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1431.50 | bwd_inner_microstep: 1431.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 08:38:25,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1411.05 | bwd_inner_microstep: 1411.03 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3902 [2024-06-10 08:38:27,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.09 | bwd_microstep: 1587.23 | bwd_inner_microstep: 1587.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2148 [2024-06-10 08:38:28,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.55 | bwd_microstep: 788.25 | bwd_inner_microstep: 788.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3619 [2024-06-10 08:38:30,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.95 | bwd_microstep: 1221.68 | bwd_inner_microstep: 1221.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687 [2024-06-10 08:38:32,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.92 | bwd_microstep: 1627.63 | bwd_inner_microstep: 1627.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2165 [2024-06-10 08:38:34,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.26 | bwd_microstep: 953.93 | bwd_inner_microstep: 953.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2922 [2024-06-10 08:38:35,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.38 | bwd_microstep: 1129.37 | bwd_inner_microstep: 1129.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3731 [2024-06-10 08:38:37,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.50 | bwd_microstep: 
1630.14 | bwd_inner_microstep: 1630.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523 [2024-06-10 08:38:40,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.42 | bwd_microstep: 1584.37 | bwd_inner_microstep: 1584.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2386 [2024-06-10 08:38:41,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.18 | bwd_microstep: 1000.69 | bwd_inner_microstep: 1000.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 08:38:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.15 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 08:38:44,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.66 | bwd_microstep: 1282.27 | bwd_inner_microstep: 1282.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1992 [2024-06-10 08:38:45,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.68 | bwd_microstep: 829.63 | bwd_inner_microstep: 829.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-10 08:38:47,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.47 | bwd_microstep: 1458.24 | bwd_inner_microstep: 1458.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 08:38:49,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 547.97 | bwd_microstep: 1463.68 | bwd_inner_microstep: 1463.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2979 [2024-06-10 08:38:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.55 | bwd_microstep: 1202.01 | bwd_inner_microstep: 1201.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 08:38:53,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1494.64 | bwd_inner_microstep: 1494.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3800 [2024-06-10 08:38:55,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.24 | bwd_microstep: 1518.01 | bwd_inner_microstep: 1517.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714 [2024-06-10 08:38:57,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 1440.34 | bwd_inner_microstep: 1440.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2022 [2024-06-10 08:38:58,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.46 | bwd_microstep: 743.13 | bwd_inner_microstep: 743.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 08:39:00,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.25 | bwd_microstep: 1397.32 | bwd_inner_microstep: 1397.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 08:39:02,190] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1354.14 | bwd_inner_microstep: 1354.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 08:39:04,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.88 | bwd_microstep: 1550.74 | bwd_inner_microstep: 1550.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 08:39:06,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1478.92 | bwd_inner_microstep: 1478.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3617 [2024-06-10 08:39:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 08:39:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.64 | bwd_microstep: 4630.23 | bwd_inner_microstep: 1739.63 | bwd_allreduce_microstep: 2890.55 | step_microstep: 37.97 [2024-06-10 08:39:11,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15797.78 | bwd: 45325.86 | bwd_inner: 42434.31 | bwd_allreduce: 2890.82 | step: 39.70 {'loss': 1.2734, 'learning_rate': 3.4392009649478596e-05, 'epoch': 0.27} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 08:39:13,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.66 | bwd_microstep: 1365.05 | bwd_inner_microstep: 1364.95 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3920 [2024-06-10 08:39:15,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.80 | bwd_microstep: 1591.55 | bwd_inner_microstep: 1591.53 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3835 [2024-06-10 08:39:17,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.40 | bwd_microstep: 1389.26 | bwd_inner_microstep: 1389.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 08:39:19,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.74 | bwd_microstep: 1383.76 | bwd_inner_microstep: 1383.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050 [2024-06-10 08:39:20,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.35 | bwd_microstep: 817.01 | bwd_inner_microstep: 816.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 08:39:22,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1247.64 | bwd_inner_microstep: 1247.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 08:39:24,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.66 | bwd_microstep: 1390.32 | bwd_inner_microstep: 1390.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-10 08:39:25,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.74 | bwd_microstep: 680.31 | bwd_inner_microstep: 680.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 08:39:27,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.43 | bwd_microstep: 1282.60 
| bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-10 08:39:28,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.27 | bwd_microstep: 1316.24 | bwd_inner_microstep: 1316.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672 [2024-06-10 08:39:31,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.70 | bwd_microstep: 1614.83 | bwd_inner_microstep: 1614.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2873 [2024-06-10 08:39:32,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.26 | bwd_microstep: 990.75 | bwd_inner_microstep: 990.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 08:39:34,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1380.77 | bwd_inner_microstep: 1380.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 08:39:36,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.42 | bwd_microstep: 1346.05 | bwd_inner_microstep: 1346.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3614 [2024-06-10 08:39:38,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.43 | bwd_microstep: 1705.86 | bwd_inner_microstep: 1705.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3635 [2024-06-10 08:39:40,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 474.42 | bwd_microstep: 1249.75 | bwd_inner_microstep: 1249.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 08:39:42,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.19 | bwd_microstep: 1511.21 | bwd_inner_microstep: 1511.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-10 08:39:44,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.26 | bwd_microstep: 1298.90 | bwd_inner_microstep: 1298.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673 [2024-06-10 08:39:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.37 | bwd_microstep: 1527.10 | bwd_inner_microstep: 1527.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 08:39:48,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.84 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3829 [2024-06-10 08:39:50,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.28 | bwd_microstep: 1265.01 | bwd_inner_microstep: 1264.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 08:39:51,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 802.18 | bwd_inner_microstep: 802.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-10 08:39:53,418] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.73 | bwd_microstep: 1506.94 | bwd_inner_microstep: 1506.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 634 [2024-06-10 08:39:53,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.14 | bwd_microstep: 264.61 | bwd_inner_microstep: 264.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 08:39:55,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.91 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 08:39:57,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.33 | bwd_microstep: 1302.31 | bwd_inner_microstep: 1302.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949 [2024-06-10 08:39:58,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.49 | bwd_microstep: 702.10 | bwd_inner_microstep: 702.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 08:40:00,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.09 | bwd_microstep: 1604.88 | bwd_inner_microstep: 1604.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2003 [2024-06-10 08:40:01,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.19 | bwd_microstep: 832.50 | bwd_inner_microstep: 832.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934 
[2024-06-10 08:40:02,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.35 | bwd_microstep: 727.89 | bwd_inner_microstep: 727.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3569 [2024-06-10 08:40:05,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.02 | bwd_microstep: 1665.78 | bwd_inner_microstep: 1665.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 08:40:13,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.35 | optimizer_step: 6.60 [2024-06-10 08:40:13,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.01 | bwd_microstep: 8433.24 | bwd_inner_microstep: 1101.23 | bwd_allreduce_microstep: 7331.94 | step_microstep: 38.85 [2024-06-10 08:40:14,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14899.13 | bwd: 47132.45 | bwd_inner: 39799.50 | bwd_allreduce: 7332.23 | step: 40.77 {'loss': 1.2294, 'learning_rate': 3.4365921175206e-05, 'epoch': 0.27} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 08:40:16,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 1475.62 | bwd_inner_microstep: 1475.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3982 [2024-06-10 08:40:18,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.88 | bwd_microstep: 1464.81 | bwd_inner_microstep: 1464.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 08:40:19,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1376.36 | 
bwd_inner_microstep: 1376.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852 [2024-06-10 08:40:22,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.92 | bwd_microstep: 1662.09 | bwd_inner_microstep: 1662.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786 [2024-06-10 08:40:24,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1648.61 | bwd_inner_microstep: 1648.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 08:40:26,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.45 | bwd_microstep: 1344.04 | bwd_inner_microstep: 1344.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 08:40:27,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.21 | bwd_microstep: 682.30 | bwd_inner_microstep: 682.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 08:40:29,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490 [2024-06-10 08:40:31,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690 [2024-06-10 08:40:32,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 503.94 | bwd_microstep: 1328.78 | bwd_inner_microstep: 1328.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 08:40:34,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1253.63 | bwd_inner_microstep: 1253.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 08:40:36,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.48 | bwd_microstep: 1530.80 | bwd_inner_microstep: 1530.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 08:40:38,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.25 | bwd_microstep: 1257.53 | bwd_inner_microstep: 1257.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902 [2024-06-10 08:40:39,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.71 | bwd_microstep: 689.98 | bwd_inner_microstep: 689.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 08:40:41,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1382.89 | bwd_inner_microstep: 1382.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 08:40:43,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1258.04 | bwd_inner_microstep: 1258.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3423 [2024-06-10 08:40:45,109] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1408.44 | bwd_inner_microstep: 1408.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642 [2024-06-10 08:40:47,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.30 | bwd_microstep: 1518.99 | bwd_inner_microstep: 1518.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538 [2024-06-10 08:40:49,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.91 | bwd_microstep: 1422.52 | bwd_inner_microstep: 1422.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-10 08:40:50,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.20 | bwd_microstep: 799.36 | bwd_inner_microstep: 799.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611 [2024-06-10 08:40:52,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1309.96 | bwd_inner_microstep: 1309.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3744 [2024-06-10 08:40:54,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1560.98 | bwd_inner_microstep: 1560.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 08:40:56,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.87 | bwd_microstep: 1399.19 | bwd_inner_microstep: 1399.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2075 [2024-06-10 08:40:57,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.94 | bwd_microstep: 916.85 | bwd_inner_microstep: 916.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3946 [2024-06-10 08:41:00,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 693.88 | bwd_microstep: 1913.38 | bwd_inner_microstep: 1913.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3571 [2024-06-10 08:41:02,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.26 | bwd_microstep: 1561.21 | bwd_inner_microstep: 1561.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079 [2024-06-10 08:41:03,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.98 | bwd_microstep: 920.88 | bwd_inner_microstep: 920.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547 [2024-06-10 08:41:05,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.12 | bwd_microstep: 1588.42 | bwd_inner_microstep: 1588.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781 [2024-06-10 08:41:07,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.61 | bwd_microstep: 1643.79 | bwd_inner_microstep: 1643.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 08:41:10,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1507.14 | bwd_inner_microstep: 1507.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3425 [2024-06-10 08:41:11,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1405.06 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3779 [2024-06-10 08:41:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.22 | optimizer_step: 6.61 [2024-06-10 08:41:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.19 | bwd_microstep: 5040.80 | bwd_inner_microstep: 1823.34 | bwd_allreduce_microstep: 3217.41 | step_microstep: 38.50 [2024-06-10 08:41:17,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16275.47 | bwd: 47002.62 | bwd_inner: 43784.30 | bwd_allreduce: 3217.65 | step: 40.13 26%|██▋ | 457/1726 [7:57:44<21:44:29, 61.68s/it] 27%|██▋ | 458/1726 [7:58:44<21:35:42, 61.31s/it] 27%|██▋ | 459/1726 [7:59:45<21:31:29, 61.16s/it] 27%|██▋ | 460/1726 [8:00:46<21:33:09, 61.29s/it] 27%|██▋ | 461/1726 [8:01:48<21:33:23, 61.35s/it] 27%|██▋ | 462/1726 [8:02:50<21:38:55, 61.66s/it] 27%|██▋ | 463/1726 [8:03:54< {'loss': 1.2398, 'learning_rate': 3.43397821042337e-05, 'epoch': 0.27} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 08:41:19,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1366.90 | bwd_inner_microstep: 1366.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3989 [2024-06-10 08:41:21,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 |
bwd_microstep: 1434.89 | bwd_inner_microstep: 1434.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 08:41:23,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.59 | bwd_microstep: 1546.38 | bwd_inner_microstep: 1546.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976 [2024-06-10 08:41:24,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 794.92 | bwd_inner_microstep: 794.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739 [2024-06-10 08:41:26,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.63 | bwd_microstep: 1434.46 | bwd_inner_microstep: 1434.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 08:41:28,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1377.77 | bwd_inner_microstep: 1377.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 08:41:30,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.76 | bwd_microstep: 1246.73 | bwd_inner_microstep: 1246.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696 [2024-06-10 08:41:32,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.10 | bwd_microstep: 1518.99 | bwd_inner_microstep: 1518.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 08:41:34,378] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1377.68 | bwd_inner_microstep: 1377.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987 [2024-06-10 08:41:35,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.41 | bwd_microstep: 710.57 | bwd_inner_microstep: 710.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3419 [2024-06-10 08:41:37,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3970 [2024-06-10 08:41:39,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.29 | bwd_microstep: 1694.24 | bwd_inner_microstep: 1694.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 08:41:41,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1380.50 | bwd_inner_microstep: 1380.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648 [2024-06-10 08:41:43,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1410.78 | bwd_inner_microstep: 1410.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-10 08:41:44,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.89 | bwd_microstep: 976.32 | bwd_inner_microstep: 976.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-10 08:41:46,778] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1417.75 | bwd_inner_microstep: 1417.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972 [2024-06-10 08:41:47,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.79 | bwd_microstep: 703.30 | bwd_inner_microstep: 703.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490 [2024-06-10 08:41:49,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.38 | bwd_microstep: 1317.23 | bwd_inner_microstep: 1317.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-10 08:41:51,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1415.86 | bwd_inner_microstep: 1415.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530 [2024-06-10 08:41:53,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.14 | bwd_microstep: 1327.64 | bwd_inner_microstep: 1327.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 08:41:55,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1285.07 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-10 08:41:56,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.64 | bwd_microstep: 1318.21 | bwd_inner_microstep: 1318.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3806 [2024-06-10 08:41:59,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.51 | bwd_microstep: 1556.34 | bwd_inner_microstep: 1556.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821 [2024-06-10 08:42:01,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1387.80 | bwd_inner_microstep: 1387.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462 [2024-06-10 08:42:02,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.20 | bwd_microstep: 1313.51 | bwd_inner_microstep: 1313.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3399 [2024-06-10 08:42:04,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1441.57 | bwd_inner_microstep: 1441.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385 [2024-06-10 08:42:06,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.08 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2056 [2024-06-10 08:42:07,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.30 | bwd_microstep: 812.89 | bwd_inner_microstep: 812.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 08:42:09,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1493.76 | bwd_inner_microstep: 1493.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, 
images per sample: 7.75, dynamic token length: 3624 [2024-06-10 08:42:11,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.54 | bwd_microstep: 1452.92 | bwd_inner_microstep: 1452.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2222 [2024-06-10 08:42:13,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.26 | bwd_microstep: 800.31 | bwd_inner_microstep: 800.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-10 08:42:18,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.21 | optimizer_step: 6.63 [2024-06-10 08:42:18,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.96 | bwd_microstep: 4459.00 | bwd_inner_microstep: 1755.89 | bwd_allreduce_microstep: 2703.06 | step_microstep: 38.17 [2024-06-10 08:42:18,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.66 | bwd: 44503.80 | bwd_inner: 41799.84 | bwd_allreduce: 2703.29 | step: 39.82 {'loss': 1.2896, 'learning_rate': 3.4313592528623384e-05, 'epoch': 0.27} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890 [2024-06-10 08:42:19,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.15 | bwd_microstep: 766.45 | bwd_inner_microstep: 766.36 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 08:42:20,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1277.51 | bwd_inner_microstep: 1277.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 08:42:23,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 560.62 | bwd_microstep: 1496.90 | bwd_inner_microstep: 1496.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3852 [2024-06-10 08:42:25,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.20 | bwd_microstep: 1518.48 | bwd_inner_microstep: 1518.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 08:42:27,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1396.60 | bwd_inner_microstep: 1396.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2455 [2024-06-10 08:42:28,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.56 | bwd_microstep: 948.98 | bwd_inner_microstep: 948.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 08:42:30,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1245.82 | bwd_inner_microstep: 1245.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 08:42:31,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.55 | bwd_microstep: 1248.22 | bwd_inner_microstep: 1248.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673 [2024-06-10 08:42:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.13 | bwd_microstep: 1615.24 | bwd_inner_microstep: 1615.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3652 [2024-06-10 08:42:36,084] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.41 | bwd_microstep: 1481.96 | bwd_inner_microstep: 1481.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-10 08:42:37,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.14 | bwd_microstep: 1336.52 | bwd_inner_microstep: 1336.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3636 [2024-06-10 08:42:40,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.04 | bwd_microstep: 1576.18 | bwd_inner_microstep: 1576.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-10 08:42:41,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.87 | bwd_microstep: 707.44 | bwd_inner_microstep: 707.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471 [2024-06-10 08:42:42,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.69 | bwd_microstep: 1310.84 | bwd_inner_microstep: 1310.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 08:42:44,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1278.07 | bwd_inner_microstep: 1278.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-10 08:42:46,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1513.30 | bwd_inner_microstep: 1513.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3634 [2024-06-10 08:42:48,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1512.27 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 08:42:50,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 08:42:52,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1284.02 | bwd_inner_microstep: 1283.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2493 [2024-06-10 08:42:53,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.19 | bwd_microstep: 1054.79 | bwd_inner_microstep: 1054.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 08:42:55,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.24 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 08:42:57,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.89 | bwd_microstep: 1259.94 | bwd_inner_microstep: 1259.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3598 [2024-06-10 08:42:59,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.14 | bwd_microstep: 1437.70 | bwd_inner_microstep: 1437.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3820 [2024-06-10 08:43:01,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.93 | bwd_microstep: 1657.74 | bwd_inner_microstep: 1657.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 08:43:02,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.19 | bwd_microstep: 877.28 | bwd_inner_microstep: 877.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-10 08:43:04,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1411.98 | bwd_inner_microstep: 1411.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-10 08:43:06,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1432.74 | bwd_inner_microstep: 1432.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3598 [2024-06-10 08:43:09,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.09 | bwd_microstep: 1525.71 | bwd_inner_microstep: 1525.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3611 [2024-06-10 08:43:11,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1477.39 | bwd_inner_microstep: 1477.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3566 [2024-06-10 08:43:13,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.14 | bwd_microstep: 1459.44 | bwd_inner_microstep: 1459.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3441 [2024-06-10 08:43:14,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 08:43:19,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.63 [2024-06-10 08:43:19,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 4105.19 | bwd_inner_microstep: 1686.75 | bwd_allreduce_microstep: 2418.39 | step_microstep: 37.93 [2024-06-10 08:43:19,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15983.56 | bwd: 45272.33 | bwd_inner: 42852.95 | bwd_allreduce: 2418.67 | step: 39.48 {'loss': 1.2771, 'learning_rate': 3.428735254061458e-05, 'epoch': 0.27} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3479 [2024-06-10 08:43:21,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.34 | bwd_microstep: 1568.82 | bwd_inner_microstep: 1568.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3957 [2024-06-10 08:43:24,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.89 | bwd_microstep: 1697.71 | bwd_inner_microstep: 1697.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3865 [2024-06-10 08:43:26,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.58 | bwd_microstep: 1559.90 | bwd_inner_microstep: 1559.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834 [2024-06-10 08:43:28,618] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.28 | bwd_microstep: 1653.07 | bwd_inner_microstep: 1653.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3768 [2024-06-10 08:43:30,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.18 | bwd_microstep: 1490.52 | bwd_inner_microstep: 1490.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3763 [2024-06-10 08:43:32,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.13 | bwd_microstep: 1609.04 | bwd_inner_microstep: 1609.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-10 08:43:34,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1414.28 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4119 [2024-06-10 08:43:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.33 | bwd_microstep: 1740.78 | bwd_inner_microstep: 1740.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-10 08:43:39,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.68 | bwd_microstep: 1516.36 | bwd_inner_microstep: 1516.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963 [2024-06-10 08:43:40,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.16 | bwd_microstep: 891.38 | bwd_inner_microstep: 891.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3685 [2024-06-10 08:43:42,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1618.23 | bwd_inner_microstep: 1618.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-10 08:43:44,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.46 | bwd_microstep: 1517.41 | bwd_inner_microstep: 1517.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 08:43:46,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.31 | bwd_microstep: 1344.38 | bwd_inner_microstep: 1344.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 08:43:48,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1346.83 | bwd_inner_microstep: 1346.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3393 [2024-06-10 08:43:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1245.80 | bwd_inner_microstep: 1245.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642 [2024-06-10 08:43:52,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.68 | bwd_microstep: 1514.59 | bwd_inner_microstep: 1514.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 08:43:54,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.02 | bwd_microstep: 1252.28 | bwd_inner_microstep: 1252.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images 
per sample: 6.5, dynamic token length: 3473 [2024-06-10 08:43:56,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.55 | bwd_microstep: 1346.81 | bwd_inner_microstep: 1346.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301 [2024-06-10 08:43:57,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.46 | bwd_microstep: 979.61 | bwd_inner_microstep: 979.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289 [2024-06-10 08:43:58,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.82 | bwd_microstep: 878.41 | bwd_inner_microstep: 878.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719 [2024-06-10 08:44:00,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.22 | bwd_microstep: 1440.14 | bwd_inner_microstep: 1440.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 08:44:02,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1555.75 | bwd_inner_microstep: 1555.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-10 08:44:03,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.79 | bwd_microstep: 805.02 | bwd_inner_microstep: 804.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-10 08:44:05,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.64 | bwd_microstep: 1503.73 | bwd_inner_microstep: 1503.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 08:44:07,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.63 | bwd_microstep: 1460.57 | bwd_inner_microstep: 1460.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2191 [2024-06-10 08:44:09,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.57 | bwd_microstep: 861.86 | bwd_inner_microstep: 861.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 08:44:11,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.26 | bwd_microstep: 1503.80 | bwd_inner_microstep: 1503.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3577 [2024-06-10 08:44:13,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.10 | bwd_microstep: 1522.92 | bwd_inner_microstep: 1522.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033 [2024-06-10 08:44:14,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.62 | bwd_microstep: 840.89 | bwd_inner_microstep: 840.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3642 [2024-06-10 08:44:16,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.47 | bwd_microstep: 1352.47 | bwd_inner_microstep: 1352.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582 [2024-06-10 08:44:18,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1411.20 | bwd_inner_microstep: 1411.18 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3780 [2024-06-10 08:44:22,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 08:44:22,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 3286.41 | bwd_inner_microstep: 1773.45 | bwd_allreduce_microstep: 1512.90 | step_microstep: 38.28 [2024-06-10 08:44:22,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16370.00 | bwd: 45730.99 | bwd_inner: 44217.18 | bwd_allreduce: 1513.13 | step: 40.05 {'loss': 1.2856, 'learning_rate': 3.4261062232624405e-05, 'epoch': 0.27} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-10 08:44:24,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1469.81 | bwd_inner_microstep: 1469.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 08:44:25,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1290.62 | bwd_inner_microstep: 1290.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3863 [2024-06-10 08:44:28,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.71 | bwd_microstep: 1510.62 | bwd_inner_microstep: 1510.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 08:44:29,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.36 | bwd_microstep: 1245.75 | bwd_inner_microstep: 1245.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 
[2024-06-10 08:44:31,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.14 | bwd_microstep: 1436.08 | bwd_inner_microstep: 1436.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 08:44:33,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1382.71 | bwd_inner_microstep: 1382.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718 [2024-06-10 08:44:35,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1530.63 | bwd_inner_microstep: 1530.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407 [2024-06-10 08:44:37,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.39 | bwd_microstep: 1211.14 | bwd_inner_microstep: 1211.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942 [2024-06-10 08:44:38,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.56 | bwd_microstep: 884.74 | bwd_inner_microstep: 884.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491 [2024-06-10 08:44:40,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.83 | bwd_microstep: 1580.41 | bwd_inner_microstep: 1580.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3016 [2024-06-10 08:44:42,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 431.70 | bwd_microstep: 1130.03 | bwd_inner_microstep: 1130.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3850 [2024-06-10 08:44:44,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.56 | bwd_microstep: 1562.46 | bwd_inner_microstep: 1562.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2028 [2024-06-10 08:44:45,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.46 | bwd_microstep: 906.00 | bwd_inner_microstep: 905.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3497 [2024-06-10 08:44:47,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.28 | bwd_microstep: 1332.06 | bwd_inner_microstep: 1332.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502 [2024-06-10 08:44:49,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1319.62 | bwd_inner_microstep: 1319.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1980 [2024-06-10 08:44:50,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.04 | bwd_microstep: 706.97 | bwd_inner_microstep: 706.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624 [2024-06-10 08:44:52,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.26 | bwd_microstep: 1444.17 | bwd_inner_microstep: 1444.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3829 [2024-06-10 08:44:54,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.01 | bwd_microstep: 1586.44 | bwd_inner_microstep: 1586.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3724 [2024-06-10 08:44:56,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1465.58 | bwd_inner_microstep: 1465.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3694 [2024-06-10 08:44:58,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.69 | bwd_microstep: 1397.64 | bwd_inner_microstep: 1397.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544 [2024-06-10 08:45:00,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1297.35 | bwd_inner_microstep: 1297.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 08:45:02,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.86 | bwd_microstep: 1460.41 | bwd_inner_microstep: 1460.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 08:45:04,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1391.92 | bwd_inner_microstep: 1391.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 08:45:06,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1398.22 | bwd_inner_microstep: 1398.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3638 [2024-06-10 08:45:08,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1346.97 | bwd_inner_microstep: 1346.94 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 08:45:10,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3599 [2024-06-10 08:45:12,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.62 | bwd_microstep: 1570.13 | bwd_inner_microstep: 1570.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 08:45:14,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.77 | bwd_microstep: 1377.17 | bwd_inner_microstep: 1377.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044 [2024-06-10 08:45:15,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.45 | bwd_microstep: 906.44 | bwd_inner_microstep: 906.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3810 [2024-06-10 08:45:17,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.99 | bwd_microstep: 1723.88 | bwd_inner_microstep: 1723.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587 [2024-06-10 08:45:19,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1506.73 | bwd_inner_microstep: 1506.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2116 [2024-06-10 08:45:24,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | 
optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-10 08:45:24,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.21 | bwd_microstep: 3897.44 | bwd_inner_microstep: 1082.48 | bwd_allreduce_microstep: 2814.92 | step_microstep: 37.81 [2024-06-10 08:45:24,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15960.73 | bwd: 45661.96 | bwd_inner: 42846.10 | bwd_allreduce: 2815.16 | step: 39.52 {'loss': 1.3667, 'learning_rate': 3.423472169724719e-05, 'epoch': 0.27} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397 [2024-06-10 08:45:26,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.46 | bwd_microstep: 1393.77 | bwd_inner_microstep: 1393.54 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-10 08:45:27,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.54 | bwd_microstep: 786.81 | bwd_inner_microstep: 786.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3864 [2024-06-10 08:45:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.24 | bwd_microstep: 1559.56 | bwd_inner_microstep: 1559.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 08:45:31,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.96 | bwd_microstep: 1484.36 | bwd_inner_microstep: 1484.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 08:45:33,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.92 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3720 [2024-06-10 08:45:35,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.20 | bwd_microstep: 1428.61 | bwd_inner_microstep: 1428.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732 [2024-06-10 08:45:37,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.96 | bwd_microstep: 1630.06 | bwd_inner_microstep: 1630.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2188 [2024-06-10 08:45:38,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.51 | bwd_microstep: 955.13 | bwd_inner_microstep: 955.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3707 [2024-06-10 08:45:40,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1557.85 | bwd_inner_microstep: 1557.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1885 [2024-06-10 08:45:41,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.04 | bwd_microstep: 684.34 | bwd_inner_microstep: 684.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 08:45:43,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.85 | bwd_microstep: 796.20 | bwd_inner_microstep: 796.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622 [2024-06-10 08:45:44,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.19 | bwd_microstep: 1344.04 | bwd_inner_microstep: 1344.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 08:45:46,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.09 | bwd_microstep: 1288.76 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466 [2024-06-10 08:45:48,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.50 | bwd_microstep: 1196.29 | bwd_inner_microstep: 1196.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2608 [2024-06-10 08:45:49,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.10 | bwd_microstep: 1204.64 | bwd_inner_microstep: 1204.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638 [2024-06-10 08:45:52,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.57 | bwd_microstep: 1711.10 | bwd_inner_microstep: 1711.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461 [2024-06-10 08:45:54,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.27 | bwd_microstep: 1312.07 | bwd_inner_microstep: 1312.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 08:45:56,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1490.90 | bwd_inner_microstep: 1490.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682 [2024-06-10 08:45:58,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.11 | bwd_microstep: 1528.42 | bwd_inner_microstep: 1528.39 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2010 [2024-06-10 08:45:59,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.67 | bwd_microstep: 832.44 | bwd_inner_microstep: 832.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291 [2024-06-10 08:46:00,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.04 | bwd_microstep: 913.13 | bwd_inner_microstep: 913.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 08:46:03,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.47 | bwd_microstep: 1659.48 | bwd_inner_microstep: 1659.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 08:46:05,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.52 | bwd_microstep: 1658.93 | bwd_inner_microstep: 1658.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 08:46:07,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.26 | bwd_microstep: 1746.77 | bwd_inner_microstep: 1746.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 08:46:09,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1400.27 | bwd_inner_microstep: 1400.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3428 [2024-06-10 08:46:11,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.78 | bwd_microstep: 
1543.22 | bwd_inner_microstep: 1543.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-10 08:46:13,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.26 | bwd_microstep: 1535.03 | bwd_inner_microstep: 1535.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476 [2024-06-10 08:46:15,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.64 | bwd_microstep: 1441.98 | bwd_inner_microstep: 1441.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574 [2024-06-10 08:46:17,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.91 | bwd_microstep: 1426.71 | bwd_inner_microstep: 1426.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3568 [2024-06-10 08:46:19,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1529.38 | bwd_inner_microstep: 1529.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270 [2024-06-10 08:46:21,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.21 | bwd_microstep: 969.17 | bwd_inner_microstep: 969.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 08:46:24,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 08:46:24,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.72 | bwd_microstep: 2608.93 | bwd_inner_microstep: 1334.03 | bwd_allreduce_microstep: 1274.85 | step_microstep: 37.91 
[2024-06-10 08:46:24,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15911.53 | bwd: 43998.87 | bwd_inner: 42722.93 | bwd_allreduce: 1275.16 | step: 39.54 {'loss': 1.3183, 'learning_rate': 3.420833102725415e-05, 'epoch': 0.27} 27%|██▋ | 463/1726 [8:03:54<21:50:23, 62.25s/it] 27%|██▋ | 464/1726 [8:04:54<21:38:07, 61.72s/it] 27%|██▋ | 465/1726 [8:05:56<21:36:16, 61.68s/it] 27%|██▋ | 466/1726 [8:06:58<21:40:09, 61.91s/it] 27%|██▋ | 467/1726 [8:08:00<21:39:30, 61.93s/it] 27%|██▋ | 468/1726 [8:09:01<21:27:59, 61.43s/it] 27%|██▋ | 468/1726 [8:09:01> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600 [INFO|configuration_utils.py:473] 2024-06-10 11:02:03,018 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/config.json [INFO|configuration_utils.py:594] 2024-06-10 11:02:03,020 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-10 11:02:11,553 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-10 11:02:11,564 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-10 11:02:11,566 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-10 11:02:11,567 >> added tokens file saved in 
work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/added_tokens.json [2024-06-10 11:02:11,786] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step600 is about to be saved! [2024-06-10 11:02:11,798] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/mp_rank_00_model_states.pt [2024-06-10 11:02:11,798] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/mp_rank_00_model_states.pt... [2024-06-10 11:02:21,370] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/mp_rank_00_model_states.pt. [2024-06-10 11:02:21,377] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-10 11:02:34,521] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-06-10 11:02:34,530] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-600/global_step600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-10 11:02:34,530] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step600 is ready now! 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468 [2024-06-10 11:02:36,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.05 | bwd_microstep: 1563.68 | bwd_inner_microstep: 1563.62 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3954 [2024-06-10 11:02:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.78 | bwd_microstep: 1588.42 | bwd_inner_microstep: 1588.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 11:02:41,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.58 | bwd_microstep: 1375.74 | bwd_inner_microstep: 1375.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3835 [2024-06-10 11:02:42,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1381.12 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-10 11:02:45,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1536.59 | bwd_inner_microstep: 1536.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-10 11:02:47,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.67 | bwd_microstep: 1540.04 | bwd_inner_microstep: 1540.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 11:02:48,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.66 | bwd_microstep: 1244.45 | bwd_inner_microstep: 1244.43 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 11:02:50,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 1385.09 | bwd_inner_microstep: 1385.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 11:02:52,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.68 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4003 [2024-06-10 11:02:54,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.46 | bwd_microstep: 1508.84 | bwd_inner_microstep: 1508.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1913 [2024-06-10 11:02:55,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.45 | bwd_microstep: 839.12 | bwd_inner_microstep: 839.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1990 [2024-06-10 11:02:57,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.37 | bwd_microstep: 897.14 | bwd_inner_microstep: 897.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511 [2024-06-10 11:02:59,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1409.03 | bwd_inner_microstep: 1409.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-10 11:03:01,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.16 | bwd_microstep: 1405.46 
| bwd_inner_microstep: 1405.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3674 [2024-06-10 11:03:03,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.05 | bwd_microstep: 1714.16 | bwd_inner_microstep: 1714.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 11:03:05,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1288.75 | bwd_inner_microstep: 1288.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450 [2024-06-10 11:03:06,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1157.09 | bwd_inner_microstep: 1157.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 11:03:08,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 11:03:10,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1393.64 | bwd_inner_microstep: 1393.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3837 [2024-06-10 11:03:12,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.14 | bwd_microstep: 1358.51 | bwd_inner_microstep: 1358.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676 [2024-06-10 11:03:14,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 596.02 | bwd_microstep: 1625.20 | bwd_inner_microstep: 1625.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-10 11:03:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1413.53 | bwd_inner_microstep: 1413.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2306 [2024-06-10 11:03:17,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.36 | bwd_microstep: 983.06 | bwd_inner_microstep: 983.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4171 [2024-06-10 11:03:20,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.33 | bwd_microstep: 1556.34 | bwd_inner_microstep: 1556.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 11:03:22,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.60 | bwd_microstep: 1462.54 | bwd_inner_microstep: 1462.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377 [2024-06-10 11:03:23,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.28 | bwd_microstep: 1241.69 | bwd_inner_microstep: 1241.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3584 [2024-06-10 11:03:26,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.29 | bwd_microstep: 1698.36 | bwd_inner_microstep: 1698.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2041 [2024-06-10 11:03:27,418] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.82 | bwd_microstep: 904.10 | bwd_inner_microstep: 904.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 11:03:29,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.29 | bwd_microstep: 1449.35 | bwd_inner_microstep: 1449.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-10 11:03:31,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.38 | bwd_microstep: 1400.85 | bwd_inner_microstep: 1400.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 11:03:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.00 | bwd_microstep: 1385.89 | bwd_inner_microstep: 1385.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 11:03:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-10 11:03:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.32 | bwd_microstep: 1624.41 | bwd_inner_microstep: 1472.14 | bwd_allreduce_microstep: 152.22 | step_microstep: 37.92 [2024-06-10 11:03:35,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16391.34 | bwd: 43996.48 | bwd_inner: 43843.30 | bwd_allreduce: 152.47 | step: 39.48 {'loss': 1.2789, 'learning_rate': 3.029110800057258e-05, 'epoch': 0.35} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 11:03:37,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1336.94 | bwd_inner_microstep: 1336.91 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4020 [2024-06-10 11:03:39,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.68 | bwd_microstep: 1545.41 | bwd_inner_microstep: 1545.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 11:03:41,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 11:03:43,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1249.34 | bwd_inner_microstep: 1249.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 11:03:44,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.48 | bwd_microstep: 1283.35 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401 [2024-06-10 11:03:46,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1278.42 | bwd_inner_microstep: 1278.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 11:03:48,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.55 | bwd_microstep: 1532.28 | bwd_inner_microstep: 1532.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747 [2024-06-10 11:03:50,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.78 | bwd_microstep: 
1536.95 | bwd_inner_microstep: 1536.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1893 [2024-06-10 11:03:51,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.66 | bwd_microstep: 687.37 | bwd_inner_microstep: 687.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 11:03:53,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1390.22 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995 [2024-06-10 11:03:55,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.85 | bwd_microstep: 895.86 | bwd_inner_microstep: 895.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 11:03:56,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1345.92 | bwd_inner_microstep: 1345.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701 [2024-06-10 11:03:58,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.22 | bwd_microstep: 1522.76 | bwd_inner_microstep: 1522.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2777 [2024-06-10 11:04:00,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.53 | bwd_microstep: 1146.52 | bwd_inner_microstep: 1146.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 11:04:02,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 485.72 | bwd_microstep: 1290.39 | bwd_inner_microstep: 1290.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448 [2024-06-10 11:04:04,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.25 | bwd_microstep: 1447.23 | bwd_inner_microstep: 1447.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 11:04:06,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-10 11:04:07,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.04 | bwd_microstep: 807.27 | bwd_inner_microstep: 807.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3631 [2024-06-10 11:04:09,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.21 | bwd_microstep: 1552.07 | bwd_inner_microstep: 1552.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 11:04:11,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.81 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697 [2024-06-10 11:04:13,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.35 | bwd_microstep: 1425.17 | bwd_inner_microstep: 1425.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 11:04:14,323] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.56 | bwd_microstep: 801.21 | bwd_inner_microstep: 801.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 11:04:16,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.14 | bwd_microstep: 1558.83 | bwd_inner_microstep: 1558.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 11:04:18,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 11:04:20,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.91 | bwd_microstep: 1455.15 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802 [2024-06-10 11:04:22,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.67 | bwd_microstep: 1553.14 | bwd_inner_microstep: 1553.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-10 11:04:23,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.91 | bwd_microstep: 802.84 | bwd_inner_microstep: 802.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-10 11:04:25,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1509.38 | bwd_inner_microstep: 1509.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 
[2024-06-10 11:04:27,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.97 | bwd_microstep: 1443.32 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3579 [2024-06-10 11:04:29,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.76 | bwd_microstep: 1333.34 | bwd_inner_microstep: 1333.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 11:04:31,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.67 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3787 [2024-06-10 11:04:35,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.24 | optimizer_step: 6.64 [2024-06-10 11:04:35,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.56 | bwd_microstep: 3656.88 | bwd_inner_microstep: 1523.48 | bwd_allreduce_microstep: 2133.34 | step_microstep: 38.34 [2024-06-10 11:04:35,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15735.26 | bwd: 44236.27 | bwd_inner: 42102.01 | bwd_allreduce: 2133.57 | step: 41.84 {'loss': 1.2564, 'learning_rate': 3.025890613293557e-05, 'epoch': 0.35} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-10 11:04:37,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.57 | bwd_microstep: 1474.18 | bwd_inner_microstep: 1473.97 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3914 [2024-06-10 11:04:40,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1550.29 | 
bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2395 [2024-06-10 11:04:41,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.03 | bwd_microstep: 1000.68 | bwd_inner_microstep: 1000.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3762 [2024-06-10 11:04:43,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1371.81 | bwd_inner_microstep: 1371.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3777 [2024-06-10 11:04:45,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.08 | bwd_microstep: 1571.73 | bwd_inner_microstep: 1571.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3421 [2024-06-10 11:04:47,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.02 | bwd_microstep: 1216.44 | bwd_inner_microstep: 1216.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2087 [2024-06-10 11:04:48,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.50 | bwd_microstep: 727.46 | bwd_inner_microstep: 727.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 11:04:50,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.41 | bwd_microstep: 1388.14 | bwd_inner_microstep: 1388.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 11:04:51,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 489.86 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075 [2024-06-10 11:04:52,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.35 | bwd_microstep: 818.83 | bwd_inner_microstep: 818.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497 [2024-06-10 11:04:54,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1417.88 | bwd_inner_microstep: 1417.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 11:04:56,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.47 | bwd_microstep: 1290.33 | bwd_inner_microstep: 1290.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3495 [2024-06-10 11:04:58,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.16 | bwd_microstep: 1345.85 | bwd_inner_microstep: 1345.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470 [2024-06-10 11:05:00,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1344.31 | bwd_inner_microstep: 1344.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 11:05:02,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1495.04 | bwd_inner_microstep: 1495.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2760 [2024-06-10 11:05:04,097] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.86 | bwd_microstep: 1142.31 | bwd_inner_microstep: 1142.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607 [2024-06-10 11:05:06,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.43 | bwd_microstep: 1703.31 | bwd_inner_microstep: 1703.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 11:05:08,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.39 | bwd_microstep: 1651.03 | bwd_inner_microstep: 1651.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 11:05:09,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.94 | bwd_microstep: 796.98 | bwd_inner_microstep: 796.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2070 [2024-06-10 11:05:10,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.59 | bwd_microstep: 724.56 | bwd_inner_microstep: 724.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3671 [2024-06-10 11:05:12,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1329.68 | bwd_inner_microstep: 1329.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-10 11:05:13,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.69 | bwd_microstep: 804.20 | bwd_inner_microstep: 804.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 
[2024-06-10 11:05:15,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.45 | bwd_microstep: 1388.27 | bwd_inner_microstep: 1388.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-10 11:05:17,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 976.20 | bwd_inner_microstep: 976.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-10 11:05:19,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.13 | bwd_microstep: 1503.72 | bwd_inner_microstep: 1503.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-10 11:05:21,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.10 | bwd_microstep: 1438.40 | bwd_inner_microstep: 1438.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-10 11:05:22,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.62 | bwd_microstep: 1183.52 | bwd_inner_microstep: 1183.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 11:05:24,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.84 | bwd_microstep: 1557.53 | bwd_inner_microstep: 1557.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 11:05:27,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.49 | bwd_microstep: 1646.08 | bwd_inner_microstep: 1646.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per 
sample: 5.0, dynamic token length: 2952 [2024-06-10 11:05:28,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.31 | bwd_microstep: 1100.82 | bwd_inner_microstep: 1100.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441 [2024-06-10 11:05:30,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1384.13 | bwd_inner_microstep: 1384.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3463 [2024-06-10 11:05:39,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.47 | optimizer_step: 6.61 [2024-06-10 11:05:39,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.56 | bwd_microstep: 8152.17 | bwd_inner_microstep: 1777.13 | bwd_allreduce_microstep: 6374.97 | step_microstep: 40.20 [2024-06-10 11:05:39,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15431.67 | bwd: 47781.04 | bwd_inner: 41404.98 | bwd_allreduce: 6375.29 | step: 41.76 {'loss': 1.2539, 'learning_rate': 3.0226668133484494e-05, 'epoch': 0.35} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 11:05:41,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.34 | bwd_microstep: 1336.73 | bwd_inner_microstep: 1336.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2030 [2024-06-10 11:05:42,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.74 | bwd_microstep: 715.15 | bwd_inner_microstep: 715.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861 [2024-06-10 11:05:44,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 
| bwd_microstep: 1557.92 | bwd_inner_microstep: 1557.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3854 [2024-06-10 11:05:46,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.88 | bwd_microstep: 1658.71 | bwd_inner_microstep: 1658.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 11:05:48,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1277.36 | bwd_inner_microstep: 1277.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 11:05:49,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.77 | bwd_microstep: 677.15 | bwd_inner_microstep: 677.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-10 11:05:51,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.02 | bwd_microstep: 1292.28 | bwd_inner_microstep: 1292.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 11:05:53,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1386.62 | bwd_inner_microstep: 1386.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 11:05:55,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.61 | bwd_microstep: 1477.33 | bwd_inner_microstep: 1477.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408 [2024-06-10 11:05:56,918] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 482.68 | bwd_microstep: 1278.75 | bwd_inner_microstep: 1278.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2948 [2024-06-10 11:05:58,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.62 | bwd_microstep: 1190.89 | bwd_inner_microstep: 1190.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3445 [2024-06-10 11:06:00,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.78 | bwd_microstep: 1376.24 | bwd_inner_microstep: 1376.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2151 [2024-06-10 11:06:01,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.15 | bwd_microstep: 878.05 | bwd_inner_microstep: 878.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502 [2024-06-10 11:06:03,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1580.34 | bwd_inner_microstep: 1580.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-10 11:06:06,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.59 | bwd_microstep: 1562.25 | bwd_inner_microstep: 1562.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 11:06:07,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.08 | bwd_microstep: 1285.53 | bwd_inner_microstep: 1285.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 11:06:09,531] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.90 | bwd_microstep: 1254.72 | bwd_inner_microstep: 1254.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 11:06:10,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.99 | bwd_microstep: 973.76 | bwd_inner_microstep: 973.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3836 [2024-06-10 11:06:12,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1386.09 | bwd_inner_microstep: 1386.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1973 [2024-06-10 11:06:13,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.65 | bwd_microstep: 704.05 | bwd_inner_microstep: 704.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 11:06:15,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1346.94 | bwd_inner_microstep: 1346.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3530 [2024-06-10 11:06:17,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.62 | bwd_microstep: 1655.00 | bwd_inner_microstep: 1654.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3822 [2024-06-10 11:06:20,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.73 | bwd_microstep: 1749.33 | bwd_inner_microstep: 1749.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3817 [2024-06-10 11:06:22,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1552.35 | bwd_inner_microstep: 1552.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042 [2024-06-10 11:06:23,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.46 | bwd_microstep: 811.60 | bwd_inner_microstep: 811.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447 [2024-06-10 11:06:25,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1445.42 | bwd_inner_microstep: 1445.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 11:06:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1480.49 | bwd_inner_microstep: 1480.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 11:06:29,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1495.98 | bwd_inner_microstep: 1495.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3760 [2024-06-10 11:06:31,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.70 | bwd_microstep: 1349.43 | bwd_inner_microstep: 1349.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 11:06:33,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1285.73 | bwd_inner_microstep: 1285.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, 
images per sample: 4.0, dynamic token length: 2190 [2024-06-10 11:06:34,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.43 | bwd_microstep: 858.96 | bwd_inner_microstep: 858.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2225 [2024-06-10 11:06:43,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.43 | optimizer_step: 6.61 [2024-06-10 11:06:43,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.50 | bwd_microstep: 8967.64 | bwd_inner_microstep: 1010.30 | bwd_allreduce_microstep: 7957.27 | step_microstep: 40.13 [2024-06-10 11:06:43,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15295.82 | bwd: 48848.81 | bwd_inner: 40890.60 | bwd_allreduce: 7957.52 | step: 41.86 {'loss': 1.2737, 'learning_rate': 3.0194394115761415e-05, 'epoch': 0.35} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451 [2024-06-10 11:06:45,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1442.45 | bwd_inner_microstep: 1442.32 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 11:06:47,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.06 | bwd_microstep: 1339.52 | bwd_inner_microstep: 1339.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2350 [2024-06-10 11:06:49,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.35 | bwd_microstep: 984.50 | bwd_inner_microstep: 984.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 11:06:50,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
518.83 | bwd_microstep: 1370.04 | bwd_inner_microstep: 1370.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3473 [2024-06-10 11:06:52,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.12 | bwd_microstep: 1438.44 | bwd_inner_microstep: 1438.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 11:06:54,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1374.54 | bwd_inner_microstep: 1374.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 11:06:56,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1340.66 | bwd_inner_microstep: 1340.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 11:06:58,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1345.82 | bwd_inner_microstep: 1345.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-10 11:06:59,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.37 | bwd_microstep: 794.98 | bwd_inner_microstep: 794.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-10 11:07:00,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.48 | bwd_microstep: 795.67 | bwd_inner_microstep: 795.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 11:07:02,709] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 517.15 | bwd_microstep: 1380.56 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490 [2024-06-10 11:07:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.89 | bwd_microstep: 1219.86 | bwd_inner_microstep: 1219.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1907 [2024-06-10 11:07:05,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.43 | bwd_microstep: 718.03 | bwd_inner_microstep: 718.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 11:07:07,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.91 | bwd_microstep: 1379.01 | bwd_inner_microstep: 1378.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 11:07:09,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1446.29 | bwd_inner_microstep: 1446.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734 [2024-06-10 11:07:11,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.75 | bwd_microstep: 1632.52 | bwd_inner_microstep: 1632.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-10 11:07:13,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.49 | bwd_microstep: 1457.79 | bwd_inner_microstep: 1457.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 
11:07:15,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.55 | bwd_microstep: 1483.10 | bwd_inner_microstep: 1483.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3476 [2024-06-10 11:07:17,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.57 | bwd_microstep: 1249.33 | bwd_inner_microstep: 1249.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 11:07:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.08 | bwd_microstep: 1556.88 | bwd_inner_microstep: 1556.66 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.18 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 11:07:21,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1285.79 | bwd_inner_microstep: 1285.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3439 [2024-06-10 11:07:22,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.94 | bwd_microstep: 1156.26 | bwd_inner_microstep: 1156.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3528 [2024-06-10 11:07:24,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1326.04 | bwd_inner_microstep: 1326.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831 [2024-06-10 11:07:26,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.29 | bwd_microstep: 1461.98 | bwd_inner_microstep: 1461.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, 
dynamic token length: 3548 [2024-06-10 11:07:28,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.22 | bwd_microstep: 1329.35 | bwd_inner_microstep: 1329.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 11:07:30,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.03 | bwd_microstep: 1556.93 | bwd_inner_microstep: 1556.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1891 [2024-06-10 11:07:31,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.84 | bwd_microstep: 794.04 | bwd_inner_microstep: 794.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467 [2024-06-10 11:07:33,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.59 | bwd_microstep: 1190.49 | bwd_inner_microstep: 1190.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3558 [2024-06-10 11:07:35,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.60 | bwd_microstep: 1662.47 | bwd_inner_microstep: 1662.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3811 [2024-06-10 11:07:38,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.34 | bwd_microstep: 1791.06 | bwd_inner_microstep: 1791.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583 [2024-06-10 11:07:40,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.82 | bwd_microstep: 1696.31 | bwd_inner_microstep: 1696.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 11:07:46,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.34 | optimizer_step: 6.56 [2024-06-10 11:07:46,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.53 | bwd_microstep: 5471.69 | bwd_inner_microstep: 1451.93 | bwd_allreduce_microstep: 4019.68 | step_microstep: 38.82 [2024-06-10 11:07:46,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15871.01 | bwd: 46472.43 | bwd_inner: 42451.54 | bwd_allreduce: 4020.05 | step: 40.86 {'loss': 1.2413, 'learning_rate': 3.0162084193435257e-05, 'epoch': 0.35} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5043 [2024-06-10 11:07:49,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 727.55 | bwd_microstep: 1959.36 | bwd_inner_microstep: 1959.14 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.17 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3470 [2024-06-10 11:07:51,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 1439.41 | bwd_inner_microstep: 1439.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903 [2024-06-10 11:07:53,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 1588.13 | bwd_inner_microstep: 1588.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 11:07:55,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.21 | bwd_microstep: 1242.89 | bwd_inner_microstep: 1242.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 11:07:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 480.73 | bwd_microstep: 1276.63 | bwd_inner_microstep: 1276.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3419 [2024-06-10 11:07:58,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.63 | bwd_microstep: 1314.03 | bwd_inner_microstep: 1314.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3432 [2024-06-10 11:08:00,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.24 | bwd_microstep: 1188.50 | bwd_inner_microstep: 1188.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-10 11:08:02,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1416.23 | bwd_inner_microstep: 1416.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589 [2024-06-10 11:08:04,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1309.72 | bwd_inner_microstep: 1309.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 11:08:06,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.27 | bwd_microstep: 1419.30 | bwd_inner_microstep: 1419.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2466 [2024-06-10 11:08:07,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.15 | bwd_microstep: 929.48 | bwd_inner_microstep: 929.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504 [2024-06-10 11:08:09,475] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.32 | bwd_microstep: 1447.41 | bwd_inner_microstep: 1447.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2427 [2024-06-10 11:08:10,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.00 | bwd_microstep: 1040.30 | bwd_inner_microstep: 1040.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 11:08:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1384.49 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3633 [2024-06-10 11:08:14,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.61 | bwd_microstep: 1436.44 | bwd_inner_microstep: 1436.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 11:08:16,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.02 | bwd_microstep: 1362.92 | bwd_inner_microstep: 1362.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660 [2024-06-10 11:08:18,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1421.53 | bwd_inner_microstep: 1421.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 11:08:20,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.88 | bwd_microstep: 1605.37 | bwd_inner_microstep: 1605.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3504 [2024-06-10 11:08:22,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1395.87 | bwd_inner_microstep: 1395.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 11:08:24,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1499.02 | bwd_inner_microstep: 1499.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3536 [2024-06-10 11:08:26,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.83 | bwd_microstep: 1202.31 | bwd_inner_microstep: 1202.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 11:08:28,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.01 | bwd_microstep: 1290.89 | bwd_inner_microstep: 1290.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2239 [2024-06-10 11:08:29,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.06 | bwd_microstep: 899.91 | bwd_inner_microstep: 899.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.30 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817 [2024-06-10 11:08:31,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.54 | bwd_microstep: 1628.41 | bwd_inner_microstep: 1628.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3718 [2024-06-10 11:08:33,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.79 | bwd_microstep: 1339.38 | bwd_inner_microstep: 1339.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3456 [2024-06-10 11:08:35,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1258.48 | bwd_inner_microstep: 1258.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 11:08:37,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.48 | bwd_microstep: 1567.32 | bwd_inner_microstep: 1567.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939 [2024-06-10 11:08:38,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.44 | bwd_microstep: 731.02 | bwd_inner_microstep: 730.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551 [2024-06-10 11:08:40,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.08 | bwd_microstep: 1204.83 | bwd_inner_microstep: 1204.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400 [2024-06-10 11:08:42,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.66 | bwd_microstep: 1439.46 | bwd_inner_microstep: 1439.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2905 [2024-06-10 11:08:43,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.20 | bwd_microstep: 1094.95 | bwd_inner_microstep: 1094.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-10 11:08:49,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 11:08:49,076] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 602.75 | bwd_microstep: 4647.53 | bwd_inner_microstep: 1861.01 | bwd_allreduce_microstep: 2786.47 | step_microstep: 38.44 [2024-06-10 11:08:49,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16124.38 | bwd: 45981.58 | bwd_inner: 43194.02 | bwd_allreduce: 2786.79 | step: 41.54 35%|███▍ | 601/1726 [10:26:12<22:27:55, 71.89s/it] 35%|███▍ | 602/1726 [10:27:12<21:21:46, 68.42s/it] 35%|███▍ | 603/1726 [10:28:16<20:53:21, 66.96s/it] 35%|███▍ | 604/1726 [10:29:20<20:38:25, 66.23s/it] 35%|███▌ | 605/1726 [10:30:23<20:17:37, 65.17s/it] 35%|███▌ | 606/1726 [10:31:25<20:01:27, 64.36s/it] {'loss': 1.2677, 'learning_rate': 3.0129738480301398e-05, 'epoch': 0.35} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-10 11:08:51,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.21 | bwd_microstep: 1444.91 | bwd_inner_microstep: 1444.71 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903 [2024-06-10 11:08:53,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.43 | bwd_microstep: 1482.53 | bwd_inner_microstep: 1482.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882 [2024-06-10 11:08:55,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.34 | bwd_microstep: 1479.15 | bwd_inner_microstep: 1479.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 11:08:57,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 606.32 | bwd_microstep: 1651.98 | bwd_inner_microstep: 1651.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 11:08:59,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1383.53 | bwd_inner_microstep: 1383.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 11:09:01,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1283.43 | bwd_inner_microstep: 1283.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1370 [2024-06-10 11:09:01,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.37 | bwd_microstep: 519.92 | bwd_inner_microstep: 519.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485 [2024-06-10 11:09:03,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1317.44 | bwd_inner_microstep: 1317.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3744 [2024-06-10 11:09:05,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1500.12 | bwd_inner_microstep: 1500.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 11:09:07,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1283.71 | bwd_inner_microstep: 1283.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958 [2024-06-10 11:09:08,651] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.16 | bwd_microstep: 797.55 | bwd_inner_microstep: 797.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3684 [2024-06-10 11:09:10,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.14 | bwd_microstep: 1572.06 | bwd_inner_microstep: 1572.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3676 [2024-06-10 11:09:12,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1407.17 | bwd_inner_microstep: 1407.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3652 [2024-06-10 11:09:14,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.29 | bwd_microstep: 1580.24 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2672 [2024-06-10 11:09:16,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.31 | bwd_microstep: 1025.09 | bwd_inner_microstep: 1025.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 11:09:18,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1380.13 | bwd_inner_microstep: 1380.02 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.16 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-10 11:09:19,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.66 | bwd_microstep: 819.69 | bwd_inner_microstep: 819.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 
3832 [2024-06-10 11:09:21,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.34 | bwd_microstep: 1588.08 | bwd_inner_microstep: 1588.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-10 11:09:22,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.92 | bwd_microstep: 881.36 | bwd_inner_microstep: 881.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304 [2024-06-10 11:09:24,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.80 | bwd_microstep: 977.68 | bwd_inner_microstep: 977.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3601 [2024-06-10 11:09:26,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 1370.67 | bwd_inner_microstep: 1370.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3462 [2024-06-10 11:09:28,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1424.39 | bwd_inner_microstep: 1424.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 11:09:29,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1384.12 | bwd_inner_microstep: 1384.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-10 11:09:31,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.29 | bwd_microstep: 977.50 | bwd_inner_microstep: 977.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 38, images per 
sample: 9.5, dynamic token length: 3613 [2024-06-10 11:09:33,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1575.56 | bwd_inner_microstep: 1575.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 11:09:35,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.07 | bwd_microstep: 1596.88 | bwd_inner_microstep: 1596.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-10 11:09:37,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1605.27 | bwd_inner_microstep: 1603.87 | bwd_allreduce_microstep: 0.12 | step_microstep: 0.14 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529 [2024-06-10 11:09:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.57 | bwd_microstep: 1424.84 | bwd_inner_microstep: 1424.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 11:09:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.03 | bwd_microstep: 1612.61 | bwd_inner_microstep: 1612.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 11:09:43,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.36 | bwd_microstep: 1281.32 | bwd_inner_microstep: 1281.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2225 [2024-06-10 11:09:45,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.00 | bwd_microstep: 896.27 | bwd_inner_microstep: 896.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 11:09:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.37 | optimizer_step: 6.63 [2024-06-10 11:09:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.78 | bwd_microstep: 5121.25 | bwd_inner_microstep: 1658.01 | bwd_allreduce_microstep: 3463.17 | step_microstep: 38.99 [2024-06-10 11:09:50,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15731.36 | bwd: 45646.49 | bwd_inner: 42181.95 | bwd_allreduce: 3463.65 | step: 41.30 {'loss': 1.3236, 'learning_rate': 3.0097357090281267e-05, 'epoch': 0.35} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 11:09:52,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1337.58 | bwd_inner_microstep: 1337.32 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 11:09:53,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.93 | bwd_microstep: 778.71 | bwd_inner_microstep: 778.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 11:09:55,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 11:09:57,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.63 | bwd_microstep: 1481.45 | bwd_inner_microstep: 1481.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-10 11:09:59,980] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 601.78 | bwd_microstep: 1642.14 | bwd_inner_microstep: 1642.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 11:10:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.60 | bwd_microstep: 1652.92 | bwd_inner_microstep: 1652.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895 [2024-06-10 11:10:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.60 | bwd_microstep: 684.26 | bwd_inner_microstep: 684.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 11:10:05,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.94 | bwd_microstep: 1383.78 | bwd_inner_microstep: 1383.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-10 11:10:07,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.45 | bwd_microstep: 1538.43 | bwd_inner_microstep: 1538.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609 [2024-06-10 11:10:09,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1312.76 | bwd_inner_microstep: 1312.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878 [2024-06-10 11:10:10,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.10 | bwd_microstep: 682.58 | bwd_inner_microstep: 682.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-10 
11:10:11,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.47 | bwd_microstep: 1296.06 | bwd_inner_microstep: 1296.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3014 [2024-06-10 11:10:13,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.96 | bwd_microstep: 1263.76 | bwd_inner_microstep: 1263.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 11:10:15,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.28 | bwd_microstep: 1382.27 | bwd_inner_microstep: 1382.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 11:10:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1390.36 | bwd_inner_microstep: 1390.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424 [2024-06-10 11:10:19,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.34 | bwd_microstep: 1541.27 | bwd_inner_microstep: 1541.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 11:10:21,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.32 | bwd_microstep: 1355.94 | bwd_inner_microstep: 1355.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3628 [2024-06-10 11:10:23,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1347.92 | bwd_inner_microstep: 1347.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3521 [2024-06-10 11:10:25,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1295.09 | bwd_inner_microstep: 1295.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 11:10:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1283.39 | bwd_inner_microstep: 1283.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634 [2024-06-10 11:10:28,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.22 | bwd_microstep: 1316.51 | bwd_inner_microstep: 1316.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3812 [2024-06-10 11:10:30,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.34 | bwd_microstep: 1293.97 | bwd_inner_microstep: 1293.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488 [2024-06-10 11:10:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.08 | bwd_microstep: 1187.74 | bwd_inner_microstep: 1187.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-10 11:10:34,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.78 | bwd_microstep: 1533.44 | bwd_inner_microstep: 1533.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1982 [2024-06-10 11:10:35,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.71 | bwd_microstep: 769.50 | bwd_inner_microstep: 769.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 32, images per sample: 8.0, dynamic token length: 3485 [2024-06-10 11:10:37,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1439.35 | bwd_inner_microstep: 1439.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3716 [2024-06-10 11:10:39,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1558.88 | bwd_inner_microstep: 1558.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640 [2024-06-10 11:10:41,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.34 | bwd_microstep: 1605.93 | bwd_inner_microstep: 1605.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 11:10:43,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.58 | bwd_microstep: 1648.47 | bwd_inner_microstep: 1648.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 11:10:45,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.23 | bwd_microstep: 1450.72 | bwd_inner_microstep: 1450.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-10 11:10:47,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1510.20 | bwd_inner_microstep: 1510.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3743 [2024-06-10 11:10:52,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-10 11:10:52,185] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.73 | bwd_microstep: 3528.52 | bwd_inner_microstep: 1971.90 | bwd_allreduce_microstep: 1556.56 | step_microstep: 38.07 [2024-06-10 11:10:52,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16134.88 | bwd: 44877.23 | bwd_inner: 43319.55 | bwd_allreduce: 1556.89 | step: 39.76 {'loss': 1.236, 'learning_rate': 3.006494013742196e-05, 'epoch': 0.35} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-10 11:10:54,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.57 | bwd_microstep: 1329.52 | bwd_inner_microstep: 1329.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4079 [2024-06-10 11:10:56,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1453.93 | bwd_inner_microstep: 1453.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462 [2024-06-10 11:10:57,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 1308.66 | bwd_inner_microstep: 1308.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 11:10:59,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1378.39 | bwd_inner_microstep: 1378.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 11:11:02,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.12 | bwd_microstep: 1648.96 | bwd_inner_microstep: 1648.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 11:11:03,772] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1249.76 | bwd_inner_microstep: 1249.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3396 [2024-06-10 11:11:05,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.31 | bwd_microstep: 1211.61 | bwd_inner_microstep: 1211.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416 [2024-06-10 11:11:07,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.24 | bwd_microstep: 1314.32 | bwd_inner_microstep: 1314.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064 [2024-06-10 11:11:08,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.84 | bwd_microstep: 819.05 | bwd_inner_microstep: 819.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 11:11:10,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1385.31 | bwd_inner_microstep: 1385.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488 [2024-06-10 11:11:12,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.18 | bwd_microstep: 1346.18 | bwd_inner_microstep: 1346.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 11:11:14,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.78 | bwd_microstep: 1351.71 | bwd_inner_microstep: 1351.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3489 [2024-06-10 11:11:16,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.49 | bwd_microstep: 1475.28 | bwd_inner_microstep: 1475.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-10 11:11:17,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.77 | bwd_microstep: 677.95 | bwd_inner_microstep: 677.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1952 [2024-06-10 11:11:18,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.27 | bwd_microstep: 920.60 | bwd_inner_microstep: 920.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2497 [2024-06-10 11:11:19,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.63 | bwd_microstep: 986.63 | bwd_inner_microstep: 986.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991 [2024-06-10 11:11:20,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.94 | bwd_microstep: 804.39 | bwd_inner_microstep: 804.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 11:11:22,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1460.02 | bwd_inner_microstep: 1459.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 11:11:25,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.73 | bwd_microstep: 1657.13 | bwd_inner_microstep: 1657.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 1974 [2024-06-10 11:11:26,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.15 | bwd_microstep: 827.36 | bwd_inner_microstep: 827.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595 [2024-06-10 11:11:28,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1608.71 | bwd_inner_microstep: 1608.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 11:11:30,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.12 | bwd_microstep: 1533.51 | bwd_inner_microstep: 1533.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881 [2024-06-10 11:11:31,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.90 | bwd_microstep: 680.46 | bwd_inner_microstep: 680.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 11:11:33,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1283.15 | bwd_inner_microstep: 1283.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574 [2024-06-10 11:11:35,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.04 | bwd_microstep: 1492.31 | bwd_inner_microstep: 1492.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278 [2024-06-10 11:11:36,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.56 | bwd_microstep: 908.39 | bwd_inner_microstep: 908.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2397 [2024-06-10 11:11:37,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.70 | bwd_microstep: 889.22 | bwd_inner_microstep: 889.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 11:11:39,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1283.96 | bwd_inner_microstep: 1283.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 11:11:41,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1396.96 | bwd_inner_microstep: 1396.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816 [2024-06-10 11:11:43,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.82 | bwd_microstep: 1619.07 | bwd_inner_microstep: 1619.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-10 11:11:45,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.19 | bwd_microstep: 1440.40 | bwd_inner_microstep: 1440.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749 [2024-06-10 11:11:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.58 [2024-06-10 11:11:52,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.90 | bwd_microstep: 5977.40 | bwd_inner_microstep: 1851.93 | bwd_allreduce_microstep: 4125.39 | step_microstep: 38.77 [2024-06-10 11:11:52,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15166.45 
| bwd: 44720.32 | bwd_inner: 40594.00 | bwd_allreduce: 4125.64 | step: 40.58 {'loss': 1.314, 'learning_rate': 3.0032487735895803e-05, 'epoch': 0.35} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 11:11:54,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.24 | bwd_microstep: 1334.01 | bwd_inner_microstep: 1333.90 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 11:11:56,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.45 | bwd_microstep: 1284.86 | bwd_inner_microstep: 1284.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3904 [2024-06-10 11:11:58,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.92 | bwd_microstep: 1687.69 | bwd_inner_microstep: 1687.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 11:12:00,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.53 | bwd_microstep: 1652.33 | bwd_inner_microstep: 1652.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3790 [2024-06-10 11:12:02,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.86 | bwd_microstep: 1576.04 | bwd_inner_microstep: 1576.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474 [2024-06-10 11:12:04,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.12 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1248.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 
11:12:06,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1252.77 | bwd_inner_microstep: 1252.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 11:12:08,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1397.69 | bwd_inner_microstep: 1397.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2085 [2024-06-10 11:12:09,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.18 | bwd_microstep: 730.75 | bwd_inner_microstep: 730.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3493 [2024-06-10 11:12:11,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.62 | bwd_microstep: 1533.71 | bwd_inner_microstep: 1533.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3509 [2024-06-10 11:12:13,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.83 | bwd_microstep: 1559.94 | bwd_inner_microstep: 1559.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 11:12:15,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.28 | bwd_microstep: 1254.91 | bwd_inner_microstep: 1254.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 11:12:17,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1354.77 | bwd_inner_microstep: 1354.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 5.0, 
dynamic token length: 2914 [2024-06-10 11:12:18,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.50 | bwd_microstep: 1094.41 | bwd_inner_microstep: 1094.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-10 11:12:19,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.59 | bwd_microstep: 795.47 | bwd_inner_microstep: 795.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 11:12:21,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1287.06 | bwd_inner_microstep: 1287.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 11:12:23,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.14 | bwd_microstep: 1286.13 | bwd_inner_microstep: 1286.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-10 11:12:25,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.55 | bwd_microstep: 1422.92 | bwd_inner_microstep: 1422.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 11:12:27,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.58 | bwd_microstep: 1287.01 | bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 11:12:28,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.61 | bwd_microstep: 1190.46 | bwd_inner_microstep: 1190.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 11:12:29,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.71 | bwd_microstep: 799.46 | bwd_inner_microstep: 799.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3835 [2024-06-10 11:12:31,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.33 | bwd_microstep: 1359.50 | bwd_inner_microstep: 1359.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2075 [2024-06-10 11:12:33,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.26 | bwd_microstep: 919.01 | bwd_inner_microstep: 918.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 11:12:34,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.78 | bwd_microstep: 1389.60 | bwd_inner_microstep: 1389.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2163 [2024-06-10 11:12:36,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.19 | bwd_microstep: 857.34 | bwd_inner_microstep: 857.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 11:12:37,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.95 | bwd_microstep: 878.26 | bwd_inner_microstep: 878.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 11:12:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.38 | bwd_microstep: 1666.20 | bwd_inner_microstep: 1666.17 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.13 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 11:12:41,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.43 | bwd_microstep: 1653.10 | bwd_inner_microstep: 1653.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3597 [2024-06-10 11:12:44,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1570.50 | bwd_inner_microstep: 1570.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3778 [2024-06-10 11:12:46,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.69 | bwd_microstep: 1821.01 | bwd_inner_microstep: 1820.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580 [2024-06-10 11:12:48,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.04 | bwd_microstep: 1593.21 | bwd_inner_microstep: 1593.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3776 [2024-06-10 11:12:52,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.28 | optimizer_step: 6.57 [2024-06-10 11:12:52,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.65 | bwd_microstep: 3063.11 | bwd_inner_microstep: 1905.75 | bwd_allreduce_microstep: 1157.30 | step_microstep: 38.12 [2024-06-10 11:12:52,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15873.40 | bwd: 43802.25 | bwd_inner: 42643.95 | bwd_allreduce: 1157.59 | step: 40.23 {'loss': 1.2643, 'learning_rate': 3.0000000000000004e-05, 'epoch': 0.35} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 11:12:54,403] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.54 | bwd_microstep: 1384.00 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4210 [2024-06-10 11:12:56,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.34 | bwd_microstep: 1656.69 | bwd_inner_microstep: 1656.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 11:12:58,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1349.50 | bwd_inner_microstep: 1349.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 11:13:00,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.15 | bwd_microstep: 1378.33 | bwd_inner_microstep: 1378.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 11:13:02,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.69 | bwd_microstep: 1384.37 | bwd_inner_microstep: 1384.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 11:13:04,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1245.67 | bwd_inner_microstep: 1245.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 11:13:06,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.27 | bwd_microstep: 1388.27 | bwd_inner_microstep: 1388.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3436 [2024-06-10 11:13:07,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1252.27 | bwd_inner_microstep: 1252.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 11:13:09,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1248.79 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 11:13:11,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1387.32 | bwd_inner_microstep: 1387.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 11:13:13,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.70 | bwd_microstep: 1393.33 | bwd_inner_microstep: 1393.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3483 [2024-06-10 11:13:15,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.46 | bwd_microstep: 1414.29 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-10 11:13:17,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.50 | bwd_microstep: 1527.69 | bwd_inner_microstep: 1527.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 11:13:19,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1386.24 | bwd_inner_microstep: 1386.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3494 [2024-06-10 11:13:21,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.02 | bwd_microstep: 1485.31 | bwd_inner_microstep: 1485.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 11:13:23,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1391.50 | bwd_inner_microstep: 1391.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673 [2024-06-10 11:13:25,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.88 | bwd_microstep: 1620.43 | bwd_inner_microstep: 1620.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 11:13:27,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412 [2024-06-10 11:13:29,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.05 | bwd_microstep: 1291.54 | bwd_inner_microstep: 1291.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-10 11:13:31,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.72 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3630 [2024-06-10 11:13:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.59 | bwd_microstep: 1540.39 | bwd_inner_microstep: 1540.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-10 11:13:35,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1417.48 | bwd_inner_microstep: 1417.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624 [2024-06-10 11:13:37,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.72 | bwd_microstep: 1442.76 | bwd_inner_microstep: 1442.57 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-10 11:13:39,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.02 | bwd_microstep: 1645.70 | bwd_inner_microstep: 1645.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 11:13:41,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1557.59 | bwd_inner_microstep: 1557.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025 [2024-06-10 11:13:42,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.73 | bwd_microstep: 804.97 | bwd_inner_microstep: 804.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 11:13:44,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.66 | bwd_microstep: 1549.80 | bwd_inner_microstep: 1549.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648 [2024-06-10 11:13:46,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.53 | bwd_microstep: 1421.25 | bwd_inner_microstep: 1421.22 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916 [2024-06-10 11:13:47,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.21 | bwd_microstep: 691.69 | bwd_inner_microstep: 691.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437 [2024-06-10 11:13:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1372.98 | bwd_inner_microstep: 1372.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3773 [2024-06-10 11:13:51,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 1676.20 | bwd_inner_microstep: 1676.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-10 11:13:55,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.36 | optimizer_step: 6.62 [2024-06-10 11:13:55,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 2842.53 | bwd_inner_microstep: 1814.43 | bwd_allreduce_microstep: 1028.03 | step_microstep: 39.48 [2024-06-10 11:13:55,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16682.80 | bwd: 45898.12 | bwd_inner: 44868.89 | bwd_allreduce: 1028.42 | step: 41.70 {'loss': 1.2655, 'learning_rate': 2.9967477044156184e-05, 'epoch': 0.35} 35%|███▌ | 606/1726 [10:31:25<20:01:27, 64.36s/it] 35%|███▌ | 607/1726 [10:32:27<19:45:44, 63.58s/it] 35%|███▌ | 608/1726 [10:33:28<19:32:17, 62.91s/it] 35%|███▌ | 609/1726 [10:34:29<19:16:19, 62.11s/it] 35%|███▌ | 610/1726 
[10:35:29<19:03:45, 61.49s/it] 35%|███▌ | 611/1726 [10:36:32<19:10:59, 61.94s/it] 35%|███▌ | 611/1726 [10:36:32> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800 [INFO|configuration_utils.py:473] 2024-06-10 14:27:16,093 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/config.json [INFO|configuration_utils.py:594] 2024-06-10 14:27:16,131 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-10 14:27:23,894 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-10 14:27:23,920 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-10 14:27:23,928 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-10 14:27:23,930 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/added_tokens.json [2024-06-10 14:27:24,192] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step800 is about to be saved! 
[2024-06-10 14:27:24,204] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/mp_rank_00_model_states.pt [2024-06-10 14:27:24,204] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/mp_rank_00_model_states.pt... [2024-06-10 14:27:32,653] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/mp_rank_00_model_states.pt. [2024-06-10 14:27:32,687] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-10 14:27:44,856] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-06-10 14:27:44,953] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-800/global_step800/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-10 14:27:44,954] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step800 is ready now! 
[INFO|trainer.py:3028] 2024-06-10 14:27:45,176 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-200] due to args.save_total_limit dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1270 [2024-06-10 14:27:46,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 205.54 | bwd_microstep: 453.65 | bwd_inner_microstep: 453.51 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 14:27:48,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.15 | bwd_microstep: 1470.00 | bwd_inner_microstep: 1469.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-10 14:27:50,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.84 | bwd_microstep: 1485.80 | bwd_inner_microstep: 1485.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296 [2024-06-10 14:27:51,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.78 | bwd_microstep: 967.75 | bwd_inner_microstep: 967.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-10 14:27:53,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1406.48 | bwd_inner_microstep: 1406.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 14:27:55,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.92 | bwd_microstep: 1277.69 | bwd_inner_microstep: 1277.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3496 [2024-06-10 14:27:57,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.09 | bwd_microstep: 1283.05 | bwd_inner_microstep: 1283.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507 [2024-06-10 14:27:59,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.24 | bwd_microstep: 1190.70 | bwd_inner_microstep: 1190.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2457 [2024-06-10 14:28:00,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.83 | bwd_microstep: 1022.84 | bwd_inner_microstep: 1022.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4006 [2024-06-10 14:28:02,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.72 | bwd_microstep: 1635.82 | bwd_inner_microstep: 1635.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2471 [2024-06-10 14:28:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.38 | bwd_microstep: 979.96 | bwd_inner_microstep: 979.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 14:28:05,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1344.56 | bwd_inner_microstep: 1344.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 14:28:08,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.50 | bwd_microstep: 1491.91 | bwd_inner_microstep: 1491.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3660 [2024-06-10 14:28:10,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.54 | bwd_microstep: 1605.52 | bwd_inner_microstep: 1605.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3519 [2024-06-10 14:28:12,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.16 | bwd_microstep: 1432.13 | bwd_inner_microstep: 1432.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 14:28:14,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1371.56 | bwd_inner_microstep: 1371.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100 [2024-06-10 14:28:15,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.89 | bwd_microstep: 914.80 | bwd_inner_microstep: 914.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3513 [2024-06-10 14:28:17,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.81 | bwd_microstep: 1409.09 | bwd_inner_microstep: 1409.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-10 14:28:19,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1431.26 | bwd_inner_microstep: 1431.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913 [2024-06-10 14:28:20,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.57 | bwd_microstep: 686.95 | bwd_inner_microstep: 686.92 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2038 [2024-06-10 14:28:21,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.51 | bwd_microstep: 868.31 | bwd_inner_microstep: 868.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647 [2024-06-10 14:28:23,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1603.33 | bwd_inner_microstep: 1603.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3505 [2024-06-10 14:28:25,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1247.74 | bwd_inner_microstep: 1247.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 14:28:27,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1411.08 | bwd_inner_microstep: 1411.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466 [2024-06-10 14:28:28,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.91 | bwd_microstep: 1180.20 | bwd_inner_microstep: 1180.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821 [2024-06-10 14:28:30,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.34 | bwd_microstep: 1357.05 | bwd_inner_microstep: 1357.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 14:28:32,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1374.44 | bwd_inner_microstep: 
1374.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-10 14:28:34,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1292.21 | bwd_inner_microstep: 1292.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-10 14:28:36,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.55 | bwd_microstep: 1641.72 | bwd_inner_microstep: 1641.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3761 [2024-06-10 14:28:38,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.50 | bwd_microstep: 1469.37 | bwd_inner_microstep: 1469.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3770 [2024-06-10 14:28:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1566.32 | bwd_inner_microstep: 1566.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3757 [2024-06-10 14:28:46,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.63 [2024-06-10 14:28:46,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.95 | bwd_microstep: 5030.98 | bwd_inner_microstep: 1979.37 | bwd_allreduce_microstep: 3051.56 | step_microstep: 38.09 [2024-06-10 14:28:46,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15542.64 | bwd: 44904.33 | bwd_inner: 41851.77 | bwd_allreduce: 3051.84 | step: 39.79 {'loss': 1.2194, 'learning_rate': 2.3287994683714222e-05, 'epoch': 0.46} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3470 [2024-06-10 14:28:48,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1371.48 | bwd_inner_microstep: 1371.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 14:28:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.65 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 14:28:52,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1379.21 | bwd_inner_microstep: 1379.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907 [2024-06-10 14:28:53,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.51 | bwd_microstep: 773.87 | bwd_inner_microstep: 773.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797 [2024-06-10 14:28:55,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.46 | bwd_microstep: 1643.40 | bwd_inner_microstep: 1643.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-10 14:28:57,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 14:28:59,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1277.40 | bwd_inner_microstep: 1277.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images 
per sample: 3.25, dynamic token length: 2638 [2024-06-10 14:29:00,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.91 | bwd_microstep: 919.29 | bwd_inner_microstep: 919.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 14:29:02,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1386.02 | bwd_inner_microstep: 1385.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 14:29:04,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1286.00 | bwd_inner_microstep: 1285.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 14:29:06,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1383.34 | bwd_inner_microstep: 1383.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392 [2024-06-10 14:29:07,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1335.98 | bwd_inner_microstep: 1335.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3660 [2024-06-10 14:29:09,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1320.91 | bwd_inner_microstep: 1320.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 14:29:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 795.02 | bwd_inner_microstep: 794.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2014 [2024-06-10 14:29:11,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.82 | bwd_microstep: 835.79 | bwd_inner_microstep: 835.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672 [2024-06-10 14:29:13,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.67 | bwd_microstep: 1423.46 | bwd_inner_microstep: 1423.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 14:29:15,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1383.68 | bwd_inner_microstep: 1383.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 14:29:17,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.15 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3830 [2024-06-10 14:29:19,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-10 14:29:21,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1407.98 | bwd_inner_microstep: 1407.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642 [2024-06-10 14:29:23,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.59 | bwd_microstep: 1609.78 | bwd_inner_microstep: 1609.76 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3563 [2024-06-10 14:29:25,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.01 | bwd_microstep: 1232.37 | bwd_inner_microstep: 1232.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920 [2024-06-10 14:29:26,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.89 | bwd_microstep: 686.34 | bwd_inner_microstep: 686.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-10 14:29:28,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.42 | bwd_microstep: 1475.15 | bwd_inner_microstep: 1475.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 14:29:30,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1553.61 | bwd_inner_microstep: 1553.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235 [2024-06-10 14:29:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.73 | bwd_microstep: 864.81 | bwd_inner_microstep: 864.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1081 [2024-06-10 14:29:32,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 161.53 | bwd_microstep: 420.81 | bwd_inner_microstep: 420.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3439 [2024-06-10 14:29:34,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1310.96 | 
bwd_inner_microstep: 1310.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 14:29:36,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1344.99 | bwd_inner_microstep: 1344.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 14:29:38,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1380.89 | bwd_inner_microstep: 1380.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-10 14:29:40,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1395.36 | bwd_inner_microstep: 1395.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3819 [2024-06-10 14:29:47,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.38 | optimizer_step: 6.59 [2024-06-10 14:29:47,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.15 | bwd_microstep: 6816.39 | bwd_inner_microstep: 2042.53 | bwd_allreduce_microstep: 4773.81 | step_microstep: 38.88 [2024-06-10 14:29:47,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15175.15 | bwd: 45417.19 | bwd_inner: 40642.47 | bwd_allreduce: 4774.04 | step: 40.32 {'loss': 1.2191, 'learning_rate': 2.325096564751193e-05, 'epoch': 0.46} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-10 14:29:48,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.20 | bwd_microstep: 790.56 | bwd_inner_microstep: 790.49 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3415 [2024-06-10 14:29:50,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.87 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 14:29:52,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1476.94 | bwd_inner_microstep: 1476.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 14:29:54,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.43 | bwd_inner_microstep: 1479.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058 [2024-06-10 14:29:55,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.64 | bwd_microstep: 815.97 | bwd_inner_microstep: 815.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 14:29:57,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1295.16 | bwd_inner_microstep: 1295.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426 [2024-06-10 14:29:59,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.27 | bwd_microstep: 1251.20 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 14:30:00,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.89 | bwd_microstep: 791.33 | bwd_inner_microstep: 791.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 14:30:02,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1342.39 | bwd_inner_microstep: 1342.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 14:30:04,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 14:30:06,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1402.33 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 14:30:07,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.84 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 14:30:09,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1407.20 | bwd_inner_microstep: 1407.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 14:30:11,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1478.65 | bwd_inner_microstep: 1478.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3628 [2024-06-10 14:30:13,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.64 | bwd_microstep: 1398.84 | bwd_inner_microstep: 1398.82 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3689 [2024-06-10 14:30:15,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.74 | bwd_microstep: 1493.52 | bwd_inner_microstep: 1493.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-10 14:30:18,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1615.41 | bwd_inner_microstep: 1615.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 14:30:20,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.29 | bwd_microstep: 1656.78 | bwd_inner_microstep: 1656.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 14:30:22,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.53 | bwd_microstep: 1461.23 | bwd_inner_microstep: 1461.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 14:30:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.55 | bwd_microstep: 1458.66 | bwd_inner_microstep: 1458.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3773 [2024-06-10 14:30:26,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.27 | bwd_microstep: 1345.04 | bwd_inner_microstep: 1345.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3836 [2024-06-10 14:30:28,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.74 | bwd_microstep: 1262.47 | bwd_inner_microstep: 
1262.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825 [2024-06-10 14:30:30,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.18 | bwd_microstep: 1585.29 | bwd_inner_microstep: 1585.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-10 14:30:32,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1511.86 | bwd_inner_microstep: 1511.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816 [2024-06-10 14:30:34,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.39 | bwd_microstep: 1615.67 | bwd_inner_microstep: 1615.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780 [2024-06-10 14:30:37,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.81 | bwd_microstep: 2411.76 | bwd_inner_microstep: 2411.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569 [2024-06-10 14:30:39,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.87 | bwd_microstep: 1593.38 | bwd_inner_microstep: 1593.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818 [2024-06-10 14:30:41,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.11 | bwd_microstep: 1587.52 | bwd_inner_microstep: 1587.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 14:30:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.15 | 
bwd_microstep: 1647.82 | bwd_inner_microstep: 1647.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810 [2024-06-10 14:30:46,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.82 | bwd_microstep: 1752.52 | bwd_inner_microstep: 1752.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471 [2024-06-10 14:30:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.10 | bwd_microstep: 1434.61 | bwd_inner_microstep: 1434.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877 [2024-06-10 14:30:49,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.29 | optimizer_step: 6.60 [2024-06-10 14:30:49,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.25 | bwd_microstep: 723.25 | bwd_inner_microstep: 709.38 | bwd_allreduce_microstep: 13.82 | step_microstep: 38.87 [2024-06-10 14:30:49,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16574.75 | bwd: 45208.33 | bwd_inner: 45193.58 | bwd_allreduce: 14.07 | step: 40.34 {'loss': 1.199, 'learning_rate': 2.3213925161425533e-05, 'epoch': 0.47} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457 [2024-06-10 14:30:51,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.56 | bwd_microstep: 1565.60 | bwd_inner_microstep: 1565.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2366 [2024-06-10 14:30:53,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.97 | bwd_microstep: 987.62 | bwd_inner_microstep: 987.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3828 [2024-06-10 14:30:55,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.83 | bwd_microstep: 1550.89 | bwd_inner_microstep: 1550.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-10 14:30:56,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.13 | bwd_microstep: 803.82 | bwd_inner_microstep: 803.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 14:30:58,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.35 | bwd_microstep: 1245.32 | bwd_inner_microstep: 1245.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-10 14:30:59,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.90 | bwd_microstep: 790.14 | bwd_inner_microstep: 790.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883 [2024-06-10 14:31:00,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.09 | bwd_microstep: 680.81 | bwd_inner_microstep: 680.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 14:31:02,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.85 | bwd_microstep: 1386.23 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 712 [2024-06-10 14:31:02,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 115.98 | bwd_microstep: 290.04 | bwd_inner_microstep: 290.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 32, images per sample: 8.0, dynamic token length: 3470 [2024-06-10 14:31:04,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.45 | bwd_microstep: 1432.00 | bwd_inner_microstep: 1431.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3508 [2024-06-10 14:31:06,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.21 | bwd_microstep: 1549.01 | bwd_inner_microstep: 1548.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3921 [2024-06-10 14:31:09,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.51 | bwd_microstep: 1740.01 | bwd_inner_microstep: 1739.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3622 [2024-06-10 14:31:11,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.94 | bwd_microstep: 1452.85 | bwd_inner_microstep: 1452.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408 [2024-06-10 14:31:13,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.16 | bwd_microstep: 1437.34 | bwd_inner_microstep: 1437.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 14:31:15,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.58 | bwd_microstep: 1487.97 | bwd_inner_microstep: 1487.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 14:31:16,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1288.38 | bwd_inner_microstep: 1288.35 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296 [2024-06-10 14:31:18,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.27 | bwd_microstep: 975.38 | bwd_inner_microstep: 975.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2428 [2024-06-10 14:31:19,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.79 | bwd_microstep: 1038.75 | bwd_inner_microstep: 1038.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3618 [2024-06-10 14:31:21,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.42 | bwd_microstep: 1536.17 | bwd_inner_microstep: 1536.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986 [2024-06-10 14:31:22,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.92 | bwd_microstep: 798.09 | bwd_inner_microstep: 798.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 14:31:24,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1556.39 | bwd_inner_microstep: 1556.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 14:31:26,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1405.48 | bwd_inner_microstep: 1405.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 14:31:28,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1280.61 | bwd_inner_microstep: 
1280.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2005 [2024-06-10 14:31:29,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.80 | bwd_microstep: 861.64 | bwd_inner_microstep: 861.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 14:31:31,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1347.63 | bwd_inner_microstep: 1347.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 14:31:34,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466 [2024-06-10 14:31:35,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1398.48 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417 [2024-06-10 14:31:37,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.49 | bwd_microstep: 1372.71 | bwd_inner_microstep: 1372.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3461 [2024-06-10 14:31:39,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.85 | bwd_microstep: 1520.90 | bwd_inner_microstep: 1520.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753 [2024-06-10 14:31:42,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.16 | 
bwd_microstep: 1635.09 | bwd_inner_microstep: 1635.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487 [2024-06-10 14:31:44,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.64 | bwd_microstep: 1442.81 | bwd_inner_microstep: 1442.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773 [2024-06-10 14:31:52,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.33 | optimizer_step: 6.62 [2024-06-10 14:31:52,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.04 | bwd_microstep: 7282.92 | bwd_inner_microstep: 2293.44 | bwd_allreduce_microstep: 4989.41 | step_microstep: 38.88 [2024-06-10 14:31:52,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15382.96 | bwd: 46787.77 | bwd_inner: 41797.43 | bwd_allreduce: 4989.64 | step: 40.39 {'loss': 1.2867, 'learning_rate': 2.3176873355911414e-05, 'epoch': 0.47} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 14:31:53,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.73 | bwd_microstep: 1235.14 | bwd_inner_microstep: 1235.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 14:31:54,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.75 | bwd_microstep: 777.39 | bwd_inner_microstep: 777.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 14:31:56,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.43 | bwd_microstep: 1378.16 | bwd_inner_microstep: 1378.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3801 [2024-06-10 14:31:58,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.13 | bwd_microstep: 1545.61 | bwd_inner_microstep: 1545.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 14:32:01,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1489.57 | bwd_inner_microstep: 1489.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 14:32:02,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 1383.14 | bwd_inner_microstep: 1383.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 841 [2024-06-10 14:32:03,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.01 | bwd_microstep: 344.78 | bwd_inner_microstep: 344.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681 [2024-06-10 14:32:05,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 1326.38 | bwd_inner_microstep: 1326.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969 [2024-06-10 14:32:06,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.13 | bwd_microstep: 794.25 | bwd_inner_microstep: 794.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 14:32:07,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.36 | bwd_microstep: 792.98 | bwd_inner_microstep: 792.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1917 [2024-06-10 14:32:08,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.39 | bwd_microstep: 716.64 | bwd_inner_microstep: 716.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 14:32:10,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1389.27 | bwd_inner_microstep: 1389.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691 [2024-06-10 14:32:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.82 | bwd_microstep: 1513.89 | bwd_inner_microstep: 1513.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-10 14:32:14,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.95 | bwd_microstep: 1581.94 | bwd_inner_microstep: 1581.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 14:32:16,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.88 | bwd_microstep: 1480.91 | bwd_inner_microstep: 1480.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 14:32:18,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.38 | bwd_microstep: 1352.57 | bwd_inner_microstep: 1352.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840 [2024-06-10 14:32:20,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.29 | bwd_microstep: 1654.61 | bwd_inner_microstep: 1654.59 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533 [2024-06-10 14:32:22,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.19 | bwd_microstep: 1196.43 | bwd_inner_microstep: 1196.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 14:32:24,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1394.64 | bwd_inner_microstep: 1394.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827 [2024-06-10 14:32:26,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1357.44 | bwd_inner_microstep: 1357.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-10 14:32:28,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1411.79 | bwd_inner_microstep: 1411.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583 [2024-06-10 14:32:29,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.84 | bwd_microstep: 1236.40 | bwd_inner_microstep: 1236.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3657 [2024-06-10 14:32:32,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.27 | bwd_microstep: 1482.83 | bwd_inner_microstep: 1482.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 14:32:33,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 
1285.81 | bwd_inner_microstep: 1285.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3816 [2024-06-10 14:32:36,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1630.28 | bwd_inner_microstep: 1630.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 14:32:38,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.40 | bwd_microstep: 1467.01 | bwd_inner_microstep: 1466.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 14:32:40,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.48 | bwd_microstep: 1655.21 | bwd_inner_microstep: 1655.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963 [2024-06-10 14:32:41,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.41 | bwd_microstep: 704.82 | bwd_inner_microstep: 704.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3026 [2024-06-10 14:32:43,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.06 | bwd_microstep: 1230.49 | bwd_inner_microstep: 1230.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3792 [2024-06-10 14:32:45,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.48 | bwd_microstep: 1549.42 | bwd_inner_microstep: 1549.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3756 [2024-06-10 14:32:47,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 522.73 | bwd_microstep: 1391.60 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390 [2024-06-10 14:32:52,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.22 | optimizer_step: 6.64 [2024-06-10 14:32:52,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.85 | bwd_microstep: 4514.54 | bwd_inner_microstep: 1406.99 | bwd_allreduce_microstep: 3107.49 | step_microstep: 37.91 [2024-06-10 14:32:52,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15375.53 | bwd: 44265.97 | bwd_inner: 41157.56 | bwd_allreduce: 3107.72 | step: 39.38 {'loss': 1.2487, 'learning_rate': 2.3139810361465854e-05, 'epoch': 0.47} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 14:32:54,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.95 | bwd_microstep: 1365.79 | bwd_inner_microstep: 1365.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2363 [2024-06-10 14:32:55,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.25 | bwd_microstep: 987.43 | bwd_inner_microstep: 987.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 14:32:57,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1383.30 | bwd_inner_microstep: 1383.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-10 14:32:59,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.45 | bwd_microstep: 1639.90 | bwd_inner_microstep: 1639.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 14:33:01,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.76 | bwd_microstep: 1643.58 | bwd_inner_microstep: 1643.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-10 14:33:03,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.81 | bwd_microstep: 1146.44 | bwd_inner_microstep: 1146.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4089 [2024-06-10 14:33:05,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.89 | bwd_microstep: 1529.84 | bwd_inner_microstep: 1529.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 14:33:07,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3910 [2024-06-10 14:33:09,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.83 | bwd_microstep: 1591.78 | bwd_inner_microstep: 1591.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689 [2024-06-10 14:33:11,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.30 | bwd_microstep: 1535.06 | bwd_inner_microstep: 1535.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3502 [2024-06-10 14:33:13,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.66 | bwd_microstep: 1347.25 | bwd_inner_microstep: 1347.22 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 14:33:15,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1349.13 | bwd_inner_microstep: 1349.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1944 [2024-06-10 14:33:16,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.98 | bwd_microstep: 886.22 | bwd_inner_microstep: 886.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416 [2024-06-10 14:33:18,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.34 | bwd_microstep: 1313.21 | bwd_inner_microstep: 1313.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 14:33:19,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.52 | bwd_microstep: 790.51 | bwd_inner_microstep: 790.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3770 [2024-06-10 14:33:21,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1495.40 | bwd_inner_microstep: 1495.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 14:33:23,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.63 | bwd_microstep: 1474.91 | bwd_inner_microstep: 1474.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525 [2024-06-10 14:33:25,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 
1587.86 | bwd_inner_microstep: 1587.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 14:33:27,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1495.13 | bwd_inner_microstep: 1495.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-10 14:33:29,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.55 | bwd_microstep: 973.96 | bwd_inner_microstep: 973.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898 [2024-06-10 14:33:31,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.05 | bwd_microstep: 1578.43 | bwd_inner_microstep: 1578.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 14:33:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1450.96 | bwd_inner_microstep: 1450.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3818 [2024-06-10 14:33:35,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.01 | bwd_microstep: 1719.76 | bwd_inner_microstep: 1719.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 14:33:37,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.68 | bwd_microstep: 1390.22 | bwd_inner_microstep: 1390.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-10 14:33:39,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 524.77 | bwd_microstep: 1405.55 | bwd_inner_microstep: 1405.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3452 [2024-06-10 14:33:41,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.78 | bwd_microstep: 1218.82 | bwd_inner_microstep: 1218.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594 [2024-06-10 14:33:42,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1308.25 | bwd_inner_microstep: 1308.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 14:33:45,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.55 | bwd_microstep: 1502.24 | bwd_inner_microstep: 1502.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 14:33:46,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1386.12 | bwd_inner_microstep: 1386.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262 [2024-06-10 14:33:48,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.65 | bwd_microstep: 972.27 | bwd_inner_microstep: 972.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765 [2024-06-10 14:33:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.60 | bwd_microstep: 1639.79 | bwd_inner_microstep: 1639.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 14:33:55,032] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.60 [2024-06-10 14:33:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 3911.89 | bwd_inner_microstep: 1529.19 | bwd_allreduce_microstep: 2382.64 | step_microstep: 38.14 [2024-06-10 14:33:55,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16304.06 | bwd: 46270.15 | bwd_inner: 43886.59 | bwd_allreduce: 2382.88 | step: 39.56 46%|████▋ | 801/1726 [13:51:23<18:16:09, 71.10s/it] 46%|████▋ | 802/1726 [13:52:24<17:27:56, 68.05s/it] 47%|████▋ | 803/1726 [13:53:26<16:59:28, 66.27s/it] 47%|████▋ | 804/1726 [13:54:28<16:41:00, 65.14s/it] 47%|████▋ | 805/1726 [13:55:28<16:16:04, 63.59s/it] 47%|████▋ | 806/1726 [13:56:31<16:11:52, 63.38s/{'loss': 1.2709, 'learning_rate': 2.310273630862453e-05, 'epoch': 0.47} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 14:33:56,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1385.84 | bwd_inner_microstep: 1385.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1858 [2024-06-10 14:33:57,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.67 | bwd_microstep: 674.01 | bwd_inner_microstep: 673.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842 [2024-06-10 14:34:00,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.72 | bwd_microstep: 1554.24 | bwd_inner_microstep: 1554.21 | bwd_allreduce_microstep: 0.01 |
step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3837 [2024-06-10 14:34:02,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.95 | bwd_microstep: 1655.45 | bwd_inner_microstep: 1655.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 14:34:04,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.33 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 14:34:06,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1352.36 | bwd_inner_microstep: 1352.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 14:34:08,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1381.24 | bwd_inner_microstep: 1381.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 14:34:09,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 794.03 | bwd_inner_microstep: 794.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2151 [2024-06-10 14:34:10,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.01 | bwd_microstep: 927.93 | bwd_inner_microstep: 927.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411 [2024-06-10 14:34:12,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.78 | bwd_microstep: 1310.83 | bwd_inner_microstep: 
1310.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 14:34:14,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.01 | bwd_microstep: 1381.85 | bwd_inner_microstep: 1381.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 14:34:16,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 14:34:18,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.69 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2175 [2024-06-10 14:34:19,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.66 | bwd_microstep: 1045.50 | bwd_inner_microstep: 1045.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3380 [2024-06-10 14:34:21,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.84 | bwd_microstep: 1303.04 | bwd_inner_microstep: 1303.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 14:34:23,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.64 | bwd_microstep: 1339.38 | bwd_inner_microstep: 1339.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 14:34:24,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | 
bwd_microstep: 1348.00 | bwd_inner_microstep: 1347.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-10 14:34:27,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1531.08 | bwd_inner_microstep: 1531.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-10 14:34:29,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1517.75 | bwd_inner_microstep: 1517.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3661 [2024-06-10 14:34:31,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.12 | bwd_microstep: 1553.21 | bwd_inner_microstep: 1553.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 14:34:33,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1255.71 | bwd_inner_microstep: 1255.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 14:34:34,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1277.30 | bwd_inner_microstep: 1277.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 14:34:37,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1657.80 | bwd_inner_microstep: 1657.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634 [2024-06-10 14:34:39,205] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 561.95 | bwd_microstep: 1513.46 | bwd_inner_microstep: 1513.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1889 [2024-06-10 14:34:40,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.48 | bwd_microstep: 713.60 | bwd_inner_microstep: 713.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 14:34:41,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.46 | bwd_microstep: 1292.91 | bwd_inner_microstep: 1292.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565 [2024-06-10 14:34:43,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.09 | bwd_microstep: 1300.56 | bwd_inner_microstep: 1300.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 14:34:45,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.87 | bwd_microstep: 1555.80 | bwd_inner_microstep: 1555.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 14:34:47,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1251.30 | bwd_inner_microstep: 1251.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023 [2024-06-10 14:34:48,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.77 | bwd_microstep: 807.93 | bwd_inner_microstep: 807.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3794 [2024-06-10 14:34:50,877] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.42 | bwd_microstep: 1516.77 | bwd_inner_microstep: 1516.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3573 [2024-06-10 14:34:55,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-10 14:34:55,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 3512.24 | bwd_inner_microstep: 1723.03 | bwd_allreduce_microstep: 1789.16 | step_microstep: 37.95 [2024-06-10 14:34:55,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15692.38 | bwd: 43950.03 | bwd_inner: 42159.96 | bwd_allreduce: 1789.39 | step: 39.42 {'loss': 1.2709, 'learning_rate': 2.3065651327962054e-05, 'epoch': 0.47} dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4557 [2024-06-10 14:34:57,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 714.26 | bwd_microstep: 1940.87 | bwd_inner_microstep: 1940.66 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 14:34:59,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.90 | bwd_microstep: 1379.43 | bwd_inner_microstep: 1379.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 14:35:01,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1375.15 | bwd_inner_microstep: 1375.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-10 14:35:03,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.82 | bwd_microstep: 1492.10 | bwd_inner_microstep: 1492.07 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 14:35:05,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1343.05 | bwd_inner_microstep: 1343.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3806 [2024-06-10 14:35:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.38 | bwd_microstep: 1417.04 | bwd_inner_microstep: 1417.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 14:35:09,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1379.52 | bwd_inner_microstep: 1379.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 14:35:11,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1481.76 | bwd_inner_microstep: 1481.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582 [2024-06-10 14:35:13,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 1302.06 | bwd_inner_microstep: 1302.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 14:35:14,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1244.55 | bwd_inner_microstep: 1244.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-10 14:35:16,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.33 | bwd_microstep: 
1414.10 | bwd_inner_microstep: 1414.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899 [2024-06-10 14:35:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.89 | bwd_microstep: 714.96 | bwd_inner_microstep: 714.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493 [2024-06-10 14:35:19,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.53 | bwd_microstep: 1315.16 | bwd_inner_microstep: 1315.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 14:35:21,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1345.38 | bwd_inner_microstep: 1345.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 14:35:23,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1483.17 | bwd_inner_microstep: 1483.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-10 14:35:25,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.18 | bwd_microstep: 1337.24 | bwd_inner_microstep: 1337.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 14:35:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.45 | bwd_microstep: 790.22 | bwd_inner_microstep: 790.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 14:35:28,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.08 | bwd_microstep: 1389.83 | bwd_inner_microstep: 1389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 14:35:30,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.46 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 14:35:32,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1295.41 | bwd_inner_microstep: 1295.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1937 [2024-06-10 14:35:33,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.94 | bwd_microstep: 696.30 | bwd_inner_microstep: 696.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3078 [2024-06-10 14:35:34,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.97 | bwd_microstep: 1053.05 | bwd_inner_microstep: 1053.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 14:35:36,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1488.48 | bwd_inner_microstep: 1488.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-10 14:35:38,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.15 | bwd_microstep: 1401.95 | bwd_inner_microstep: 1401.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3664 [2024-06-10 14:35:40,512] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.41 | bwd_microstep: 1454.17 | bwd_inner_microstep: 1454.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547 [2024-06-10 14:35:42,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1325.46 | bwd_inner_microstep: 1325.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3378 [2024-06-10 14:35:44,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.05 | bwd_microstep: 1432.28 | bwd_inner_microstep: 1432.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 14:35:46,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1258.40 | bwd_inner_microstep: 1258.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438 [2024-06-10 14:35:47,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.81 | bwd_microstep: 1321.52 | bwd_inner_microstep: 1321.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766 [2024-06-10 14:35:49,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1443.83 | bwd_inner_microstep: 1443.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 14:35:51,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1394.50 | bwd_inner_microstep: 1394.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3570 [2024-06-10 14:35:56,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 14:35:56,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 4459.66 | bwd_inner_microstep: 1946.25 | bwd_allreduce_microstep: 2513.35 | step_microstep: 37.96 [2024-06-10 14:35:56,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15984.17 | bwd: 45557.88 | bwd_inner: 43043.46 | bwd_allreduce: 2513.66 | step: 39.51 {'loss': 1.2509, 'learning_rate': 2.3028555550091536e-05, 'epoch': 0.47} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1929 [2024-06-10 14:35:58,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.04 | bwd_microstep: 836.93 | bwd_inner_microstep: 836.83 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 14:36:00,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.41 | bwd_microstep: 1547.41 | bwd_inner_microstep: 1547.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814 [2024-06-10 14:36:02,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.51 | bwd_microstep: 1578.36 | bwd_inner_microstep: 1578.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 14:36:04,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.42 | bwd_microstep: 1381.46 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 14:36:06,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.35 | bwd_microstep: 1336.04 | 
bwd_inner_microstep: 1336.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-10 14:36:07,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.56 | bwd_microstep: 1294.91 | bwd_inner_microstep: 1294.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 14:36:09,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.42 | bwd_inner_microstep: 1385.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 14:36:10,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.97 | bwd_microstep: 792.50 | bwd_inner_microstep: 792.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1953 [2024-06-10 14:36:11,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.64 | bwd_microstep: 729.51 | bwd_inner_microstep: 729.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 14:36:13,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1243.49 | bwd_inner_microstep: 1243.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-10 14:36:15,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.38 | bwd_microstep: 1276.89 | bwd_inner_microstep: 1276.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2014 [2024-06-10 14:36:16,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
282.60 | bwd_microstep: 740.13 | bwd_inner_microstep: 740.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693 [2024-06-10 14:36:18,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.65 | bwd_microstep: 1721.88 | bwd_inner_microstep: 1721.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2193 [2024-06-10 14:36:20,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 950.81 | bwd_inner_microstep: 950.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3899 [2024-06-10 14:36:22,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1455.55 | bwd_inner_microstep: 1455.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 14:36:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1286.80 | bwd_inner_microstep: 1286.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 14:36:25,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1348.53 | bwd_inner_microstep: 1348.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2105 [2024-06-10 14:36:26,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.63 | bwd_microstep: 821.84 | bwd_inner_microstep: 821.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 14:36:28,917] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 542.70 | bwd_microstep: 1448.88 | bwd_inner_microstep: 1448.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829 [2024-06-10 14:36:30,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.15 | bwd_microstep: 1485.18 | bwd_inner_microstep: 1485.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449 [2024-06-10 14:36:32,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.31 | bwd_microstep: 1190.22 | bwd_inner_microstep: 1190.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-10 14:36:34,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.98 | bwd_microstep: 1353.04 | bwd_inner_microstep: 1353.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2291 [2024-06-10 14:36:35,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.88 | bwd_microstep: 909.31 | bwd_inner_microstep: 909.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776 [2024-06-10 14:36:37,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1448.72 | bwd_inner_microstep: 1448.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3616 [2024-06-10 14:36:39,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.31 | bwd_microstep: 1638.86 | bwd_inner_microstep: 1638.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3599 [2024-06-10 
14:36:42,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.65 | bwd_microstep: 1559.96 | bwd_inner_microstep: 1559.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379 [2024-06-10 14:36:43,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1240.45 | bwd_inner_microstep: 1240.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3588 [2024-06-10 14:36:46,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.61 | bwd_microstep: 1767.87 | bwd_inner_microstep: 1767.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3420 [2024-06-10 14:36:48,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.81 | bwd_microstep: 1534.71 | bwd_inner_microstep: 1534.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 14:36:50,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.40 | bwd_microstep: 1487.97 | bwd_inner_microstep: 1487.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3650 [2024-06-10 14:36:52,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.77 | bwd_microstep: 1535.95 | bwd_inner_microstep: 1535.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 14:36:58,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-10 14:36:58,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | 
bwd_microstep: 4986.86 | bwd_inner_microstep: 1521.34 | bwd_allreduce_microstep: 3465.47 | step_microstep: 38.04 [2024-06-10 14:36:58,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15566.70 | bwd: 45316.46 | bwd_inner: 41850.01 | bwd_allreduce: 3465.73 | step: 39.53 {'loss': 1.2591, 'learning_rate': 2.2991449105664113e-05, 'epoch': 0.47} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423 [2024-06-10 14:37:00,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.63 | bwd_microstep: 1433.41 | bwd_inner_microstep: 1433.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 14:37:01,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.96 | bwd_microstep: 1238.78 | bwd_inner_microstep: 1238.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014 [2024-06-10 14:37:02,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.69 | bwd_microstep: 800.57 | bwd_inner_microstep: 800.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3530 [2024-06-10 14:37:04,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1355.28 | bwd_inner_microstep: 1355.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878 [2024-06-10 14:37:05,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.80 | bwd_microstep: 679.79 | bwd_inner_microstep: 679.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 957 [2024-06-10 14:37:06,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.83 | 
bwd_microstep: 381.22 | bwd_inner_microstep: 381.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 14:37:08,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1382.45 | bwd_inner_microstep: 1382.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3236 [2024-06-10 14:37:09,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.04 | bwd_microstep: 1178.62 | bwd_inner_microstep: 1178.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2215 [2024-06-10 14:37:11,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.10 | bwd_microstep: 939.67 | bwd_inner_microstep: 939.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767 [2024-06-10 14:37:13,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1436.54 | bwd_inner_microstep: 1436.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-10 14:37:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 710.98 | bwd_inner_microstep: 710.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 14:37:15,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1279.09 | bwd_inner_microstep: 1279.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3667 [2024-06-10 14:37:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 605.92 | bwd_microstep: 1654.10 | bwd_inner_microstep: 1654.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 14:37:19,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1346.83 | bwd_inner_microstep: 1346.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3667 [2024-06-10 14:37:22,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.09 | bwd_microstep: 1716.68 | bwd_inner_microstep: 1716.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2886 [2024-06-10 14:37:23,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.60 | bwd_microstep: 1085.91 | bwd_inner_microstep: 1085.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 14:37:25,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.77 | bwd_microstep: 1245.37 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294 [2024-06-10 14:37:27,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.88 | bwd_microstep: 1069.78 | bwd_inner_microstep: 1069.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3464 [2024-06-10 14:37:28,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.24 | bwd_microstep: 1403.16 | bwd_inner_microstep: 1403.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007 [2024-06-10 14:37:30,242] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.20 | bwd_microstep: 901.03 | bwd_inner_microstep: 901.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2290 [2024-06-10 14:37:31,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.31 | bwd_microstep: 882.08 | bwd_inner_microstep: 882.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 14:37:33,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1509.09 | bwd_inner_microstep: 1509.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-10 14:37:35,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.84 | bwd_microstep: 1186.83 | bwd_inner_microstep: 1186.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 14:37:37,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.29 | bwd_microstep: 1558.76 | bwd_inner_microstep: 1558.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-10 14:37:39,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.49 | bwd_microstep: 1438.26 | bwd_inner_microstep: 1438.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 14:37:41,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.61 | bwd_microstep: 1355.53 | bwd_inner_microstep: 1355.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3538 [2024-06-10 14:37:43,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1393.48 | bwd_inner_microstep: 1393.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278 [2024-06-10 14:37:44,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.58 | bwd_microstep: 877.06 | bwd_inner_microstep: 877.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426 [2024-06-10 14:37:46,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.40 | bwd_microstep: 1538.74 | bwd_inner_microstep: 1538.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569 [2024-06-10 14:37:48,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.60 | bwd_microstep: 1450.08 | bwd_inner_microstep: 1450.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481 [2024-06-10 14:37:50,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1313.81 | bwd_inner_microstep: 1313.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3987 [2024-06-10 14:37:57,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 14:37:57,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 664.26 | bwd_microstep: 6716.32 | bwd_inner_microstep: 2041.96 | bwd_allreduce_microstep: 4674.31 | step_microstep: 37.85 [2024-06-10 14:37:57,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14834.40 | bwd: 44459.33 | bwd_inner: 39784.12 | bwd_allreduce: 4674.54 | 
step: 39.41 {'loss': 1.2518, 'learning_rate': 2.295433212536849e-05, 'epoch': 0.47} dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 4678 [2024-06-10 14:38:00,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 755.98 | bwd_microstep: 2064.52 | bwd_inner_microstep: 2064.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3929 [2024-06-10 14:38:02,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1496.44 | bwd_inner_microstep: 1496.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839 [2024-06-10 14:38:04,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.22 | bwd_microstep: 1455.53 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-10 14:38:05,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.65 | bwd_microstep: 875.70 | bwd_inner_microstep: 875.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879 [2024-06-10 14:38:06,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.87 | bwd_microstep: 679.68 | bwd_inner_microstep: 679.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 14:38:08,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723 [2024-06-10 14:38:10,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
534.98 | bwd_microstep: 1428.25 | bwd_inner_microstep: 1428.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 14:38:12,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1247.69 | bwd_inner_microstep: 1247.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 14:38:14,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1244.41 | bwd_inner_microstep: 1244.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1371 [2024-06-10 14:38:14,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.89 | bwd_microstep: 522.04 | bwd_inner_microstep: 522.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3984 [2024-06-10 14:38:16,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1488.00 | bwd_inner_microstep: 1487.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651 [2024-06-10 14:38:18,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.31 | bwd_microstep: 1316.70 | bwd_inner_microstep: 1316.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-10 14:38:20,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.49 | bwd_microstep: 1405.16 | bwd_inner_microstep: 1405.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2187 [2024-06-10 14:38:21,873] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 324.69 | bwd_microstep: 857.82 | bwd_inner_microstep: 857.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 14:38:23,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.31 | bwd_microstep: 1473.48 | bwd_inner_microstep: 1473.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 14:38:25,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1384.21 | bwd_inner_microstep: 1384.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3474 [2024-06-10 14:38:27,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1326.26 | bwd_inner_microstep: 1326.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 14:38:29,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1281.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 14:38:31,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1289.91 | bwd_inner_microstep: 1289.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2276 [2024-06-10 14:38:32,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.10 | bwd_microstep: 782.86 | bwd_inner_microstep: 782.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 
14:38:34,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.52 | bwd_microstep: 1250.79 | bwd_inner_microstep: 1250.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 14:38:35,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.64 | bwd_microstep: 1394.00 | bwd_inner_microstep: 1393.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167 [2024-06-10 14:38:37,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.15 | bwd_microstep: 854.83 | bwd_inner_microstep: 854.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3539 [2024-06-10 14:38:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.26 | bwd_microstep: 1358.60 | bwd_inner_microstep: 1358.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4896 [2024-06-10 14:38:41,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 756.08 | bwd_microstep: 2050.02 | bwd_inner_microstep: 2050.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3427 [2024-06-10 14:38:43,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.47 | bwd_microstep: 1394.59 | bwd_inner_microstep: 1394.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471 [2024-06-10 14:38:45,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1403.00 | bwd_inner_microstep: 1402.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, 
dynamic token length: 3594 [2024-06-10 14:38:47,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.74 | bwd_microstep: 1337.18 | bwd_inner_microstep: 1337.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-10 14:38:48,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.58 | bwd_microstep: 708.70 | bwd_inner_microstep: 708.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 14:38:50,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1259.16 | bwd_inner_microstep: 1259.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582 [2024-06-10 14:38:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1591.93 | bwd_inner_microstep: 1591.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3009 [2024-06-10 14:38:58,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.22 | optimizer_step: 6.61 [2024-06-10 14:38:58,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 5518.26 | bwd_inner_microstep: 1435.39 | bwd_allreduce_microstep: 4082.81 | step_microstep: 38.15 [2024-06-10 14:38:58,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15371.15 | bwd: 45119.98 | bwd_inner: 41036.27 | bwd_allreduce: 4083.04 | step: 39.63 {'loss': 1.2639, 'learning_rate': 2.291720473993049e-05, 'epoch': 0.47} 47%|████▋ | 806/1726 [13:56:31<16:11:52, 63.38s/it] 47%|████▋ | 807/1726 [13:57:31<15:55:09, 62.36s/it] 47%|████▋ | 808/1726 
[13:58:33<15:51:52, 62.21s/it] 47%|████▋ | 809/1726 [13:59:34<15:46:17, 61.92s/it] 47%|████▋ | 810/1726 [14:00:34<15:34:44, 61.23s/it] 47%|████▋ | 811/1726 [14:01:35<15:31:50, 61.10s/it] dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1874 [2024-06-10 14:38:59,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.16 | bwd_microstep: 670.28 | bwd_inner_microstep: 670.21 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3907 [2024-06-10 14:39:01,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.58 | bwd_microstep: 1813.34 | bwd_inner_microstep: 1813.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 14:39:03,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.98 | bwd_microstep: 1442.44 | bwd_inner_microstep: 1442.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802 [2024-06-10 14:39:06,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.97 | bwd_microstep: 1544.46 | bwd_inner_microstep: 1544.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 14:39:08,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.85 | bwd_microstep: 1473.20 | bwd_inner_microstep: 1473.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928 [2024-06-10 14:39:09,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.60 | bwd_microstep: 
787.63 | bwd_inner_microstep: 787.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2253 [2024-06-10 14:39:10,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.71 | bwd_microstep: 964.36 | bwd_inner_microstep: 964.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 14:39:12,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.11 | bwd_microstep: 1338.89 | bwd_inner_microstep: 1338.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3722 [2024-06-10 14:39:14,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1461.24 | bwd_inner_microstep: 1461.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1878 [2024-06-10 14:39:15,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.00 | bwd_microstep: 709.29 | bwd_inner_microstep: 709.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3542 [2024-06-10 14:39:17,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1230.43 | bwd_inner_microstep: 1230.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438 [2024-06-10 14:39:18,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.78 | bwd_microstep: 1157.92 | bwd_inner_microstep: 1157.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 14:39:20,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.55 | bwd_microstep: 1377.58 | bwd_inner_microstep: 1377.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3666 [2024-06-10 14:39:22,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.40 | bwd_microstep: 1599.10 | bwd_inner_microstep: 1599.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3678 [2024-06-10 14:39:25,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.71 | bwd_microstep: 1657.81 | bwd_inner_microstep: 1657.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2397 [2024-06-10 14:39:26,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.26 | bwd_microstep: 1003.89 | bwd_inner_microstep: 1003.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-10 14:39:28,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.33 | bwd_microstep: 1609.42 | bwd_inner_microstep: 1609.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 14:39:30,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.20 | bwd_microstep: 1289.44 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3821 [2024-06-10 14:39:32,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.26 | bwd_microstep: 1336.76 | bwd_inner_microstep: 1336.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 14:39:34,271] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1394.72 | bwd_inner_microstep: 1394.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 14:39:36,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.70 | bwd_microstep: 1553.87 | bwd_inner_microstep: 1553.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1990 [2024-06-10 14:39:37,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.43 | bwd_microstep: 708.81 | bwd_inner_microstep: 708.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3480 [2024-06-10 14:39:39,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.67 | bwd_microstep: 1247.24 | bwd_inner_microstep: 1247.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 14:39:41,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.80 | bwd_microstep: 1355.65 | bwd_inner_microstep: 1355.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615 [2024-06-10 14:39:42,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.30 | bwd_microstep: 1407.32 | bwd_inner_microstep: 1407.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472 [2024-06-10 14:39:44,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.79 | bwd_microstep: 1341.45 | bwd_inner_microstep: 1341.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3460 [2024-06-10 14:39:46,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1372.19 | bwd_inner_microstep: 1372.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3462 [2024-06-10 14:39:48,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1520.00 | bwd_inner_microstep: 1519.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 14:39:50,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1482.67 | bwd_inner_microstep: 1482.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3557 [2024-06-10 14:39:53,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.91 | bwd_microstep: 1585.69 | bwd_inner_microstep: 1585.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 14:39:54,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1243.07 | bwd_inner_microstep: 1243.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448 [2024-06-10 14:39:59,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.24 | optimizer_step: 6.63 [2024-06-10 14:39:59,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.01 | bwd_microstep: 4159.41 | bwd_inner_microstep: 1649.26 | bwd_allreduce_microstep: 2510.10 | step_microstep: 38.43 [2024-06-10 14:39:59,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15806.80 | bwd: 44839.60 | bwd_inner: 42328.55 | bwd_allreduce: 2510.35 | step: 
39.89 {'loss': 1.2812, 'learning_rate': 2.2880067080112553e-05, 'epoch': 0.47} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 14:40:00,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.00 | bwd_microstep: 770.54 | bwd_inner_microstep: 770.45 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 14:40:02,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1339.58 | bwd_inner_microstep: 1339.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2231 [2024-06-10 14:40:03,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.43 | bwd_microstep: 958.09 | bwd_inner_microstep: 958.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-10 14:40:05,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1296.72 | bwd_inner_microstep: 1296.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 14:40:07,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1338.99 | bwd_inner_microstep: 1338.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744 [2024-06-10 14:40:09,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1530.88 | bwd_inner_microstep: 1530.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 14:40:11,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
471.64 | bwd_microstep: 1248.08 | bwd_inner_microstep: 1248.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1943 [2024-06-10 14:40:12,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.42 | bwd_microstep: 727.77 | bwd_inner_microstep: 727.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481 [2024-06-10 14:40:13,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.01 | bwd_microstep: 1214.86 | bwd_inner_microstep: 1214.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 14:40:15,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.55 | bwd_microstep: 1337.98 | bwd_inner_microstep: 1337.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2891 [2024-06-10 14:40:17,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 425.99 | bwd_microstep: 1119.36 | bwd_inner_microstep: 1119.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 14:40:19,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1378.53 | bwd_inner_microstep: 1378.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 14:40:21,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1444.38 | bwd_inner_microstep: 1444.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3631 [2024-06-10 14:40:23,719] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 654.00 | bwd_microstep: 1805.49 | bwd_inner_microstep: 1805.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 14:40:25,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1386.15 | bwd_inner_microstep: 1386.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3654 [2024-06-10 14:40:27,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.05 | bwd_microstep: 1387.28 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512 [2024-06-10 14:40:29,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1320.27 | bwd_inner_microstep: 1320.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 14:40:31,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1288.83 | bwd_inner_microstep: 1288.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 14:40:33,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.43 | bwd_microstep: 1553.00 | bwd_inner_microstep: 1552.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1987 [2024-06-10 14:40:34,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.15 | bwd_microstep: 770.61 | bwd_inner_microstep: 770.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-10 
14:40:35,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.80 | bwd_microstep: 1156.69 | bwd_inner_microstep: 1156.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 14:40:37,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.54 | bwd_microstep: 1456.89 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533 [2024-06-10 14:40:39,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.76 | bwd_microstep: 1197.22 | bwd_inner_microstep: 1197.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917 [2024-06-10 14:40:40,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.89 | bwd_microstep: 687.53 | bwd_inner_microstep: 687.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004 [2024-06-10 14:40:41,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.26 | bwd_microstep: 713.01 | bwd_inner_microstep: 712.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-10 14:40:43,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1313.30 | bwd_inner_microstep: 1313.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2026 [2024-06-10 14:40:44,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.92 | bwd_microstep: 933.54 | bwd_inner_microstep: 933.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, 
dynamic token length: 2054 [2024-06-10 14:40:46,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.93 | bwd_microstep: 957.50 | bwd_inner_microstep: 957.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3576 [2024-06-10 14:40:48,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.85 | bwd_microstep: 1529.40 | bwd_inner_microstep: 1529.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 14:40:49,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 14:40:51,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.95 | bwd_microstep: 1503.50 | bwd_inner_microstep: 1503.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451 [2024-06-10 14:41:02,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.27 | optimizer_step: 6.60 [2024-06-10 14:41:02,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.42 | bwd_microstep: 9596.16 | bwd_inner_microstep: 1647.12 | bwd_allreduce_microstep: 7948.96 | step_microstep: 38.53 [2024-06-10 14:41:02,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14781.30 | bwd: 47515.19 | bwd_inner: 39565.24 | bwd_allreduce: 7949.25 | step: 39.99 {'loss': 1.2868, 'learning_rate': 2.2842919276713334e-05, 'epoch': 0.47} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3419 [2024-06-10 14:41:04,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.46 | 
bwd_microstep: 1359.15 | bwd_inner_microstep: 1359.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898 [2024-06-10 14:41:06,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.04 | bwd_microstep: 1580.70 | bwd_inner_microstep: 1580.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1949 [2024-06-10 14:41:07,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.88 | bwd_microstep: 825.72 | bwd_inner_microstep: 825.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 14:41:09,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1382.26 | bwd_inner_microstep: 1382.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803 [2024-06-10 14:41:11,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.03 | bwd_microstep: 1449.76 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735 [2024-06-10 14:41:13,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.19 | bwd_microstep: 1630.01 | bwd_inner_microstep: 1629.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 14:41:15,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1252.80 | bwd_inner_microstep: 1252.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-10 14:41:16,333] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 301.66 | bwd_microstep: 800.04 | bwd_inner_microstep: 800.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100 [2024-06-10 14:41:17,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.62 | bwd_microstep: 918.43 | bwd_inner_microstep: 918.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719 [2024-06-10 14:41:19,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.14 | bwd_microstep: 1621.73 | bwd_inner_microstep: 1621.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3492 [2024-06-10 14:41:21,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.40 | bwd_microstep: 1328.50 | bwd_inner_microstep: 1328.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 14:41:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1387.55 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2218 [2024-06-10 14:41:24,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.54 | bwd_microstep: 959.42 | bwd_inner_microstep: 959.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-10 14:41:26,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.63 | bwd_microstep: 1444.72 | bwd_inner_microstep: 1444.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2963 [2024-06-10 14:41:28,547] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.92 | bwd_microstep: 1196.88 | bwd_inner_microstep: 1196.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662 [2024-06-10 14:41:30,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.55 | bwd_microstep: 1610.01 | bwd_inner_microstep: 1609.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677 [2024-06-10 14:41:32,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.34 | bwd_microstep: 1621.95 | bwd_inner_microstep: 1621.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 14:41:34,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1351.61 | bwd_inner_microstep: 1351.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656 [2024-06-10 14:41:37,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.99 | bwd_microstep: 1620.43 | bwd_inner_microstep: 1620.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 14:41:39,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.38 | bwd_microstep: 1475.49 | bwd_inner_microstep: 1475.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 14:41:41,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.91 | bwd_microstep: 1479.62 | bwd_inner_microstep: 1479.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token 
length: 3738 [2024-06-10 14:41:43,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1372.31 | bwd_inner_microstep: 1372.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3593 [2024-06-10 14:41:45,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.02 | bwd_microstep: 1432.45 | bwd_inner_microstep: 1432.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 14:41:46,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.10 | bwd_microstep: 1246.94 | bwd_inner_microstep: 1246.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3516 [2024-06-10 14:41:48,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1509.96 | bwd_inner_microstep: 1509.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 14:41:50,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1404.31 | bwd_inner_microstep: 1404.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3813 [2024-06-10 14:41:52,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.03 | bwd_microstep: 1256.75 | bwd_inner_microstep: 1256.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563 [2024-06-10 14:41:54,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.89 | bwd_microstep: 1294.73 | bwd_inner_microstep: 1294.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, 
images per sample: 4.0, dynamic token length: 2271 [2024-06-10 14:41:55,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.22 | bwd_microstep: 876.74 | bwd_inner_microstep: 876.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-10 14:41:56,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.27 | bwd_microstep: 875.75 | bwd_inner_microstep: 875.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3778 [2024-06-10 14:41:58,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.60 | bwd_microstep: 1607.71 | bwd_inner_microstep: 1607.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582 [2024-06-10 14:42:03,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.17 | optimizer_step: 6.58 [2024-06-10 14:42:03,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.43 | bwd_microstep: 3609.01 | bwd_inner_microstep: 1577.34 | bwd_allreduce_microstep: 2031.62 | step_microstep: 37.72 [2024-06-10 14:42:03,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15902.87 | bwd: 44783.46 | bwd_inner: 42750.94 | bwd_allreduce: 2031.85 | step: 39.18 {'loss': 1.2105, 'learning_rate': 2.2805761460567197e-05, 'epoch': 0.47} dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2635 [2024-06-10 14:42:04,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.44 | bwd_microstep: 1008.18 | bwd_inner_microstep: 1008.10 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 14:42:06,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.65 | bwd_microstep: 1379.18 | bwd_inner_microstep: 1379.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 14:42:08,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1378.69 | bwd_inner_microstep: 1378.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 14:42:10,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1299.65 | bwd_inner_microstep: 1299.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 14:42:12,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.60 | bwd_microstep: 1385.41 | bwd_inner_microstep: 1385.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3478 [2024-06-10 14:42:14,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.64 | bwd_microstep: 1409.89 | bwd_inner_microstep: 1409.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749 [2024-06-10 14:42:16,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.22 | bwd_microstep: 1536.49 | bwd_inner_microstep: 1536.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 14:42:18,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.98 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 14:42:19,875] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.14 | bwd_microstep: 1298.00 | bwd_inner_microstep: 1297.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 14:42:20,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.43 | bwd_microstep: 793.26 | bwd_inner_microstep: 793.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2227 [2024-06-10 14:42:22,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.61 | bwd_microstep: 960.25 | bwd_inner_microstep: 960.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-10 14:42:24,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.85 | bwd_microstep: 1289.33 | bwd_inner_microstep: 1289.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3719 [2024-06-10 14:42:26,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1559.85 | bwd_inner_microstep: 1559.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435 [2024-06-10 14:42:28,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.91 | bwd_microstep: 1443.43 | bwd_inner_microstep: 1443.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 14:42:30,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1502.95 | bwd_inner_microstep: 1502.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 
[2024-06-10 14:42:32,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1345.25 | bwd_inner_microstep: 1345.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3828 [2024-06-10 14:42:34,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.83 | bwd_microstep: 1858.25 | bwd_inner_microstep: 1858.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473 [2024-06-10 14:42:36,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.71 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 14:42:38,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1255.30 | bwd_inner_microstep: 1255.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 14:42:40,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.72 | bwd_microstep: 1654.97 | bwd_inner_microstep: 1654.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572 [2024-06-10 14:42:42,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.07 | bwd_microstep: 1301.74 | bwd_inner_microstep: 1301.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 14:42:44,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.09 | bwd_microstep: 1293.00 | bwd_inner_microstep: 1292.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3549 [2024-06-10 14:42:45,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.47 | bwd_microstep: 1296.15 | bwd_inner_microstep: 1296.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 14:42:48,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.15 | bwd_microstep: 1656.86 | bwd_inner_microstep: 1656.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 889 [2024-06-10 14:42:48,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.80 | bwd_microstep: 370.44 | bwd_inner_microstep: 370.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 14:42:50,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.95 | bwd_microstep: 1277.71 | bwd_inner_microstep: 1277.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 14:42:52,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1351.05 | bwd_inner_microstep: 1351.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3468 [2024-06-10 14:42:54,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.40 | bwd_microstep: 1434.96 | bwd_inner_microstep: 1434.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3806 [2024-06-10 14:42:56,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.50 | bwd_microstep: 1696.13 | bwd_inner_microstep: 1696.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2284 [2024-06-10 14:42:57,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.10 | bwd_microstep: 782.23 | bwd_inner_microstep: 782.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 14:42:59,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.01 | bwd_microstep: 1553.31 | bwd_inner_microstep: 1553.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582 [2024-06-10 14:43:07,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 14:43:07,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.15 | bwd_microstep: 6454.24 | bwd_inner_microstep: 1808.64 | bwd_allreduce_microstep: 4645.55 | step_microstep: 38.39 [2024-06-10 14:43:07,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16008.29 | bwd: 47551.36 | bwd_inner: 42904.85 | bwd_allreduce: 4645.81 | step: 39.90 {'loss': 1.2262, 'learning_rate': 2.2768593762543784e-05, 'epoch': 0.47} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 14:43:08,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.87 | bwd_microstep: 1399.25 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 14:43:10,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.46 | bwd_microstep: 1344.63 | bwd_inner_microstep: 1344.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3879 [2024-06-10 14:43:13,019] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 587.60 | bwd_microstep: 1582.17 | bwd_inner_microstep: 1582.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791 [2024-06-10 14:43:15,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.87 | bwd_microstep: 1646.33 | bwd_inner_microstep: 1646.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798 [2024-06-10 14:43:17,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.65 | bwd_microstep: 1547.91 | bwd_inner_microstep: 1547.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 14:43:18,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.42 | bwd_microstep: 790.05 | bwd_inner_microstep: 790.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941 [2024-06-10 14:43:19,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.93 | bwd_microstep: 727.55 | bwd_inner_microstep: 727.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3433 [2024-06-10 14:43:21,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.72 | bwd_microstep: 1214.54 | bwd_inner_microstep: 1214.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 14:43:23,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1386.28 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-10 
14:43:25,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.33 | bwd_microstep: 1531.04 | bwd_inner_microstep: 1531.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3443 [2024-06-10 14:43:27,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.55 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 14:43:29,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1478.57 | bwd_inner_microstep: 1478.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660 [2024-06-10 14:43:31,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.41 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-10 14:43:33,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.13 | bwd_microstep: 1406.85 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511 [2024-06-10 14:43:35,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.85 | bwd_microstep: 1585.03 | bwd_inner_microstep: 1585.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677 [2024-06-10 14:43:37,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1326.08 | bwd_inner_microstep: 1326.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, 
dynamic token length: 3522 [2024-06-10 14:43:38,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1322.87 | bwd_inner_microstep: 1322.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 14:43:40,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1395.29 | bwd_inner_microstep: 1395.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 14:43:42,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.92 | bwd_microstep: 1391.61 | bwd_inner_microstep: 1391.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473 [2024-06-10 14:43:44,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.04 | bwd_microstep: 1213.32 | bwd_inner_microstep: 1213.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154 [2024-06-10 14:43:45,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.55 | bwd_microstep: 851.76 | bwd_inner_microstep: 851.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 14:43:47,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1291.26 | bwd_inner_microstep: 1291.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605 [2024-06-10 14:43:49,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 12, images per sample: 3.0, dynamic token length: 2407 [2024-06-10 14:43:50,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.62 | bwd_microstep: 842.07 | bwd_inner_microstep: 842.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 14:43:52,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.34 | bwd_microstep: 1554.39 | bwd_inner_microstep: 1554.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2269 [2024-06-10 14:43:53,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.68 | bwd_microstep: 907.15 | bwd_inner_microstep: 907.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1935 [2024-06-10 14:43:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.06 | bwd_microstep: 760.50 | bwd_inner_microstep: 760.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2216 [2024-06-10 14:43:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.89 | bwd_microstep: 831.86 | bwd_inner_microstep: 831.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636 [2024-06-10 14:43:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.37 | bwd_microstep: 1602.45 | bwd_inner_microstep: 1602.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3670 [2024-06-10 14:44:00,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.18 | bwd_microstep: 1723.53 | bwd_inner_microstep: 1723.50 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584 [2024-06-10 14:44:02,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.44 | bwd_microstep: 1526.58 | bwd_inner_microstep: 1526.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576 [2024-06-10 14:44:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.18 | optimizer_step: 6.61 [2024-06-10 14:44:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.17 | bwd_microstep: 3759.90 | bwd_inner_microstep: 1918.86 | bwd_allreduce_microstep: 1840.99 | step_microstep: 37.72 [2024-06-10 14:44:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15742.97 | bwd: 44178.29 | bwd_inner: 42336.40 | bwd_allreduce: 1841.22 | step: 39.17 {'loss': 1.2647, 'learning_rate': 2.273141631354753e-05, 'epoch': 0.47} 7%|████▋ | 811/1726 [14:01:35<15:31:50, 61.10s/it] 47%|████▋ | 812/1726 [14:02:36<15:30:13, 61.07s/it] 47%|████▋ | 813/1726 [14:03:38<15:36:17, 61.53s/it] 47%|████▋ | 814/1726 [14:04:39<15:32:57, 61.38s/it] 47%|████▋ | 815/1726 [14:05:43<15:43:22, 62.13s/it] 47%|████▋ | 816/1726 [14:06:44<15:33:45, 61.57s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 14:44:09,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1236.10 | bwd_inner_microstep: 1236.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816 [2024-06-10 14:44:10,890] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.55 | bwd_microstep: 1354.28 | bwd_inner_microstep: 1354.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495 [2024-06-10 14:44:12,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.03 | bwd_microstep: 1314.59 | bwd_inner_microstep: 1314.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 14:44:14,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739 [2024-06-10 14:44:16,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.24 | bwd_microstep: 1439.79 | bwd_inner_microstep: 1439.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735 [2024-06-10 14:44:18,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.42 | bwd_microstep: 1532.00 | bwd_inner_microstep: 1531.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 14:44:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1252.83 | bwd_inner_microstep: 1252.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410 [2024-06-10 14:44:22,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.88 | bwd_microstep: 1274.58 | bwd_inner_microstep: 1274.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
1954 [2024-06-10 14:44:23,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.50 | bwd_microstep: 889.59 | bwd_inner_microstep: 889.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 14:44:25,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.14 | bwd_microstep: 1246.11 | bwd_inner_microstep: 1246.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 14:44:27,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.56 | bwd_microstep: 1749.96 | bwd_inner_microstep: 1749.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3678 [2024-06-10 14:44:29,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.06 | bwd_microstep: 1362.25 | bwd_inner_microstep: 1362.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 14:44:31,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.42 | bwd_microstep: 1275.90 | bwd_inner_microstep: 1275.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3619 [2024-06-10 14:44:33,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.97 | bwd_microstep: 1708.54 | bwd_inner_microstep: 1708.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3398 [2024-06-10 14:44:35,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.37 | bwd_microstep: 1305.18 | bwd_inner_microstep: 1305.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3832 [2024-06-10 14:44:37,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.52 | bwd_microstep: 1458.06 | bwd_inner_microstep: 1458.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2303 [2024-06-10 14:44:38,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.40 | bwd_microstep: 881.41 | bwd_inner_microstep: 881.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2312 [2024-06-10 14:44:39,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.37 | bwd_microstep: 822.56 | bwd_inner_microstep: 822.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099 [2024-06-10 14:44:40,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.64 | bwd_microstep: 922.35 | bwd_inner_microstep: 922.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438 [2024-06-10 14:44:42,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.26 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1280.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3617 [2024-06-10 14:44:44,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.21 | bwd_microstep: 1556.62 | bwd_inner_microstep: 1556.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 14:44:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1280.43 | bwd_inner_microstep: 1280.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2080 [2024-06-10 14:44:47,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.20 | bwd_microstep: 753.75 | bwd_inner_microstep: 753.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 14:44:49,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1498.43 | bwd_inner_microstep: 1498.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 14:44:51,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1375.54 | bwd_inner_microstep: 1375.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-10 14:44:53,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1509.82 | bwd_inner_microstep: 1509.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 14:44:55,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.90 | bwd_microstep: 1453.61 | bwd_inner_microstep: 1453.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465 [2024-06-10 14:44:57,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760 [2024-06-10 14:44:59,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1438.16 | bwd_inner_microstep: 1438.13 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 14:45:01,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.45 | bwd_microstep: 1304.09 | bwd_inner_microstep: 1304.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 14:45:02,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.57 | bwd_microstep: 973.90 | bwd_inner_microstep: 973.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584 [2024-06-10 14:45:06,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-10 14:45:06,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 3136.89 | bwd_inner_microstep: 2106.79 | bwd_allreduce_microstep: 1030.05 | step_microstep: 37.77 [2024-06-10 14:45:06,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15646.61 | bwd: 43273.77 | bwd_inner: 42242.82 | bwd_allreduce: 1030.28 | step: 39.24 {'loss': 1.2263, 'learning_rate': 2.2694229244517226e-05, 'epoch': 0.47} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 14:45:08,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1383.57 | bwd_inner_microstep: 1383.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4027 [2024-06-10 14:45:10,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.52 | bwd_microstep: 1712.46 | bwd_inner_microstep: 1712.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489 
[2024-06-10 14:45:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.49 | bwd_microstep: 1184.53 | bwd_inner_microstep: 1184.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780 [2024-06-10 14:45:14,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.80 | bwd_microstep: 1646.39 | bwd_inner_microstep: 1646.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 14:45:16,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1389.40 | bwd_inner_microstep: 1389.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 14:45:18,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1534.22 | bwd_inner_microstep: 1534.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 14:45:20,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1282.54 | bwd_inner_microstep: 1282.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2629 [2024-06-10 14:45:21,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.16 | bwd_microstep: 1017.21 | bwd_inner_microstep: 1017.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064 [2024-06-10 14:45:23,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.75 | bwd_microstep: 815.22 | bwd_inner_microstep: 815.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3440 [2024-06-10 14:45:25,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.93 | bwd_microstep: 1446.75 | bwd_inner_microstep: 1446.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413 [2024-06-10 14:45:27,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 1446.39 | bwd_inner_microstep: 1446.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 14:45:28,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.61 | bwd_microstep: 1279.96 | bwd_inner_microstep: 1279.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498 [2024-06-10 14:45:31,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1579.93 | bwd_inner_microstep: 1579.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 14:45:32,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1384.62 | bwd_inner_microstep: 1384.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014 [2024-06-10 14:45:34,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.05 | bwd_microstep: 899.72 | bwd_inner_microstep: 899.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3463 [2024-06-10 14:45:36,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.35 | bwd_microstep: 1500.13 | bwd_inner_microstep: 1500.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640 [2024-06-10 14:45:38,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1572.25 | bwd_inner_microstep: 1572.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637 [2024-06-10 14:45:40,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.22 | bwd_microstep: 1319.71 | bwd_inner_microstep: 1319.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2406 [2024-06-10 14:45:41,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.15 | bwd_microstep: 940.29 | bwd_inner_microstep: 940.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527 [2024-06-10 14:45:43,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1418.41 | bwd_inner_microstep: 1418.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3625 [2024-06-10 14:45:45,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.14 | bwd_microstep: 1370.43 | bwd_inner_microstep: 1370.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817 [2024-06-10 14:45:47,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.45 | bwd_microstep: 1578.94 | bwd_inner_microstep: 1578.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3670 [2024-06-10 14:45:49,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 1325.63 | bwd_inner_microstep: 1325.60 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 14:45:51,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1494.19 | bwd_inner_microstep: 1494.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-10 14:45:53,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.78 | bwd_microstep: 1301.36 | bwd_inner_microstep: 1301.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933 [2024-06-10 14:45:54,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.69 | bwd_microstep: 697.36 | bwd_inner_microstep: 697.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 14:45:56,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1508.70 | bwd_inner_microstep: 1508.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556 [2024-06-10 14:45:58,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.65 | bwd_microstep: 1297.17 | bwd_inner_microstep: 1297.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 14:46:00,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.61 | bwd_microstep: 1450.88 | bwd_inner_microstep: 1450.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685 [2024-06-10 14:46:02,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.22 | bwd_microstep: 
1527.33 | bwd_inner_microstep: 1527.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3586 [2024-06-10 14:46:04,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.37 | bwd_microstep: 1531.50 | bwd_inner_microstep: 1531.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2244 [2024-06-10 14:46:09,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 14:46:09,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.32 | bwd_microstep: 4262.52 | bwd_inner_microstep: 1100.85 | bwd_allreduce_microstep: 3161.62 | step_microstep: 37.98 [2024-06-10 14:46:09,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16018.63 | bwd: 46099.74 | bwd_inner: 42937.21 | bwd_allreduce: 3161.84 | step: 39.51 {'loss': 1.2571, 'learning_rate': 2.2657032686425517e-05, 'epoch': 0.47} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-10 14:46:10,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 14:46:11,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.35 | bwd_microstep: 787.28 | bwd_inner_microstep: 787.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 14:46:14,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.66 | bwd_microstep: 1550.90 | bwd_inner_microstep: 1550.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 
6.0, dynamic token length: 3482 [2024-06-10 14:46:15,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.70 | bwd_microstep: 1314.17 | bwd_inner_microstep: 1314.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-10 14:46:18,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.03 | bwd_microstep: 1547.72 | bwd_inner_microstep: 1547.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 14:46:19,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1245.58 | bwd_inner_microstep: 1245.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 14:46:21,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717 [2024-06-10 14:46:23,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1364.21 | bwd_inner_microstep: 1364.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959 [2024-06-10 14:46:24,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.48 | bwd_microstep: 702.77 | bwd_inner_microstep: 702.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4017 [2024-06-10 14:46:26,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.04 | bwd_microstep: 1716.69 | bwd_inner_microstep: 1716.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic 
ViT batch size: 40, images per sample: 10.0, dynamic token length: 3750 [2024-06-10 14:46:29,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.17 | bwd_microstep: 1636.17 | bwd_inner_microstep: 1636.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 14:46:31,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1346.05 | bwd_inner_microstep: 1346.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-10 14:46:33,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1446.64 | bwd_inner_microstep: 1446.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 14:46:35,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.97 | bwd_microstep: 1486.43 | bwd_inner_microstep: 1486.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2023 [2024-06-10 14:46:36,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.93 | bwd_microstep: 838.44 | bwd_inner_microstep: 838.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3467 [2024-06-10 14:46:37,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.03 | bwd_microstep: 1246.54 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3639 [2024-06-10 14:46:39,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.08 | bwd_microstep: 1317.83 | bwd_inner_microstep: 1317.80 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832 [2024-06-10 14:46:41,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.86 | bwd_microstep: 1459.59 | bwd_inner_microstep: 1459.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 639 [2024-06-10 14:46:42,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.28 | bwd_microstep: 264.82 | bwd_inner_microstep: 264.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 14:46:43,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 1296.62 | bwd_inner_microstep: 1296.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 14:46:45,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1384.40 | bwd_inner_microstep: 1384.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966 [2024-06-10 14:46:46,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.38 | bwd_microstep: 701.30 | bwd_inner_microstep: 701.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825 [2024-06-10 14:46:48,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.43 | bwd_microstep: 1358.80 | bwd_inner_microstep: 1358.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-10 14:46:50,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1508.68 | bwd_inner_microstep: 
1508.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 14:46:53,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.20 | bwd_microstep: 1651.40 | bwd_inner_microstep: 1651.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3603 [2024-06-10 14:46:54,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.44 | bwd_microstep: 1213.38 | bwd_inner_microstep: 1213.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011 [2024-06-10 14:46:55,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.09 | bwd_microstep: 805.65 | bwd_inner_microstep: 805.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2285 [2024-06-10 14:46:57,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.64 | bwd_microstep: 915.15 | bwd_inner_microstep: 915.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3580 [2024-06-10 14:46:59,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.38 | bwd_microstep: 1519.86 | bwd_inner_microstep: 1519.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 14:47:01,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1394.59 | bwd_inner_microstep: 1394.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3815 [2024-06-10 14:47:03,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.23 | 
bwd_microstep: 1433.28 | bwd_inner_microstep: 1433.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 14:47:09,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 14:47:09,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 6153.67 | bwd_inner_microstep: 1434.07 | bwd_allreduce_microstep: 4719.54 | step_microstep: 37.87 [2024-06-10 14:47:09,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15194.27 | bwd: 45337.93 | bwd_inner: 40617.48 | bwd_allreduce: 4719.77 | step: 39.43 {'loss': 1.2723, 'learning_rate': 2.261982677027851e-05, 'epoch': 0.47} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-10 14:47:10,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.31 | bwd_microstep: 670.32 | bwd_inner_microstep: 670.23 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 14:47:12,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1475.60 | bwd_inner_microstep: 1475.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 14:47:14,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1476.45 | bwd_inner_microstep: 1476.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786 [2024-06-10 14:47:17,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.61 | bwd_microstep: 1542.86 | bwd_inner_microstep: 1542.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3827 [2024-06-10 14:47:19,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1549.36 | bwd_inner_microstep: 1549.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 14:47:20,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1249.20 | bwd_inner_microstep: 1249.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 14:47:22,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1294.50 | bwd_inner_microstep: 1294.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-10 14:47:24,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.37 | bwd_microstep: 1641.29 | bwd_inner_microstep: 1641.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 14:47:26,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.21 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3692 [2024-06-10 14:47:28,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.82 | bwd_microstep: 1521.80 | bwd_inner_microstep: 1521.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3679 [2024-06-10 14:47:31,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.37 | bwd_microstep: 1549.92 | bwd_inner_microstep: 1549.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701 [2024-06-10 14:47:33,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.49 | bwd_microstep: 1617.04 | bwd_inner_microstep: 1617.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3661 [2024-06-10 14:47:35,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.12 | bwd_microstep: 1655.42 | bwd_inner_microstep: 1655.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3512 [2024-06-10 14:47:37,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.12 | bwd_microstep: 1680.38 | bwd_inner_microstep: 1680.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 14:47:40,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 1551.37 | bwd_inner_microstep: 1551.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-10 14:47:42,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.16 | bwd_microstep: 1603.83 | bwd_inner_microstep: 1603.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 14:47:44,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.81 | bwd_microstep: 1645.82 | bwd_inner_microstep: 1645.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-10 14:47:46,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1340.04 | bwd_inner_microstep: 1340.01 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642 [2024-06-10 14:47:48,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1412.53 | bwd_inner_microstep: 1412.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077 [2024-06-10 14:47:49,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.67 | bwd_microstep: 917.14 | bwd_inner_microstep: 917.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514 [2024-06-10 14:47:51,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.85 | bwd_microstep: 1580.86 | bwd_inner_microstep: 1580.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518 [2024-06-10 14:47:53,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1420.01 | bwd_inner_microstep: 1419.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821 [2024-06-10 14:47:56,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.66 | bwd_microstep: 1753.23 | bwd_inner_microstep: 1753.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 14:47:58,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.40 | bwd_microstep: 1461.11 | bwd_inner_microstep: 1461.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 14:48:00,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.55 | bwd_microstep: 
1548.57 | bwd_inner_microstep: 1548.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447 [2024-06-10 14:48:02,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.93 | bwd_microstep: 1377.33 | bwd_inner_microstep: 1377.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 14:48:04,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.09 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-10 14:48:05,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.75 | bwd_microstep: 1401.89 | bwd_inner_microstep: 1401.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3581 [2024-06-10 14:48:07,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1349.73 | bwd_inner_microstep: 1349.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592 [2024-06-10 14:48:09,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.79 | bwd_microstep: 1401.66 | bwd_inner_microstep: 1401.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3766 [2024-06-10 14:48:11,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | bwd_microstep: 1373.59 | bwd_inner_microstep: 1373.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 14:48:13,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.80 | optimizer_gradients: 4.18 | optimizer_step: 6.67 [2024-06-10 14:48:13,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.63 | bwd_microstep: 1441.61 | bwd_inner_microstep: 1433.85 | bwd_allreduce_microstep: 7.72 | step_microstep: 37.75 [2024-06-10 14:48:13,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17223.09 | bwd: 46280.65 | bwd_inner: 46271.96 | bwd_allreduce: 7.99 | step: 39.22 {'loss': 1.3042, 'learning_rate': 2.258261162711523e-05, 'epoch': 0.48} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-10 14:48:15,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1488.02 | bwd_inner_microstep: 1487.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-10 14:48:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1340.72 | bwd_inner_microstep: 1340.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866 [2024-06-10 14:48:18,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 676.84 | bwd_inner_microstep: 676.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 14:48:20,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1378.36 | bwd_inner_microstep: 1378.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753 [2024-06-10 14:48:22,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1637.45 | bwd_inner_microstep: 1637.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 18, images per sample: 4.5, dynamic token length: 3421 [2024-06-10 14:48:24,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.66 | bwd_microstep: 1185.04 | bwd_inner_microstep: 1185.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1925 [2024-06-10 14:48:25,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.65 | bwd_microstep: 758.71 | bwd_inner_microstep: 758.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 14:48:27,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.95 | bwd_microstep: 1385.74 | bwd_inner_microstep: 1385.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694 [2024-06-10 14:48:29,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.23 | bwd_microstep: 1624.32 | bwd_inner_microstep: 1624.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 14:48:31,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1387.89 | bwd_inner_microstep: 1387.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3677 [2024-06-10 14:48:33,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.73 | bwd_microstep: 1570.17 | bwd_inner_microstep: 1570.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3517 [2024-06-10 14:48:35,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1513.82 | bwd_inner_microstep: 1513.79 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 14:48:37,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.26 | bwd_microstep: 1386.15 | bwd_inner_microstep: 1386.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1949 [2024-06-10 14:48:38,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.13 | bwd_microstep: 821.76 | bwd_inner_microstep: 821.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3645 [2024-06-10 14:48:41,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.11 | bwd_microstep: 1758.40 | bwd_inner_microstep: 1758.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3646 [2024-06-10 14:48:43,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.02 | bwd_microstep: 1641.65 | bwd_inner_microstep: 1641.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3632 [2024-06-10 14:48:45,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.40 | bwd_microstep: 1248.20 | bwd_inner_microstep: 1248.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 14:48:47,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1375.03 | bwd_inner_microstep: 1375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 14:48:48,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1281.49 | 
bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1991 [2024-06-10 14:48:50,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.33 | bwd_microstep: 832.13 | bwd_inner_microstep: 832.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 14:48:52,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1554.28 | bwd_inner_microstep: 1554.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 14:48:53,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1256.72 | bwd_inner_microstep: 1256.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 14:48:55,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.11 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 14:48:57,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.45 | bwd_microstep: 804.25 | bwd_inner_microstep: 804.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585 [2024-06-10 14:48:59,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.87 | bwd_microstep: 1533.29 | bwd_inner_microstep: 1533.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2367 [2024-06-10 14:49:00,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
409.75 | bwd_microstep: 1119.00 | bwd_inner_microstep: 1118.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1927 [2024-06-10 14:49:01,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.72 | bwd_microstep: 759.19 | bwd_inner_microstep: 759.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-10 14:49:03,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.76 | bwd_microstep: 1604.18 | bwd_inner_microstep: 1604.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934 [2024-06-10 14:49:04,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.69 | bwd_microstep: 756.68 | bwd_inner_microstep: 756.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3586 [2024-06-10 14:49:06,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1338.44 | bwd_inner_microstep: 1338.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 14:49:08,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.65 | bwd_microstep: 1546.04 | bwd_inner_microstep: 1546.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898 [2024-06-10 14:49:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.15 | optimizer_gradients: 4.30 | optimizer_step: 6.62 [2024-06-10 14:49:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.59 | bwd_microstep: 4100.15 | bwd_inner_microstep: 777.93 | bwd_allreduce_microstep: 3322.15 | 
step_microstep: 41.17 [2024-06-10 14:49:13,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15210.39 | bwd: 44113.85 | bwd_inner: 40790.79 | bwd_allreduce: 3322.39 | step: 42.63 {'loss': 1.3127, 'learning_rate': 2.2545387388007227e-05, 'epoch': 0.48} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 14:49:15,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1475.35 | bwd_inner_microstep: 1475.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 14:49:17,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1373.18 | bwd_inner_microstep: 1373.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 14:49:19,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1375.09 | bwd_inner_microstep: 1375.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3788 [2024-06-10 14:49:21,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.23 | bwd_microstep: 1510.31 | bwd_inner_microstep: 1510.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 14:49:23,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.67 | bwd_microstep: 1341.00 | bwd_inner_microstep: 1340.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745 [2024-06-10 14:49:25,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.89 | bwd_microstep: 1534.71 | bwd_inner_microstep: 1534.68 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 14:49:27,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 14:49:28,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1246.73 | bwd_inner_microstep: 1246.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 14:49:30,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.80 | bwd_microstep: 1391.88 | bwd_inner_microstep: 1391.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-10 14:49:31,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.21 | bwd_microstep: 790.32 | bwd_inner_microstep: 790.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 14:49:33,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.03 | bwd_microstep: 1495.28 | bwd_inner_microstep: 1495.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 14:49:35,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.25 | bwd_microstep: 1381.41 | bwd_inner_microstep: 1381.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958 [2024-06-10 14:49:36,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.96 | bwd_microstep: 822.73 | bwd_inner_microstep: 822.70 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 14:49:38,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1486.13 | bwd_inner_microstep: 1486.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437 [2024-06-10 14:49:40,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.76 | bwd_microstep: 1284.12 | bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2127 [2024-06-10 14:49:41,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.65 | bwd_microstep: 829.46 | bwd_inner_microstep: 829.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 14:49:43,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1414.68 | bwd_inner_microstep: 1414.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-10 14:49:46,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.13 | bwd_microstep: 1611.92 | bwd_inner_microstep: 1611.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616 [2024-06-10 14:49:47,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.16 | bwd_microstep: 1310.62 | bwd_inner_microstep: 1310.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3519 [2024-06-10 14:49:50,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.50 | bwd_microstep: 
1634.26 | bwd_inner_microstep: 1634.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842 [2024-06-10 14:49:52,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.36 | bwd_microstep: 1458.06 | bwd_inner_microstep: 1458.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3823 [2024-06-10 14:49:54,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.38 | bwd_microstep: 1752.21 | bwd_inner_microstep: 1752.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3710 [2024-06-10 14:49:56,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.77 | bwd_microstep: 1487.25 | bwd_inner_microstep: 1487.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3518 [2024-06-10 14:49:58,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.40 | bwd_microstep: 1512.19 | bwd_inner_microstep: 1512.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3808 [2024-06-10 14:50:00,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1419.44 | bwd_inner_microstep: 1419.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815 [2024-06-10 14:50:02,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.11 | bwd_microstep: 1387.73 | bwd_inner_microstep: 1387.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 14:50:04,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 530.13 | bwd_microstep: 1405.17 | bwd_inner_microstep: 1405.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518 [2024-06-10 14:50:06,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.85 | bwd_microstep: 1227.47 | bwd_inner_microstep: 1227.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3778 [2024-06-10 14:50:08,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.81 | bwd_microstep: 1580.82 | bwd_inner_microstep: 1580.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481 [2024-06-10 14:50:09,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.40 | bwd_microstep: 1185.34 | bwd_inner_microstep: 1185.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 14:50:12,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.43 | bwd_microstep: 1456.93 | bwd_inner_microstep: 1456.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3830 [2024-06-10 14:50:15,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 14:50:15,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.46 | bwd_microstep: 3205.73 | bwd_inner_microstep: 1938.12 | bwd_allreduce_microstep: 1267.56 | step_microstep: 37.64 [2024-06-10 14:50:15,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16505.75 | bwd: 45668.90 | bwd_inner: 44400.43 | bwd_allreduce: 1267.79 | step: 39.09 47%|████▋ | 817/1726 [14:07:43<15:22:13, 60.87s/it]
47%|████▋ | 818/1726 [14:08:45<15:28:26, 61.35s/it] 47%|████▋ | 819/1726 [14:09:46<15:25:11, 61.20s/it] 48%|████▊ | 820/1726 [14:10:50<15:36:08, 62.00s/it] 48%|████▊ | 821/1726 [14:11:50<15:24:30, 61.29s/it] 48%|████▊ | 822/1726 [14:12:52<15:28:58, 61.66s/{'loss': 1.2302, 'learning_rate': 2.2508154184058077e-05, 'epoch': 0.48} dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929 [2024-06-10 14:50:16,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.36 | bwd_microstep: 717.18 | bwd_inner_microstep: 717.05 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4415 [2024-06-10 14:50:19,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.01 | bwd_microstep: 1618.78 | bwd_inner_microstep: 1618.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2417 [2024-06-10 14:50:20,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.68 | bwd_microstep: 938.55 | bwd_inner_microstep: 938.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3768 [2024-06-10 14:50:22,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.09 | bwd_microstep: 1369.08 | bwd_inner_microstep: 1369.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2470 [2024-06-10 14:50:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.35 | bwd_microstep: 951.25 | bwd_inner_microstep: 951.22 | bwd_allreduce_microstep: 0.01 |
step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469 [2024-06-10 14:50:25,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.02 | bwd_microstep: 1310.25 | bwd_inner_microstep: 1310.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 14:50:27,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1280.84 | bwd_inner_microstep: 1280.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433 [2024-06-10 14:50:28,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.78 | bwd_microstep: 1154.98 | bwd_inner_microstep: 1154.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-10 14:50:30,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1525.06 | bwd_inner_microstep: 1525.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 14:50:33,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.52 | bwd_microstep: 1654.97 | bwd_inner_microstep: 1654.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3674 [2024-06-10 14:50:35,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.30 | bwd_microstep: 1666.03 | bwd_inner_microstep: 1666.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 14:50:37,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.39 | bwd_microstep: 1479.86 | bwd_inner_microstep: 
1479.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 14:50:39,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1440.69 | bwd_inner_microstep: 1440.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3561 [2024-06-10 14:50:41,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.32 | bwd_microstep: 1693.72 | bwd_inner_microstep: 1693.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 14:50:43,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.72 | bwd_microstep: 1394.31 | bwd_inner_microstep: 1394.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3840 [2024-06-10 14:50:45,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.37 | bwd_microstep: 1557.77 | bwd_inner_microstep: 1557.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 14:50:47,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1385.07 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452 [2024-06-10 14:50:49,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.44 | bwd_microstep: 1189.57 | bwd_inner_microstep: 1189.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 14:50:51,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | 
bwd_microstep: 1395.29 | bwd_inner_microstep: 1395.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3820 [2024-06-10 14:50:53,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1600.65 | bwd_inner_microstep: 1600.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 14:50:55,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.00 | bwd_microstep: 1485.85 | bwd_inner_microstep: 1485.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 14:50:57,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1496.27 | bwd_inner_microstep: 1496.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476 [2024-06-10 14:50:59,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.04 | bwd_microstep: 1345.54 | bwd_inner_microstep: 1345.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 14:51:01,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1555.06 | bwd_inner_microstep: 1555.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 14:51:03,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1508.17 | bwd_inner_microstep: 1508.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-10 14:51:05,431] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 441.36 | bwd_microstep: 1158.61 | bwd_inner_microstep: 1158.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-10 14:51:07,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.62 | bwd_microstep: 1432.68 | bwd_inner_microstep: 1432.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-10 14:51:09,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.36 | bwd_microstep: 1400.69 | bwd_inner_microstep: 1400.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2034 [2024-06-10 14:51:10,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.81 | bwd_microstep: 905.67 | bwd_inner_microstep: 905.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 14:51:12,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.03 | bwd_microstep: 1342.50 | bwd_inner_microstep: 1342.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3771 [2024-06-10 14:51:14,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1604.08 | bwd_inner_microstep: 1604.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-10 14:51:17,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.20 | optimizer_step: 6.57 [2024-06-10 14:51:17,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.24 | bwd_microstep: 2686.27 | bwd_inner_microstep: 1830.32 | 
bwd_allreduce_microstep: 855.91 | step_microstep: 38.19 [2024-06-10 14:51:17,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16516.48 | bwd: 45245.31 | bwd_inner: 44388.41 | bwd_allreduce: 856.18 | step: 39.84 {'loss': 1.2389, 'learning_rate': 2.2470912146402935e-05, 'epoch': 0.48} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-10 14:51:19,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.46 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3530 [2024-06-10 14:51:21,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.55 | bwd_microstep: 1195.07 | bwd_inner_microstep: 1195.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2414 [2024-06-10 14:51:22,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.19 | bwd_microstep: 1001.69 | bwd_inner_microstep: 1001.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-10 14:51:25,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.06 | bwd_microstep: 1639.85 | bwd_inner_microstep: 1639.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-10 14:51:27,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.95 | bwd_microstep: 1542.60 | bwd_inner_microstep: 1542.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 14:51:29,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.93 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 14:51:31,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.80 | bwd_microstep: 1341.26 | bwd_inner_microstep: 1341.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3543 [2024-06-10 14:51:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.98 | bwd_microstep: 1197.14 | bwd_inner_microstep: 1197.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235 [2024-06-10 14:51:33,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.16 | bwd_microstep: 863.84 | bwd_inner_microstep: 863.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487 [2024-06-10 14:51:35,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.54 | bwd_microstep: 1189.05 | bwd_inner_microstep: 1189.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3410 [2024-06-10 14:51:37,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.66 | bwd_microstep: 1387.68 | bwd_inner_microstep: 1387.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2091 [2024-06-10 14:51:38,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.42 | bwd_microstep: 793.45 | bwd_inner_microstep: 793.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 14:51:40,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.17 | bwd_microstep: 
1386.21 | bwd_inner_microstep: 1386.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-10 14:51:42,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.14 | bwd_microstep: 1517.91 | bwd_inner_microstep: 1517.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2887 [2024-06-10 14:51:44,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.04 | bwd_microstep: 1182.37 | bwd_inner_microstep: 1182.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 14:51:46,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3831 [2024-06-10 14:51:47,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.22 | bwd_microstep: 1358.28 | bwd_inner_microstep: 1358.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150 [2024-06-10 14:51:49,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.78 | bwd_microstep: 950.35 | bwd_inner_microstep: 950.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3687 [2024-06-10 14:51:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.94 | bwd_microstep: 1553.19 | bwd_inner_microstep: 1553.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 14:51:53,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 609.65 | bwd_microstep: 1659.03 | bwd_inner_microstep: 1659.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 14:51:55,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1253.08 | bwd_inner_microstep: 1253.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640 [2024-06-10 14:51:57,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.19 | bwd_microstep: 1409.15 | bwd_inner_microstep: 1409.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 14:51:58,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 800.36 | bwd_inner_microstep: 800.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 14:52:00,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1407.22 | bwd_inner_microstep: 1407.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 14:52:02,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3828 [2024-06-10 14:52:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.06 | bwd_microstep: 1723.55 | bwd_inner_microstep: 1723.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-10 14:52:06,064] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.81 | bwd_microstep: 973.35 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061 [2024-06-10 14:52:07,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.84 | bwd_microstep: 910.88 | bwd_inner_microstep: 910.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486 [2024-06-10 14:52:09,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.31 | bwd_microstep: 1574.83 | bwd_inner_microstep: 1574.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3767 [2024-06-10 14:52:11,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.00 | bwd_microstep: 1607.10 | bwd_inner_microstep: 1607.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400 [2024-06-10 14:52:13,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.05 | bwd_microstep: 1439.06 | bwd_inner_microstep: 1439.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2053 [2024-06-10 14:52:19,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.22 | optimizer_step: 6.60 [2024-06-10 14:52:19,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.66 | bwd_microstep: 5666.15 | bwd_inner_microstep: 931.12 | bwd_allreduce_microstep: 4734.98 | step_microstep: 37.97 [2024-06-10 14:52:19,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15413.52 | bwd: 45965.81 | bwd_inner: 41229.78 | bwd_allreduce: 4735.29 | step: 39.55 {'loss': 1.2243, 'learning_rate': 
2.2433661406208055e-05, 'epoch': 0.48} dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2704 [2024-06-10 14:52:21,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.28 | bwd_microstep: 1026.28 | bwd_inner_microstep: 1026.13 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 14:52:22,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1339.68 | bwd_inner_microstep: 1339.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-10 14:52:24,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.58 | bwd_microstep: 786.84 | bwd_inner_microstep: 786.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1957 [2024-06-10 14:52:25,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.57 | bwd_microstep: 730.78 | bwd_inner_microstep: 730.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3760 [2024-06-10 14:52:27,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1488.06 | bwd_inner_microstep: 1488.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 14:52:28,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.84 | bwd_microstep: 1287.88 | bwd_inner_microstep: 1287.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 14:52:30,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1387.20 | 
bwd_inner_microstep: 1387.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769 [2024-06-10 14:52:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.52 | bwd_microstep: 1444.71 | bwd_inner_microstep: 1444.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 14:52:34,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.40 | bwd_microstep: 1286.10 | bwd_inner_microstep: 1286.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415 [2024-06-10 14:52:36,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.28 | bwd_microstep: 1151.24 | bwd_inner_microstep: 1151.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3877 [2024-06-10 14:52:38,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.32 | bwd_microstep: 1718.76 | bwd_inner_microstep: 1718.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3669 [2024-06-10 14:52:40,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1386.46 | bwd_inner_microstep: 1386.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 14:52:42,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-10 14:52:44,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 498.85 | bwd_microstep: 1338.43 | bwd_inner_microstep: 1338.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-10 14:52:46,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.40 | bwd_microstep: 1604.45 | bwd_inner_microstep: 1604.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3543 [2024-06-10 14:52:48,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.87 | bwd_microstep: 1589.09 | bwd_inner_microstep: 1589.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503 [2024-06-10 14:52:50,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1316.31 | bwd_inner_microstep: 1316.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 14:52:52,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.64 | bwd_microstep: 1352.46 | bwd_inner_microstep: 1352.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2078 [2024-06-10 14:52:53,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.03 | bwd_microstep: 848.58 | bwd_inner_microstep: 848.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643 [2024-06-10 14:52:55,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.61 | bwd_microstep: 1710.79 | bwd_inner_microstep: 1710.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2214 [2024-06-10 14:52:57,001] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.61 | bwd_microstep: 958.56 | bwd_inner_microstep: 958.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1973 [2024-06-10 14:52:57,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.26 | bwd_microstep: 704.65 | bwd_inner_microstep: 704.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960 [2024-06-10 14:52:58,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.32 | bwd_microstep: 701.90 | bwd_inner_microstep: 701.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622 [2024-06-10 14:53:00,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.77 | bwd_microstep: 1340.97 | bwd_inner_microstep: 1340.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 14:53:02,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1398.96 | bwd_inner_microstep: 1398.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 14:53:04,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 1497.72 | bwd_inner_microstep: 1497.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-10 14:53:07,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.17 | bwd_microstep: 1607.09 | bwd_inner_microstep: 1607.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 
[2024-06-10 14:53:09,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1553.63 | bwd_inner_microstep: 1553.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3724 [2024-06-10 14:53:11,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.08 | bwd_microstep: 1669.26 | bwd_inner_microstep: 1669.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2273 [2024-06-10 14:53:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.59 | bwd_microstep: 1072.01 | bwd_inner_microstep: 1071.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 14:53:14,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1254.88 | bwd_inner_microstep: 1254.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 14:53:21,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 14:53:21,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.26 | bwd_microstep: 6331.68 | bwd_inner_microstep: 1686.32 | bwd_allreduce_microstep: 4645.31 | step_microstep: 38.06 [2024-06-10 14:53:21,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15440.05 | bwd: 46131.69 | bwd_inner: 41485.36 | bwd_allreduce: 4645.59 | step: 39.59 {'loss': 1.234, 'learning_rate': 2.2396402094670345e-05, 'epoch': 0.48} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 14:53:23,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.76 | bwd_microstep: 1327.62 | 
bwd_inner_microstep: 1327.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 14:53:25,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.45 | bwd_microstep: 1274.49 | bwd_inner_microstep: 1274.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441 [2024-06-10 14:53:26,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1279.02 | bwd_inner_microstep: 1278.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 14:53:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.52 | bwd_microstep: 1272.71 | bwd_inner_microstep: 1272.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3629 [2024-06-10 14:53:30,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.76 | bwd_microstep: 1457.84 | bwd_inner_microstep: 1457.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 14:53:32,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.41 | bwd_microstep: 1279.97 | bwd_inner_microstep: 1279.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043 [2024-06-10 14:53:33,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.71 | bwd_microstep: 806.90 | bwd_inner_microstep: 806.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 14:53:35,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 474.81 | bwd_microstep: 1249.77 | bwd_inner_microstep: 1249.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 14:53:37,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1377.49 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622 [2024-06-10 14:53:39,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.54 | bwd_microstep: 1312.67 | bwd_inner_microstep: 1312.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 14:53:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903 [2024-06-10 14:53:41,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.83 | bwd_microstep: 687.31 | bwd_inner_microstep: 687.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2202 [2024-06-10 14:53:43,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.61 | bwd_microstep: 893.45 | bwd_inner_microstep: 893.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3386 [2024-06-10 14:53:44,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.89 | bwd_microstep: 1240.10 | bwd_inner_microstep: 1240.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3505 [2024-06-10 14:53:46,847] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1428.04 | bwd_inner_microstep: 1428.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644 [2024-06-10 14:53:49,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.81 | bwd_microstep: 1610.21 | bwd_inner_microstep: 1610.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1967 [2024-06-10 14:53:50,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.50 | bwd_microstep: 747.28 | bwd_inner_microstep: 747.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 14:53:52,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.95 | bwd_microstep: 1519.86 | bwd_inner_microstep: 1519.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 14:53:53,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1255.99 | bwd_inner_microstep: 1255.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-10 14:53:55,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.48 | bwd_microstep: 1453.66 | bwd_inner_microstep: 1453.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2139 [2024-06-10 14:53:56,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.84 | bwd_microstep: 736.80 | bwd_inner_microstep: 736.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3536 [2024-06-10 14:53:58,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.00 | bwd_microstep: 1419.90 | bwd_inner_microstep: 1419.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3587 [2024-06-10 14:54:01,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.36 | bwd_microstep: 1566.82 | bwd_inner_microstep: 1566.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 14:54:03,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.01 | bwd_microstep: 1552.83 | bwd_inner_microstep: 1552.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 14:54:05,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.95 | bwd_microstep: 1374.13 | bwd_inner_microstep: 1374.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 14:54:07,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.26 | bwd_microstep: 1658.32 | bwd_inner_microstep: 1658.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 14:54:09,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1404.98 | bwd_inner_microstep: 1404.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 14:54:10,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.01 | bwd_microstep: 1185.55 | bwd_inner_microstep: 1185.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3812 [2024-06-10 14:54:13,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.77 | bwd_microstep: 1655.53 | bwd_inner_microstep: 1655.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3753 [2024-06-10 14:54:15,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.58 | bwd_microstep: 1804.12 | bwd_inner_microstep: 1804.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3548 [2024-06-10 14:54:17,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1534.14 | bwd_inner_microstep: 1534.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456 [2024-06-10 14:54:23,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.37 | optimizer_step: 6.62 [2024-06-10 14:54:23,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.44 | bwd_microstep: 4845.57 | bwd_inner_microstep: 1566.47 | bwd_allreduce_microstep: 3279.03 | step_microstep: 39.41 [2024-06-10 14:54:23,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15764.93 | bwd: 45556.87 | bwd_inner: 42276.92 | bwd_allreduce: 3279.27 | step: 40.91 {'loss': 1.2255, 'learning_rate': 2.2359134343016926e-05, 'epoch': 0.48} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 14:54:25,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.51 | bwd_microstep: 1274.06 | bwd_inner_microstep: 1274.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943 [2024-06-10 14:54:26,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
288.28 | bwd_microstep: 758.16 | bwd_inner_microstep: 758.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 14:54:27,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1343.02 | bwd_inner_microstep: 1343.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 14:54:30,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.98 | bwd_microstep: 1554.53 | bwd_inner_microstep: 1554.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761 [2024-06-10 14:54:32,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.97 | bwd_microstep: 1538.38 | bwd_inner_microstep: 1538.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3407 [2024-06-10 14:54:33,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.98 | bwd_microstep: 1178.87 | bwd_inner_microstep: 1178.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 14:54:35,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1244.90 | bwd_inner_microstep: 1244.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 14:54:37,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.21 | bwd_microstep: 1278.21 | bwd_inner_microstep: 1278.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3692 [2024-06-10 14:54:39,554] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 589.83 | bwd_microstep: 1604.78 | bwd_inner_microstep: 1604.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 14:54:41,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.65 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3676 [2024-06-10 14:54:43,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.98 | bwd_microstep: 1447.09 | bwd_inner_microstep: 1447.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384 [2024-06-10 14:54:45,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.70 | bwd_microstep: 1433.31 | bwd_inner_microstep: 1433.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 14:54:47,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.46 | bwd_microstep: 1479.34 | bwd_inner_microstep: 1479.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 14:54:49,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1245.50 | bwd_inner_microstep: 1245.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401 [2024-06-10 14:54:51,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.90 | bwd_microstep: 1274.39 | bwd_inner_microstep: 1274.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 
14:54:52,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1350.41 | bwd_inner_microstep: 1350.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003 [2024-06-10 14:54:54,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.82 | bwd_microstep: 898.59 | bwd_inner_microstep: 898.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3676 [2024-06-10 14:54:56,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.20 | bwd_microstep: 1358.24 | bwd_inner_microstep: 1358.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 14:54:58,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1395.06 | bwd_inner_microstep: 1395.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 14:55:00,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1557.09 | bwd_inner_microstep: 1557.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 14:55:01,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.15 | bwd_microstep: 974.96 | bwd_inner_microstep: 974.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 14:55:03,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.70 | bwd_microstep: 1295.75 | bwd_inner_microstep: 1295.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 1982 [2024-06-10 14:55:04,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 797.43 | bwd_inner_microstep: 797.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-10 14:55:05,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.74 | bwd_microstep: 698.02 | bwd_inner_microstep: 697.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 14:55:07,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1510.28 | bwd_inner_microstep: 1510.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3777 [2024-06-10 14:55:09,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.18 | bwd_microstep: 1348.13 | bwd_inner_microstep: 1348.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2432 [2024-06-10 14:55:10,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.07 | bwd_microstep: 876.25 | bwd_inner_microstep: 876.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599 [2024-06-10 14:55:12,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1509.29 | bwd_inner_microstep: 1509.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439 [2024-06-10 14:55:14,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.72 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 30, images per sample: 7.5, dynamic token length: 3811 [2024-06-10 14:55:16,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.26 | bwd_microstep: 1481.30 | bwd_inner_microstep: 1481.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3588 [2024-06-10 14:55:18,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.40 | bwd_microstep: 1366.93 | bwd_inner_microstep: 1366.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 14:55:26,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 14:55:26,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 7377.35 | bwd_inner_microstep: 1521.18 | bwd_allreduce_microstep: 5856.12 | step_microstep: 38.28 [2024-06-10 14:55:26,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15467.27 | bwd: 47215.90 | bwd_inner: 41358.88 | bwd_allreduce: 5856.34 | step: 39.71 {'loss': 1.2273, 'learning_rate': 2.2321858282504606e-05, 'epoch': 0.48} 48%|████▊ | 822/1726 [14:12:52<15:28:58, 61.66s/it] 48%|████▊ | 823/1726 [14:13:54<15:29:59, 61.79s/it] 48%|████▊ | 824/1726 [14:14:56<15:28:36, 61.77s/it] 48%|████▊ | 825/1726 [14:15:58<15:28:12, 61.81s/it] 48%|████▊ | 826/1726 [14:17:00<15:26:28, 61.76s/it] 48%|████▊ | 827/1726 [14:18:03<15:31:01, 62.14s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 14:55:28,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.51 | bwd_microstep: 1467.92 |
bwd_inner_microstep: 1467.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3985 [2024-06-10 14:55:30,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.44 | bwd_microstep: 1702.59 | bwd_inner_microstep: 1702.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1856 [2024-06-10 14:55:31,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.65 | bwd_microstep: 670.20 | bwd_inner_microstep: 670.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 14:55:33,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.02 | bwd_microstep: 1374.28 | bwd_inner_microstep: 1374.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3800 [2024-06-10 14:55:35,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.51 | bwd_microstep: 1412.28 | bwd_inner_microstep: 1412.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749 [2024-06-10 14:55:37,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.12 | bwd_microstep: 1534.94 | bwd_inner_microstep: 1534.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718 [2024-06-10 14:55:39,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.91 | bwd_microstep: 1435.68 | bwd_inner_microstep: 1435.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495 [2024-06-10 14:55:41,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 453.19 | bwd_microstep: 1189.05 | bwd_inner_microstep: 1189.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627 [2024-06-10 14:55:43,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1310.91 | bwd_inner_microstep: 1310.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1351 [2024-06-10 14:55:43,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 212.94 | bwd_microstep: 550.81 | bwd_inner_microstep: 550.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4274 [2024-06-10 14:55:46,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 686.77 | bwd_microstep: 1874.79 | bwd_inner_microstep: 1874.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1989 [2024-06-10 14:55:47,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.71 | bwd_microstep: 735.00 | bwd_inner_microstep: 734.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2946 [2024-06-10 14:55:48,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.30 | bwd_microstep: 1014.74 | bwd_inner_microstep: 1014.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508 [2024-06-10 14:55:50,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.52 | bwd_microstep: 1321.39 | bwd_inner_microstep: 1321.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 14:55:52,653] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1481.62 | bwd_inner_microstep: 1481.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 14:55:54,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1488.20 | bwd_inner_microstep: 1488.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518 [2024-06-10 14:55:56,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.31 | bwd_microstep: 1586.68 | bwd_inner_microstep: 1586.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3390 [2024-06-10 14:55:58,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.00 | bwd_microstep: 1274.45 | bwd_inner_microstep: 1274.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 14:56:00,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.65 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3553 [2024-06-10 14:56:02,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.81 | bwd_microstep: 1557.65 | bwd_inner_microstep: 1557.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919 [2024-06-10 14:56:04,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.29 | bwd_microstep: 1187.40 | bwd_inner_microstep: 1187.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3523 [2024-06-10 14:56:06,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1394.90 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 14:56:08,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1389.17 | bwd_inner_microstep: 1389.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3611 [2024-06-10 14:56:09,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1340.64 | bwd_inner_microstep: 1340.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 14:56:12,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.95 | bwd_microstep: 1608.88 | bwd_inner_microstep: 1608.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3813 [2024-06-10 14:56:14,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.07 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3720 [2024-06-10 14:56:16,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.84 | bwd_microstep: 1732.54 | bwd_inner_microstep: 1732.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3813 [2024-06-10 14:56:18,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.05 | bwd_microstep: 1704.97 | bwd_inner_microstep: 1704.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, 
images per sample: 7.75, dynamic token length: 3770 [2024-06-10 14:56:20,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.12 | bwd_microstep: 1502.80 | bwd_inner_microstep: 1502.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 14:56:22,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.13 | bwd_microstep: 1258.70 | bwd_inner_microstep: 1258.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 14:56:24,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1495.44 | bwd_inner_microstep: 1495.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3809 [2024-06-10 14:56:27,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.17 | optimizer_step: 6.61 [2024-06-10 14:56:27,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.71 | bwd_microstep: 1858.66 | bwd_inner_microstep: 1778.21 | bwd_allreduce_microstep: 80.41 | step_microstep: 37.61 [2024-06-10 14:56:27,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16419.00 | bwd: 44186.21 | bwd_inner: 44104.90 | bwd_allreduce: 80.63 | step: 39.12 {'loss': 1.2363, 'learning_rate': 2.228457404441949e-05, 'epoch': 0.48} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 14:56:29,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1370.35 | bwd_inner_microstep: 1370.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 14:56:30,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
471.82 | bwd_microstep: 1244.83 | bwd_inner_microstep: 1244.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 14:56:32,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.31 | bwd_microstep: 1552.87 | bwd_inner_microstep: 1552.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 14:56:35,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1549.84 | bwd_inner_microstep: 1549.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2438 [2024-06-10 14:56:36,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.95 | bwd_microstep: 913.85 | bwd_inner_microstep: 913.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4134 [2024-06-10 14:56:38,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.64 | bwd_microstep: 1640.03 | bwd_inner_microstep: 1640.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2884 [2024-06-10 14:56:40,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.78 | bwd_microstep: 1057.01 | bwd_inner_microstep: 1056.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 14:56:42,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.06 | bwd_microstep: 1382.53 | bwd_inner_microstep: 1382.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 844 [2024-06-10 14:56:42,528] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 133.22 | bwd_microstep: 345.14 | bwd_inner_microstep: 345.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-10 14:56:44,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.67 | bwd_microstep: 1623.18 | bwd_inner_microstep: 1623.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684 [2024-06-10 14:56:46,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.98 | bwd_microstep: 1624.27 | bwd_inner_microstep: 1624.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422 [2024-06-10 14:56:48,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.08 | bwd_microstep: 1185.05 | bwd_inner_microstep: 1185.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647 [2024-06-10 14:56:50,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1411.24 | bwd_inner_microstep: 1411.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 14:56:52,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.32 | bwd_microstep: 1341.05 | bwd_inner_microstep: 1341.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3487 [2024-06-10 14:56:54,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.04 | bwd_microstep: 1455.81 | bwd_inner_microstep: 1455.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 
14:56:56,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.71 | bwd_microstep: 1275.30 | bwd_inner_microstep: 1275.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3827 [2024-06-10 14:56:58,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.33 | bwd_microstep: 1580.07 | bwd_inner_microstep: 1580.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012 [2024-06-10 14:56:59,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 810.30 | bwd_inner_microstep: 810.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513 [2024-06-10 14:57:01,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.59 | bwd_microstep: 1189.92 | bwd_inner_microstep: 1189.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 14:57:02,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 14:57:04,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1380.06 | bwd_inner_microstep: 1380.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 14:57:06,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.26 | bwd_microstep: 1288.47 | bwd_inner_microstep: 1288.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3673 [2024-06-10 14:57:08,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.71 | bwd_microstep: 1626.66 | bwd_inner_microstep: 1626.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2274 [2024-06-10 14:57:09,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.94 | bwd_microstep: 781.18 | bwd_inner_microstep: 781.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-10 14:57:11,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.06 | bwd_microstep: 1356.65 | bwd_inner_microstep: 1356.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 14:57:13,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1394.39 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 14:57:15,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.06 | bwd_microstep: 1453.72 | bwd_inner_microstep: 1453.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3771 [2024-06-10 14:57:18,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.30 | bwd_microstep: 1674.79 | bwd_inner_microstep: 1674.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 14:57:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.49 | bwd_microstep: 1377.93 | bwd_inner_microstep: 1377.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 37, images per sample: 9.25, dynamic token length: 3616 [2024-06-10 14:57:22,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.04 | bwd_microstep: 1556.36 | bwd_inner_microstep: 1556.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 14:57:24,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.14 | bwd_microstep: 1548.73 | bwd_inner_microstep: 1548.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2058 [2024-06-10 14:57:27,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 14:57:27,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.10 | bwd_microstep: 3056.31 | bwd_inner_microstep: 1043.65 | bwd_allreduce_microstep: 2012.61 | step_microstep: 37.94 [2024-06-10 14:57:27,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15790.80 | bwd: 44301.66 | bwd_inner: 42288.15 | bwd_allreduce: 2012.83 | step: 39.44 {'loss': 1.2594, 'learning_rate': 2.2247281760076468e-05, 'epoch': 0.48} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 14:57:29,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.43 | bwd_microstep: 1472.67 | bwd_inner_microstep: 1472.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3473 [2024-06-10 14:57:31,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.06 | bwd_microstep: 1237.05 | bwd_inner_microstep: 1237.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 14:57:33,174] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 487.20 | bwd_microstep: 1278.32 | bwd_inner_microstep: 1278.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 14:57:35,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.10 | bwd_microstep: 1448.15 | bwd_inner_microstep: 1448.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798 [2024-06-10 14:57:37,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.26 | bwd_microstep: 1546.98 | bwd_inner_microstep: 1546.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 14:57:38,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.69 | bwd_microstep: 678.34 | bwd_inner_microstep: 678.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792 [2024-06-10 14:57:40,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.27 | bwd_microstep: 1643.86 | bwd_inner_microstep: 1643.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578 [2024-06-10 14:57:42,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.11 | bwd_microstep: 1302.81 | bwd_inner_microstep: 1302.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713 [2024-06-10 14:57:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.78 | bwd_microstep: 1526.90 | bwd_inner_microstep: 1526.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-10 14:57:45,529] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 798.69 | bwd_inner_microstep: 798.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497 [2024-06-10 14:57:47,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.00 | bwd_microstep: 1190.20 | bwd_inner_microstep: 1190.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526 [2024-06-10 14:57:49,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.64 | bwd_microstep: 1322.67 | bwd_inner_microstep: 1322.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 14:57:50,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.77 | bwd_microstep: 1317.80 | bwd_inner_microstep: 1317.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3506 [2024-06-10 14:57:52,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.99 | bwd_microstep: 1251.81 | bwd_inner_microstep: 1251.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 14:57:54,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.04 | bwd_microstep: 1386.50 | bwd_inner_microstep: 1386.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 14:57:56,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.68 | bwd_microstep: 1474.38 | bwd_inner_microstep: 1474.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
2203 [2024-06-10 14:57:57,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.20 | bwd_microstep: 988.33 | bwd_inner_microstep: 988.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-10 14:57:59,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1405.89 | bwd_inner_microstep: 1405.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2082 [2024-06-10 14:58:01,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.26 | bwd_microstep: 851.03 | bwd_inner_microstep: 851.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1862 [2024-06-10 14:58:01,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.62 | bwd_microstep: 677.26 | bwd_inner_microstep: 677.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3548 [2024-06-10 14:58:04,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.64 | bwd_microstep: 1522.74 | bwd_inner_microstep: 1522.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 14:58:05,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.10 | bwd_microstep: 1187.25 | bwd_inner_microstep: 1187.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 14:58:07,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.08 | bwd_microstep: 1503.55 | bwd_inner_microstep: 1503.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3546 [2024-06-10 14:58:09,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1199.98 | bwd_inner_microstep: 1199.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3467 [2024-06-10 14:58:11,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.77 | bwd_microstep: 1214.22 | bwd_inner_microstep: 1214.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 14:58:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1286.91 | bwd_inner_microstep: 1286.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490 [2024-06-10 14:58:14,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.50 | bwd_microstep: 1192.62 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 5317 [2024-06-10 14:58:17,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 737.35 | bwd_microstep: 1965.92 | bwd_inner_microstep: 1965.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569 [2024-06-10 14:58:19,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1591.65 | bwd_inner_microstep: 1591.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 14:58:21,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 14:58:23,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1497.44 | bwd_inner_microstep: 1497.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2282 [2024-06-10 14:58:27,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-10 14:58:27,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.71 | bwd_microstep: 3627.17 | bwd_inner_microstep: 1214.52 | bwd_allreduce_microstep: 2412.60 | step_microstep: 38.08 [2024-06-10 14:58:27,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15601.87 | bwd: 43939.58 | bwd_inner: 41526.08 | bwd_allreduce: 2412.82 | step: 39.58 {'loss': 1.2062, 'learning_rate': 2.2209981560818763e-05, 'epoch': 0.48} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 14:58:29,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1332.20 | bwd_inner_microstep: 1332.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853 [2024-06-10 14:58:31,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.39 | bwd_microstep: 1561.31 | bwd_inner_microstep: 1561.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 14:58:33,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.47 | bwd_microstep: 1652.03 | bwd_inner_microstep: 1652.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 14:58:35,507] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1245.51 | bwd_inner_microstep: 1245.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775 [2024-06-10 14:58:37,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.19 | bwd_microstep: 1543.54 | bwd_inner_microstep: 1543.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726 [2024-06-10 14:58:39,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.35 | bwd_microstep: 1629.25 | bwd_inner_microstep: 1629.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 14:58:41,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.27 | bwd_microstep: 1147.02 | bwd_inner_microstep: 1146.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3538 [2024-06-10 14:58:43,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.83 | bwd_microstep: 1230.96 | bwd_inner_microstep: 1230.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 14:58:45,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1389.43 | bwd_inner_microstep: 1389.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1887 [2024-06-10 14:58:46,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.91 | bwd_microstep: 744.77 | bwd_inner_microstep: 744.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 
14:58:47,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.71 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003 [2024-06-10 14:58:49,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.18 | bwd_microstep: 804.58 | bwd_inner_microstep: 804.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2906 [2024-06-10 14:58:50,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.16 | bwd_microstep: 1186.61 | bwd_inner_microstep: 1186.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512 [2024-06-10 14:58:52,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.43 | bwd_microstep: 1587.24 | bwd_inner_microstep: 1587.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971 [2024-06-10 14:58:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.02 | bwd_microstep: 887.78 | bwd_inner_microstep: 887.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2405 [2024-06-10 14:58:55,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.47 | bwd_microstep: 942.85 | bwd_inner_microstep: 942.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3529 [2024-06-10 14:58:57,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1555.79 | bwd_inner_microstep: 1555.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, 
dynamic token length: 1935 [2024-06-10 14:58:58,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 705.94 | bwd_inner_microstep: 705.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686 [2024-06-10 14:59:00,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.33 | bwd_microstep: 1434.41 | bwd_inner_microstep: 1434.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 14:59:02,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1294.56 | bwd_inner_microstep: 1294.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 14:59:04,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1287.49 | bwd_inner_microstep: 1287.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011 [2024-06-10 14:59:05,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.15 | bwd_microstep: 801.84 | bwd_inner_microstep: 801.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607 [2024-06-10 14:59:06,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1307.99 | bwd_inner_microstep: 1307.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758 [2024-06-10 14:59:09,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.50 | bwd_microstep: 1542.78 | bwd_inner_microstep: 1542.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 10, images per sample: 2.5, dynamic token length: 2016 [2024-06-10 14:59:10,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.08 | bwd_microstep: 711.86 | bwd_inner_microstep: 711.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3652 [2024-06-10 14:59:12,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.81 | bwd_microstep: 1482.42 | bwd_inner_microstep: 1482.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-10 14:59:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.59 | bwd_microstep: 1304.11 | bwd_inner_microstep: 1304.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803 [2024-06-10 14:59:15,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 1455.99 | bwd_inner_microstep: 1455.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 14:59:18,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.53 | bwd_microstep: 1650.30 | bwd_inner_microstep: 1650.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2927 [2024-06-10 14:59:19,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.95 | bwd_microstep: 1031.28 | bwd_inner_microstep: 1031.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575 [2024-06-10 14:59:21,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1591.97 | bwd_inner_microstep: 1591.94 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463 [2024-06-10 14:59:28,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 14:59:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.62 | bwd_microstep: 5733.30 | bwd_inner_microstep: 1625.78 | bwd_allreduce_microstep: 4107.47 | step_microstep: 38.22 [2024-06-10 14:59:28,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15292.86 | bwd: 45063.65 | bwd_inner: 40955.27 | bwd_allreduce: 4107.70 | step: 39.73 {'loss': 1.2794, 'learning_rate': 2.2172673578017497e-05, 'epoch': 0.48} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 14:59:30,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.57 | bwd_microstep: 1467.97 | bwd_inner_microstep: 1467.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 14:59:31,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.84 | bwd_microstep: 1239.14 | bwd_inner_microstep: 1239.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 14:59:34,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.30 | bwd_microstep: 1546.30 | bwd_inner_microstep: 1546.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1860 [2024-06-10 14:59:35,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.06 | bwd_microstep: 676.65 | bwd_inner_microstep: 676.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 14:59:36,931] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1389.03 | bwd_inner_microstep: 1389.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403 [2024-06-10 14:59:38,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.60 | bwd_microstep: 1181.19 | bwd_inner_microstep: 1181.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 14:59:40,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1384.25 | bwd_inner_microstep: 1384.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1037 [2024-06-10 14:59:41,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.75 | bwd_microstep: 399.93 | bwd_inner_microstep: 399.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-10 14:59:43,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.57 | bwd_microstep: 1520.07 | bwd_inner_microstep: 1520.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 14:59:44,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1246.70 | bwd_inner_microstep: 1246.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 14:59:46,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.20 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3415 [2024-06-10 14:59:48,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.58 | bwd_microstep: 1340.44 | bwd_inner_microstep: 1340.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3646 [2024-06-10 14:59:50,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1312.74 | bwd_inner_microstep: 1312.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 14:59:52,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1377.34 | bwd_inner_microstep: 1377.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1885 [2024-06-10 14:59:53,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.60 | bwd_microstep: 773.63 | bwd_inner_microstep: 773.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3631 [2024-06-10 14:59:55,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1541.32 | bwd_inner_microstep: 1541.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1958 [2024-06-10 14:59:56,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.21 | bwd_microstep: 852.77 | bwd_inner_microstep: 852.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3842 [2024-06-10 14:59:59,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.33 | bwd_microstep: 1725.04 | bwd_inner_microstep: 1725.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3635 [2024-06-10 15:00:01,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1507.19 | bwd_inner_microstep: 1507.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 15:00:02,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.19 | bwd_microstep: 1281.12 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601 [2024-06-10 15:00:05,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1507.01 | bwd_inner_microstep: 1506.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4018 [2024-06-10 15:00:07,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1616.03 | bwd_inner_microstep: 1616.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2279 [2024-06-10 15:00:08,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.69 | bwd_microstep: 933.08 | bwd_inner_microstep: 933.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1895 [2024-06-10 15:00:09,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.41 | bwd_microstep: 715.83 | bwd_inner_microstep: 715.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 15:00:11,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.88 | bwd_microstep: 1647.72 | bwd_inner_microstep: 1647.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767 [2024-06-10 15:00:13,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.30 | bwd_microstep: 1546.67 | bwd_inner_microstep: 1546.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695 [2024-06-10 15:00:15,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.21 | bwd_microstep: 1460.58 | bwd_inner_microstep: 1460.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2074 [2024-06-10 15:00:17,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.84 | bwd_microstep: 912.45 | bwd_inner_microstep: 912.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2262 [2024-06-10 15:00:18,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.62 | bwd_microstep: 884.00 | bwd_inner_microstep: 883.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3478 [2024-06-10 15:00:20,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.01 | bwd_microstep: 1341.07 | bwd_inner_microstep: 1341.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2041 [2024-06-10 15:00:21,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.08 | bwd_microstep: 842.66 | bwd_inner_microstep: 842.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-10 15:00:27,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.22 | optimizer_step: 6.60 [2024-06-10 
15:00:27,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.86 | bwd_microstep: 5440.39 | bwd_inner_microstep: 1732.50 | bwd_allreduce_microstep: 3707.83 | step_microstep: 38.19 [2024-06-10 15:00:27,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14991.09 | bwd: 43998.82 | bwd_inner: 40290.08 | bwd_allreduce: 3708.06 | step: 39.66 {'loss': 1.275, 'learning_rate': 2.213535794307118e-05, 'epoch': 0.48} 48%|████▊ | 827/1726 [14:18:03<15:31:01, 62.14s/it] 48%|████▊ | 828/1726 [14:19:03<15:24:37, 61.78s/it] 48%|████▊ | 829/1726 [14:20:04<15:17:29, 61.37s/it] 48%|████▊ | 830/1726 [14:21:04<15:09:44, 60.92s/it] 48%|████▊ | 831/1726 [14:22:04<15:07:40, 60.85s/it] 48%|████▊ | 832/1726 [14:23:04<14:59:47, 60.39s/it] dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3401 [2024-06-10 15:00:29,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.59 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397 [2024-06-10 15:00:31,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.15 | bwd_microstep: 1145.71 | bwd_inner_microstep: 1145.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465 [2024-06-10 15:00:32,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1307.24 | bwd_inner_microstep: 1307.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786 [2024-06-10 15:00:34,951]
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.52 | bwd_microstep: 1543.33 | bwd_inner_microstep: 1543.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 15:00:36,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1278.23 | bwd_inner_microstep: 1278.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 15:00:38,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1387.24 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 15:00:40,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1382.16 | bwd_inner_microstep: 1382.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3782 [2024-06-10 15:00:42,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.57 | bwd_microstep: 1413.19 | bwd_inner_microstep: 1413.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 15:00:44,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1256.03 | bwd_inner_microstep: 1256.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 15:00:46,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1410.31 | bwd_inner_microstep: 1410.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3450 [2024-06-10 15:00:47,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.02 | bwd_microstep: 1157.90 | bwd_inner_microstep: 1157.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 15:00:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1501.32 | bwd_inner_microstep: 1501.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457 [2024-06-10 15:00:51,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1436.69 | bwd_inner_microstep: 1436.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 15:00:53,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.72 | bwd_microstep: 1389.36 | bwd_inner_microstep: 1389.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489 [2024-06-10 15:00:55,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.38 | bwd_microstep: 1429.18 | bwd_inner_microstep: 1429.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 15:00:57,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.37 | bwd_microstep: 1483.71 | bwd_inner_microstep: 1483.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3631 [2024-06-10 15:01:00,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.61 | bwd_microstep: 1811.40 | bwd_inner_microstep: 1811.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
40, images per sample: 10.0, dynamic token length: 3627 [2024-06-10 15:01:02,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.79 | bwd_microstep: 1602.52 | bwd_inner_microstep: 1602.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821 [2024-06-10 15:01:04,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.01 | bwd_microstep: 1579.49 | bwd_inner_microstep: 1579.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3528 [2024-06-10 15:01:06,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1324.56 | bwd_inner_microstep: 1324.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3869 [2024-06-10 15:01:08,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1498.17 | bwd_inner_microstep: 1498.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 15:01:10,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.15 | bwd_microstep: 1157.82 | bwd_inner_microstep: 1157.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1375 [2024-06-10 15:01:10,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.73 | bwd_microstep: 522.22 | bwd_inner_microstep: 522.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728 [2024-06-10 15:01:12,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.95 | bwd_microstep: 1534.16 | bwd_inner_microstep: 1534.13 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2133 [2024-06-10 15:01:14,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.79 | bwd_microstep: 803.26 | bwd_inner_microstep: 803.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812 [2024-06-10 15:01:15,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.00 | bwd_microstep: 1357.49 | bwd_inner_microstep: 1357.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-10 15:01:17,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.32 | bwd_microstep: 1306.95 | bwd_inner_microstep: 1306.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 15:01:19,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.03 | bwd_microstep: 1451.86 | bwd_inner_microstep: 1451.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3819 [2024-06-10 15:01:22,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.23 | bwd_microstep: 1853.69 | bwd_inner_microstep: 1853.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 15:01:24,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.77 | bwd_microstep: 1505.38 | bwd_inner_microstep: 1505.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229 [2024-06-10 15:01:25,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.97 | bwd_microstep: 963.71 | bwd_inner_microstep: 963.68 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 15:01:30,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.30 | optimizer_step: 6.59 [2024-06-10 15:01:30,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 4091.40 | bwd_inner_microstep: 1945.31 | bwd_allreduce_microstep: 2146.04 | step_microstep: 38.26 [2024-06-10 15:01:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16345.64 | bwd: 46268.85 | bwd_inner: 44121.91 | bwd_allreduce: 2146.27 | step: 39.74 {'loss': 1.2421, 'learning_rate': 2.2098034787405288e-05, 'epoch': 0.48} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413 [2024-06-10 15:01:32,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.81 | bwd_microstep: 1362.46 | bwd_inner_microstep: 1362.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3400 [2024-06-10 15:01:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.34 | bwd_microstep: 1196.28 | bwd_inner_microstep: 1196.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-10 15:01:36,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.02 | bwd_microstep: 1506.84 | bwd_inner_microstep: 1506.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 15:01:38,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.71 | bwd_microstep: 1488.02 | bwd_inner_microstep: 1487.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780 
[2024-06-10 15:01:40,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.46 | bwd_microstep: 1647.63 | bwd_inner_microstep: 1647.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763 [2024-06-10 15:01:42,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.46 | bwd_microstep: 1639.62 | bwd_inner_microstep: 1639.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4107 [2024-06-10 15:01:44,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.23 | bwd_microstep: 1532.90 | bwd_inner_microstep: 1532.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 840 [2024-06-10 15:01:45,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.64 | bwd_microstep: 345.95 | bwd_inner_microstep: 345.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 15:01:46,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1246.69 | bwd_inner_microstep: 1246.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918 [2024-06-10 15:01:47,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.53 | bwd_microstep: 689.03 | bwd_inner_microstep: 689.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 15:01:50,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1558.21 | bwd_inner_microstep: 1558.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 
11.5, dynamic token length: 3676 [2024-06-10 15:01:52,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.01 | bwd_microstep: 1719.18 | bwd_inner_microstep: 1719.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976 [2024-06-10 15:01:53,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.24 | bwd_microstep: 889.63 | bwd_inner_microstep: 889.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 15:01:55,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1384.32 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635 [2024-06-10 15:01:57,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.66 | bwd_microstep: 1604.09 | bwd_inner_microstep: 1604.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463 [2024-06-10 15:01:59,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.35 | bwd_microstep: 1438.15 | bwd_inner_microstep: 1438.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 15:02:01,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.70 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2036 [2024-06-10 15:02:02,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 811.97 | bwd_inner_microstep: 811.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 15:02:04,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.19 | bwd_microstep: 1489.66 | bwd_inner_microstep: 1489.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542 [2024-06-10 15:02:06,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1327.75 | bwd_inner_microstep: 1327.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 15:02:08,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1492.11 | bwd_inner_microstep: 1492.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 15:02:10,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.08 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3754 [2024-06-10 15:02:12,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.35 | bwd_microstep: 1440.89 | bwd_inner_microstep: 1440.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2931 [2024-06-10 15:02:14,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.87 | bwd_microstep: 1166.79 | bwd_inner_microstep: 1166.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 15:02:16,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1393.44 | bwd_inner_microstep: 1393.42 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3611 [2024-06-10 15:02:18,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.05 | bwd_microstep: 1534.36 | bwd_inner_microstep: 1534.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507 [2024-06-10 15:02:20,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1317.95 | bwd_inner_microstep: 1317.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3728 [2024-06-10 15:02:22,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.93 | bwd_microstep: 1335.47 | bwd_inner_microstep: 1335.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240 [2024-06-10 15:02:23,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.65 | bwd_microstep: 965.53 | bwd_inner_microstep: 965.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798 [2024-06-10 15:02:25,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.07 | bwd_microstep: 1546.38 | bwd_inner_microstep: 1546.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3575 [2024-06-10 15:02:27,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.26 | bwd_microstep: 1643.11 | bwd_inner_microstep: 1643.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 15:02:33,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | 
optimizer_step: 6.61 [2024-06-10 15:02:33,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.66 | bwd_microstep: 5469.74 | bwd_inner_microstep: 1434.26 | bwd_allreduce_microstep: 4035.43 | step_microstep: 37.94 [2024-06-10 15:02:33,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15985.93 | bwd: 46992.06 | bwd_inner: 42955.69 | bwd_allreduce: 4035.67 | step: 39.46 {'loss': 1.2224, 'learning_rate': 2.206070424247178e-05, 'epoch': 0.48} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3452 [2024-06-10 15:02:35,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.85 | bwd_microstep: 1369.71 | bwd_inner_microstep: 1369.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 15:02:37,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1380.90 | bwd_inner_microstep: 1380.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 15:02:39,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.82 | bwd_microstep: 1350.32 | bwd_inner_microstep: 1350.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3796 [2024-06-10 15:02:41,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.11 | bwd_microstep: 1575.75 | bwd_inner_microstep: 1575.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-10 15:02:43,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.06 | bwd_microstep: 1538.68 | bwd_inner_microstep: 1538.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3750 [2024-06-10 15:02:45,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.83 | bwd_microstep: 1445.63 | bwd_inner_microstep: 1445.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 15:02:47,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1386.00 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714 [2024-06-10 15:02:49,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.85 | bwd_microstep: 1628.92 | bwd_inner_microstep: 1628.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 15:02:51,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1244.48 | bwd_inner_microstep: 1244.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 15:02:53,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 15:02:55,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 15:02:57,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1387.27 | bwd_inner_microstep: 1387.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3501 [2024-06-10 15:02:58,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1389.76 | bwd_inner_microstep: 1389.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 15:03:00,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.24 | bwd_microstep: 1473.74 | bwd_inner_microstep: 1473.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3493 [2024-06-10 15:03:03,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.38 | bwd_microstep: 1680.41 | bwd_inner_microstep: 1680.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3440 [2024-06-10 15:03:05,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.70 | bwd_microstep: 1380.88 | bwd_inner_microstep: 1380.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 15:03:07,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1375.64 | bwd_inner_microstep: 1375.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3526 [2024-06-10 15:03:09,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.81 | bwd_microstep: 1403.45 | bwd_inner_microstep: 1403.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 15:03:11,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1557.14 | bwd_inner_microstep: 1557.11 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 15:03:13,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.10 | bwd_microstep: 1557.97 | bwd_inner_microstep: 1557.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482 [2024-06-10 15:03:14,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.07 | bwd_microstep: 1185.24 | bwd_inner_microstep: 1185.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 15:03:17,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1655.62 | bwd_inner_microstep: 1655.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3504 [2024-06-10 15:03:18,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1255.93 | bwd_inner_microstep: 1255.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062 [2024-06-10 15:03:20,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.73 | bwd_microstep: 818.67 | bwd_inner_microstep: 818.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521 [2024-06-10 15:03:21,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.18 | bwd_microstep: 1195.23 | bwd_inner_microstep: 1195.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547 [2024-06-10 15:03:23,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.03 | bwd_microstep: 1300.88 | bwd_inner_microstep: 
1300.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2280 [2024-06-10 15:03:25,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.82 | bwd_microstep: 1069.06 | bwd_inner_microstep: 1069.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 15:03:26,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1285.46 | bwd_inner_microstep: 1285.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-10 15:03:27,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 805.28 | bwd_inner_microstep: 805.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547 [2024-06-10 15:03:30,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.58 | bwd_microstep: 1588.74 | bwd_inner_microstep: 1588.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918 [2024-06-10 15:03:31,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.82 | bwd_microstep: 686.79 | bwd_inner_microstep: 686.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2928 [2024-06-10 15:03:35,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-10 15:03:35,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.29 | bwd_microstep: 3888.09 | bwd_inner_microstep: 1385.24 | bwd_allreduce_microstep: 2502.79 | step_microstep: 38.00 [2024-06-10 15:03:35,494] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16004.26 | bwd: 45389.20 | bwd_inner: 42885.50 | bwd_allreduce: 2503.02 | step: 39.42 {'loss': 1.2109, 'learning_rate': 2.2023366439748647e-05, 'epoch': 0.48} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 15:03:37,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.65 | bwd_microstep: 1328.74 | bwd_inner_microstep: 1328.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4576 [2024-06-10 15:03:39,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.98 | bwd_microstep: 1750.22 | bwd_inner_microstep: 1750.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 15:03:41,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1381.92 | bwd_inner_microstep: 1381.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 15:03:43,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1397.02 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 15:03:45,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 1379.50 | bwd_inner_microstep: 1379.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030 [2024-06-10 15:03:46,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.28 | bwd_microstep: 805.10 | bwd_inner_microstep: 805.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1940 [2024-06-10 15:03:47,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 789.96 | bwd_inner_microstep: 789.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 15:03:49,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.22 | bwd_microstep: 1315.57 | bwd_inner_microstep: 1315.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 15:03:51,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 15:03:52,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.76 | bwd_microstep: 788.67 | bwd_inner_microstep: 788.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-10 15:03:54,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.89 | bwd_microstep: 1627.26 | bwd_inner_microstep: 1627.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424 [2024-06-10 15:03:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.84 | bwd_microstep: 1154.69 | bwd_inner_microstep: 1154.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-10 15:03:58,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.03 | bwd_microstep: 1444.57 | bwd_inner_microstep: 1444.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 15:04:00,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.76 | bwd_microstep: 1476.89 | bwd_inner_microstep: 1476.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-10 15:04:02,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.29 | bwd_microstep: 1617.20 | bwd_inner_microstep: 1617.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504 [2024-06-10 15:04:04,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1417.66 | bwd_inner_microstep: 1417.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3487 [2024-06-10 15:04:06,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.16 | bwd_microstep: 1217.90 | bwd_inner_microstep: 1217.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832 [2024-06-10 15:04:08,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.01 | bwd_microstep: 1476.92 | bwd_inner_microstep: 1476.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 15:04:10,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.16 | bwd_microstep: 1555.43 | bwd_inner_microstep: 1555.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 15:04:12,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1480.93 | bwd_inner_microstep: 1480.91 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-10 15:04:14,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.86 | bwd_microstep: 1604.46 | bwd_inner_microstep: 1604.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3532 [2024-06-10 15:04:16,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.73 | bwd_microstep: 1592.05 | bwd_inner_microstep: 1592.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-10 15:04:18,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.06 | bwd_microstep: 1461.10 | bwd_inner_microstep: 1461.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978 [2024-06-10 15:04:19,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.61 | bwd_microstep: 735.64 | bwd_inner_microstep: 735.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 15:04:21,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1372.57 | bwd_inner_microstep: 1372.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 15:04:23,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.60 | bwd_microstep: 1253.43 | bwd_inner_microstep: 1253.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-10 15:04:25,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.95 | bwd_microstep: 
1403.28 | bwd_inner_microstep: 1403.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451 [2024-06-10 15:04:27,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1413.74 | bwd_inner_microstep: 1413.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 15:04:29,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.28 | bwd_microstep: 1490.12 | bwd_inner_microstep: 1490.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795 [2024-06-10 15:04:31,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3777 [2024-06-10 15:04:33,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.09 | bwd_microstep: 1285.19 | bwd_inner_microstep: 1285.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3808 [2024-06-10 15:04:36,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-10 15:04:36,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.09 | bwd_microstep: 2158.75 | bwd_inner_microstep: 1627.38 | bwd_allreduce_microstep: 531.33 | step_microstep: 37.75 [2024-06-10 15:04:36,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16244.36 | bwd: 44120.02 | bwd_inner: 43587.79 | bwd_allreduce: 531.55 | step: 39.29 {'loss': 1.2274, 'learning_rate': 2.198602151073943e-05, 'epoch': 0.48} dynamic ViT batch size: 14, images per sample: 
3.5, dynamic token length: 1870 [2024-06-10 15:04:37,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.37 | bwd_microstep: 732.73 | bwd_inner_microstep: 732.63 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402 [2024-06-10 15:04:39,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.73 | bwd_microstep: 1367.46 | bwd_inner_microstep: 1367.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2785 [2024-06-10 15:04:40,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.49 | bwd_microstep: 1121.07 | bwd_inner_microstep: 1121.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792 [2024-06-10 15:04:42,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.82 | bwd_microstep: 1646.56 | bwd_inner_microstep: 1646.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-10 15:04:45,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.03 | bwd_microstep: 1535.90 | bwd_inner_microstep: 1535.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 15:04:46,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 15:04:48,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1282.82 | bwd_inner_microstep: 1282.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 15:04:50,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1247.27 | bwd_inner_microstep: 1247.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715 [2024-06-10 15:04:52,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.67 | bwd_microstep: 1528.86 | bwd_inner_microstep: 1528.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-10 15:04:54,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1289.09 | bwd_inner_microstep: 1289.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3650 [2024-06-10 15:04:56,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1389.37 | bwd_inner_microstep: 1389.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3673 [2024-06-10 15:04:58,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.95 | bwd_microstep: 1571.82 | bwd_inner_microstep: 1571.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3693 [2024-06-10 15:05:00,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.11 | bwd_microstep: 1473.95 | bwd_inner_microstep: 1473.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517 [2024-06-10 15:05:02,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.73 | bwd_microstep: 1579.43 | bwd_inner_microstep: 1579.40 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3583 [2024-06-10 15:05:04,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.17 | bwd_microstep: 1333.35 | bwd_inner_microstep: 1333.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514 [2024-06-10 15:05:06,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1578.32 | bwd_inner_microstep: 1578.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1937 [2024-06-10 15:05:07,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.06 | bwd_microstep: 819.67 | bwd_inner_microstep: 819.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965 [2024-06-10 15:05:08,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.55 | bwd_microstep: 888.51 | bwd_inner_microstep: 888.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3826 [2024-06-10 15:05:11,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.18 | bwd_microstep: 1723.26 | bwd_inner_microstep: 1723.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292 [2024-06-10 15:05:12,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.28 | bwd_microstep: 975.21 | bwd_inner_microstep: 975.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613 [2024-06-10 15:05:14,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1244.46 
| bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 15:05:16,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.39 | bwd_microstep: 1379.76 | bwd_inner_microstep: 1379.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 15:05:18,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1657.40 | bwd_inner_microstep: 1657.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 15:05:20,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.77 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633 [2024-06-10 15:05:22,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.48 | bwd_microstep: 1408.03 | bwd_inner_microstep: 1408.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3721 [2024-06-10 15:05:24,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1562.29 | bwd_inner_microstep: 1562.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 15:05:26,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1658.27 | bwd_inner_microstep: 1658.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600 [2024-06-10 15:05:28,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 533.28 | bwd_microstep: 1430.88 | bwd_inner_microstep: 1430.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782 [2024-06-10 15:05:30,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1444.54 | bwd_inner_microstep: 1444.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3715 [2024-06-10 15:05:32,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.56 | bwd_microstep: 1733.66 | bwd_inner_microstep: 1733.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398 [2024-06-10 15:05:34,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.51 | bwd_microstep: 1369.65 | bwd_inner_microstep: 1369.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761 [2024-06-10 15:05:38,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.17 | optimizer_step: 6.63 [2024-06-10 15:05:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.56 | bwd_microstep: 2845.78 | bwd_inner_microstep: 1740.53 | bwd_allreduce_microstep: 1105.20 | step_microstep: 37.88 [2024-06-10 15:05:38,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16435.47 | bwd: 45386.51 | bwd_inner: 44280.34 | bwd_allreduce: 1105.47 | step: 39.36 {'loss': 1.2077, 'learning_rate': 2.1948669586972776e-05, 'epoch': 0.48} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 15:05:40,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1339.87 | bwd_inner_microstep: 1339.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3478 [2024-06-10 15:05:42,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.07 | bwd_microstep: 1439.34 | bwd_inner_microstep: 1439.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 15:05:44,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1483.19 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 15:05:46,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.76 | bwd_microstep: 1651.32 | bwd_inner_microstep: 1651.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 15:05:48,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.62 | bwd_microstep: 1276.40 | bwd_inner_microstep: 1276.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 15:05:50,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.89 | bwd_microstep: 1393.00 | bwd_inner_microstep: 1392.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 15:05:52,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490 [2024-06-10 15:05:53,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.17 | bwd_microstep: 1218.08 | bwd_inner_microstep: 1218.05 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 15:05:55,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.22 | bwd_microstep: 1383.13 | bwd_inner_microstep: 1383.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 15:05:57,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1376.37 | bwd_inner_microstep: 1376.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 15:05:59,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1386.87 | bwd_inner_microstep: 1386.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-10 15:06:01,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.67 | bwd_microstep: 1185.49 | bwd_inner_microstep: 1185.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3502 [2024-06-10 15:06:02,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1219.56 | bwd_inner_microstep: 1219.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688 [2024-06-10 15:06:05,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.11 | bwd_microstep: 1617.02 | bwd_inner_microstep: 1617.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-10 15:06:07,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 
1393.21 | bwd_inner_microstep: 1393.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3550 [2024-06-10 15:06:09,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1592.63 | bwd_inner_microstep: 1592.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624 [2024-06-10 15:06:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.51 | bwd_microstep: 1612.91 | bwd_inner_microstep: 1612.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2085 [2024-06-10 15:06:12,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.96 | bwd_microstep: 884.73 | bwd_inner_microstep: 884.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 15:06:14,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.12 | bwd_microstep: 1558.46 | bwd_inner_microstep: 1558.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 15:06:16,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1252.20 | bwd_inner_microstep: 1252.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2017 [2024-06-10 15:06:17,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.05 | bwd_microstep: 712.84 | bwd_inner_microstep: 712.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2289 [2024-06-10 15:06:18,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 303.65 | bwd_microstep: 784.51 | bwd_inner_microstep: 784.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 15:06:20,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1544.45 | bwd_inner_microstep: 1544.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727 [2024-06-10 15:06:22,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1434.25 | bwd_inner_microstep: 1434.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3601 [2024-06-10 15:06:24,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1468.00 | bwd_inner_microstep: 1467.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 15:06:26,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.48 | bwd_microstep: 1290.76 | bwd_inner_microstep: 1290.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 15:06:28,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1377.74 | bwd_inner_microstep: 1377.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 15:06:30,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.72 | bwd_microstep: 1390.15 | bwd_inner_microstep: 1390.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3427 [2024-06-10 15:06:32,464] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1511.15 | bwd_inner_microstep: 1511.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-10 15:06:34,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.33 | bwd_microstep: 1516.09 | bwd_inner_microstep: 1516.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2043 [2024-06-10 15:06:35,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.88 | bwd_microstep: 908.95 | bwd_inner_microstep: 908.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 15:06:38,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.59 [2024-06-10 15:06:38,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.38 | bwd_microstep: 1779.18 | bwd_inner_microstep: 1474.57 | bwd_allreduce_microstep: 304.57 | step_microstep: 37.66 [2024-06-10 15:06:38,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16116.41 | bwd: 43360.76 | bwd_inner: 43055.30 | bwd_allreduce: 304.79 | step: 39.14 48%|████▊ | 833/1726 [14:24:07<15:10:13, 61.16s/it] 48%|████▊ | 834/1726 [14:25:10<15:18:48, 61.80s/it] 48%|████▊ | 835/1726 [14:26:12<15:17:26, 61.78s/it] 48%|████▊ | 836/1726 [14:27:12<15:11:35, 61.46s/it] 48%|████▊ | 837/1726 [14:28:15<15:13:42, 61.67s/it] 49%|████▊ | 838/1726 [14:29:14<15:04:24, 61.11s/{'loss': 1.1835, 'learning_rate': 
2.1911310800001967e-05, 'epoch': 0.49} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 15:06:40,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1349.02 | bwd_inner_microstep: 1349.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952 [2024-06-10 15:06:42,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.83 | bwd_microstep: 1694.80 | bwd_inner_microstep: 1694.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3883 [2024-06-10 15:06:44,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.01 | bwd_microstep: 1685.51 | bwd_inner_microstep: 1685.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 15:06:46,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1340.58 | bwd_inner_microstep: 1340.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 15:06:48,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.13 | bwd_microstep: 1244.78 | bwd_inner_microstep: 1244.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-10 15:06:49,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 787.56 | bwd_inner_microstep: 787.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 15:06:51,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1281.62 | 
bwd_inner_microstep: 1281.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586 [2024-06-10 15:06:52,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.25 | bwd_microstep: 1212.26 | bwd_inner_microstep: 1212.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706 [2024-06-10 15:06:55,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.51 | bwd_microstep: 1627.22 | bwd_inner_microstep: 1627.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 15:06:56,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1342.22 | bwd_inner_microstep: 1342.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411 [2024-06-10 15:06:58,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.69 | bwd_microstep: 1181.97 | bwd_inner_microstep: 1181.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3484 [2024-06-10 15:07:00,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.78 | bwd_microstep: 1248.10 | bwd_inner_microstep: 1248.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512 [2024-06-10 15:07:02,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.04 | bwd_microstep: 1430.90 | bwd_inner_microstep: 1430.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569 [2024-06-10 15:07:04,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.34 | bwd_microstep: 1365.45 | bwd_inner_microstep: 1365.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3570 [2024-06-10 15:07:06,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.66 | bwd_microstep: 1629.51 | bwd_inner_microstep: 1629.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423 [2024-06-10 15:07:08,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976 [2024-06-10 15:07:09,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.98 | bwd_microstep: 797.87 | bwd_inner_microstep: 797.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 15:07:11,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 15:07:13,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1486.57 | bwd_inner_microstep: 1486.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1995 [2024-06-10 15:07:14,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.59 | bwd_microstep: 706.69 | bwd_inner_microstep: 706.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 15:07:16,089] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1344.69 | bwd_inner_microstep: 1344.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465 [2024-06-10 15:07:17,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1333.24 | bwd_inner_microstep: 1333.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 15:07:19,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1392.66 | bwd_inner_microstep: 1392.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3612 [2024-06-10 15:07:21,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.87 | bwd_microstep: 1368.80 | bwd_inner_microstep: 1368.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715 [2024-06-10 15:07:23,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1430.20 | bwd_inner_microstep: 1430.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-10 15:07:25,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.18 | bwd_microstep: 1506.76 | bwd_inner_microstep: 1506.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3721 [2024-06-10 15:07:28,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.02 | bwd_microstep: 1697.52 | bwd_inner_microstep: 1697.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token 
length: 1899 [2024-06-10 15:07:29,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.04 | bwd_microstep: 714.22 | bwd_inner_microstep: 714.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933 [2024-06-10 15:07:30,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.15 | bwd_microstep: 697.72 | bwd_inner_microstep: 697.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 15:07:32,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.87 | bwd_microstep: 1404.53 | bwd_inner_microstep: 1404.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-10 15:07:34,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.92 | bwd_microstep: 1637.87 | bwd_inner_microstep: 1637.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3573 [2024-06-10 15:07:37,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.63 [2024-06-10 15:07:37,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 3078.12 | bwd_inner_microstep: 1728.78 | bwd_allreduce_microstep: 1349.29 | step_microstep: 37.94 [2024-06-10 15:07:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15786.35 | bwd: 43706.06 | bwd_inner: 42355.87 | bwd_allreduce: 1349.52 | step: 39.45 {'loss': 1.2662, 'learning_rate': 2.187394528140445e-05, 'epoch': 0.49} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414 [2024-06-10 15:07:39,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.83 | bwd_microstep: 1295.62 | 
bwd_inner_microstep: 1295.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3960 [2024-06-10 15:07:41,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.87 | bwd_microstep: 1424.69 | bwd_inner_microstep: 1424.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 15:07:43,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1248.98 | bwd_inner_microstep: 1248.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 15:07:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.45 | bwd_microstep: 1375.16 | bwd_inner_microstep: 1375.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 15:07:47,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.62 | bwd_microstep: 1547.17 | bwd_inner_microstep: 1547.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 15:07:48,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.22 | bwd_microstep: 792.67 | bwd_inner_microstep: 792.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 15:07:50,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1246.00 | bwd_inner_microstep: 1245.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-10 15:07:52,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 569.75 | bwd_microstep: 1530.81 | bwd_inner_microstep: 1530.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 15:07:54,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.99 | bwd_microstep: 1150.27 | bwd_inner_microstep: 1150.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 15:07:55,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.08 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1885 [2024-06-10 15:07:57,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.93 | bwd_microstep: 761.22 | bwd_inner_microstep: 761.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3542 [2024-06-10 15:07:59,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1442.37 | bwd_inner_microstep: 1442.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2622 [2024-06-10 15:08:00,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.89 | bwd_microstep: 1107.66 | bwd_inner_microstep: 1107.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497 [2024-06-10 15:08:02,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1579.96 | bwd_inner_microstep: 1579.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486 [2024-06-10 15:08:04,893] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.48 | bwd_microstep: 1578.91 | bwd_inner_microstep: 1578.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 15:08:06,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.76 | bwd_microstep: 1484.99 | bwd_inner_microstep: 1484.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417 [2024-06-10 15:08:08,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1444.47 | bwd_inner_microstep: 1444.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3411 [2024-06-10 15:08:10,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.56 | bwd_microstep: 1438.87 | bwd_inner_microstep: 1438.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427 [2024-06-10 15:08:12,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.39 | bwd_microstep: 1440.10 | bwd_inner_microstep: 1440.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 15:08:14,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1292.46 | bwd_inner_microstep: 1292.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 15:08:16,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.77 | bwd_microstep: 1292.47 | bwd_inner_microstep: 1292.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 
3824 [2024-06-10 15:08:18,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.38 | bwd_microstep: 1586.08 | bwd_inner_microstep: 1586.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817 [2024-06-10 15:08:20,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.39 | bwd_microstep: 1583.01 | bwd_inner_microstep: 1582.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551 [2024-06-10 15:08:22,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.21 | bwd_microstep: 1426.07 | bwd_inner_microstep: 1426.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3447 [2024-06-10 15:08:24,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1398.34 | bwd_inner_microstep: 1398.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-10 15:08:26,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.87 | bwd_microstep: 1487.94 | bwd_inner_microstep: 1487.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3722 [2024-06-10 15:08:29,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.72 | bwd_microstep: 1612.45 | bwd_inner_microstep: 1612.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 15:08:30,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1391.33 | bwd_inner_microstep: 1391.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3737 [2024-06-10 15:08:33,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1537.55 | bwd_inner_microstep: 1537.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596 [2024-06-10 15:08:34,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.74 | bwd_microstep: 1308.25 | bwd_inner_microstep: 1308.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-10 15:08:36,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.66 | bwd_microstep: 1355.27 | bwd_inner_microstep: 1355.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-10 15:08:38,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.16 | optimizer_step: 6.63 [2024-06-10 15:08:38,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.70 | bwd_microstep: 1336.77 | bwd_inner_microstep: 1328.53 | bwd_allreduce_microstep: 8.20 | step_microstep: 37.65 [2024-06-10 15:08:38,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16420.19 | bwd: 43883.11 | bwd_inner: 43874.01 | bwd_allreduce: 8.42 | step: 39.19 {'loss': 1.2786, 'learning_rate': 2.1836573162781406e-05, 'epoch': 0.49} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494 [2024-06-10 15:08:40,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.08 | bwd_microstep: 1312.98 | bwd_inner_microstep: 1312.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 15:08:42,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | 
bwd_microstep: 1283.93 | bwd_inner_microstep: 1283.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 15:08:43,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1254.74 | bwd_inner_microstep: 1254.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3883 [2024-06-10 15:08:46,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.95 | bwd_microstep: 1583.09 | bwd_inner_microstep: 1583.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 15:08:48,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2312 [2024-06-10 15:08:49,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.63 | bwd_microstep: 915.99 | bwd_inner_microstep: 915.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778 [2024-06-10 15:08:51,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.37 | bwd_microstep: 1443.45 | bwd_inner_microstep: 1443.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 15:08:53,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.93 | bwd_microstep: 1287.92 | bwd_inner_microstep: 1287.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 15:08:54,828] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1246.22 | bwd_inner_microstep: 1246.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 15:08:56,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1250.59 | bwd_inner_microstep: 1250.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 15:08:58,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.82 | bwd_microstep: 1356.54 | bwd_inner_microstep: 1356.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620 [2024-06-10 15:09:00,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1311.87 | bwd_inner_microstep: 1311.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629 [2024-06-10 15:09:02,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1314.92 | bwd_inner_microstep: 1314.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494 [2024-06-10 15:09:03,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.68 | bwd_microstep: 1315.14 | bwd_inner_microstep: 1315.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665 [2024-06-10 15:09:05,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1334.85 | bwd_inner_microstep: 1334.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 15:09:07,636] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2110 [2024-06-10 15:09:08,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.76 | bwd_microstep: 923.97 | bwd_inner_microstep: 923.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525 [2024-06-10 15:09:11,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1590.11 | bwd_inner_microstep: 1590.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 15:09:13,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.31 | bwd_microstep: 1476.47 | bwd_inner_microstep: 1476.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 15:09:15,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.93 | bwd_microstep: 1383.23 | bwd_inner_microstep: 1383.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 15:09:17,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1491.26 | bwd_inner_microstep: 1491.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-10 15:09:19,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.46 | bwd_microstep: 1596.88 | bwd_inner_microstep: 1596.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token 
length: 2243 [2024-06-10 15:09:20,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.37 | bwd_microstep: 1026.42 | bwd_inner_microstep: 1026.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 15:09:22,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.84 | bwd_microstep: 1355.42 | bwd_inner_microstep: 1355.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015 [2024-06-10 15:09:23,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.51 | bwd_microstep: 810.16 | bwd_inner_microstep: 810.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3557 [2024-06-10 15:09:25,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.04 | bwd_microstep: 1359.10 | bwd_inner_microstep: 1359.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 15:09:27,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1416.46 | bwd_inner_microstep: 1416.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3561 [2024-06-10 15:09:29,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.51 | bwd_microstep: 1568.37 | bwd_inner_microstep: 1568.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-10 15:09:31,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1357.38 | bwd_inner_microstep: 1357.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3476 [2024-06-10 15:09:33,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.89 | bwd_microstep: 1373.27 | bwd_inner_microstep: 1373.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394 [2024-06-10 15:09:35,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.65 | bwd_microstep: 1340.19 | bwd_inner_microstep: 1340.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1845 [2024-06-10 15:09:39,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.64 [2024-06-10 15:09:39,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 258.42 | bwd_microstep: 3458.19 | bwd_inner_microstep: 766.02 | bwd_allreduce_microstep: 2692.12 | step_microstep: 38.03 [2024-06-10 15:09:39,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15653.14 | bwd: 44497.09 | bwd_inner: 41804.06 | bwd_allreduce: 2692.35 | step: 39.45 {'loss': 1.2411, 'learning_rate': 2.179919457575722e-05, 'epoch': 0.49} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-10 15:09:40,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.76 | bwd_microstep: 779.46 | bwd_inner_microstep: 779.37 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3995 [2024-06-10 15:09:42,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.31 | bwd_microstep: 1600.85 | bwd_inner_microstep: 1600.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2321 [2024-06-10 15:09:43,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
366.02 | bwd_microstep: 981.71 | bwd_inner_microstep: 981.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3478 [2024-06-10 15:09:45,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1229.62 | bwd_inner_microstep: 1229.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791 [2024-06-10 15:09:47,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1547.23 | bwd_inner_microstep: 1547.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 15:09:49,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.29 | bwd_microstep: 1149.16 | bwd_inner_microstep: 1149.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 15:09:50,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1275.08 | bwd_inner_microstep: 1275.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 15:09:52,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1390.05 | bwd_inner_microstep: 1390.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 15:09:54,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900 [2024-06-10 15:09:55,582] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 263.43 | bwd_microstep: 683.64 | bwd_inner_microstep: 683.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605 [2024-06-10 15:09:57,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.88 | bwd_microstep: 1440.63 | bwd_inner_microstep: 1440.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2220 [2024-06-10 15:09:58,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.00 | bwd_microstep: 893.24 | bwd_inner_microstep: 893.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498 [2024-06-10 15:10:00,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.72 | bwd_microstep: 1316.57 | bwd_inner_microstep: 1316.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-10 15:10:02,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.21 | bwd_microstep: 1509.35 | bwd_inner_microstep: 1509.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397 [2024-06-10 15:10:04,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1402.61 | bwd_inner_microstep: 1402.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691 [2024-06-10 15:10:06,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.50 | bwd_microstep: 1617.86 | bwd_inner_microstep: 1617.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449 [2024-06-10 
15:10:08,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.68 | bwd_microstep: 1378.13 | bwd_inner_microstep: 1378.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 15:10:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.37 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 15:10:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1509.74 | bwd_inner_microstep: 1509.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 15:10:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.63 | bwd_microstep: 1256.22 | bwd_inner_microstep: 1256.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 15:10:16,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.58 | bwd_microstep: 1426.02 | bwd_inner_microstep: 1425.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 15:10:18,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1390.60 | bwd_inner_microstep: 1390.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 15:10:20,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.30 | bwd_microstep: 1657.81 | bwd_inner_microstep: 1657.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3822 [2024-06-10 15:10:23,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.35 | bwd_microstep: 1658.14 | bwd_inner_microstep: 1658.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 15:10:25,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1497.81 | bwd_inner_microstep: 1497.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1904 [2024-06-10 15:10:26,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.69 | bwd_microstep: 763.93 | bwd_inner_microstep: 763.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3718 [2024-06-10 15:10:28,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.47 | bwd_microstep: 1422.08 | bwd_inner_microstep: 1422.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-10 15:10:30,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1475.01 | bwd_inner_microstep: 1474.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3566 [2024-06-10 15:10:32,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.59 | bwd_microstep: 1346.30 | bwd_inner_microstep: 1346.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 15:10:34,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 42, images per sample: 10.5, dynamic token length: 3596 [2024-06-10 15:10:36,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.00 | bwd_microstep: 1636.89 | bwd_inner_microstep: 1636.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-10 15:10:39,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 15:10:39,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.67 | bwd_microstep: 2697.21 | bwd_inner_microstep: 1809.69 | bwd_allreduce_microstep: 887.47 | step_microstep: 38.03 [2024-06-10 15:10:39,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16122.42 | bwd: 44187.66 | bwd_inner: 43299.22 | bwd_allreduce: 887.74 | step: 39.55 {'loss': 1.2561, 'learning_rate': 2.1761809651979098e-05, 'epoch': 0.49} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458 [2024-06-10 15:10:41,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.50 | bwd_microstep: 1302.45 | bwd_inner_microstep: 1302.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 15:10:43,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1345.15 | bwd_inner_microstep: 1345.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853 [2024-06-10 15:10:45,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.46 | bwd_microstep: 1560.91 | bwd_inner_microstep: 1560.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 15:10:47,620] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 553.62 | bwd_microstep: 1480.13 | bwd_inner_microstep: 1480.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3837 [2024-06-10 15:10:49,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.28 | bwd_microstep: 1383.09 | bwd_inner_microstep: 1383.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2237 [2024-06-10 15:10:50,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.39 | bwd_microstep: 864.95 | bwd_inner_microstep: 864.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 15:10:52,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1381.82 | bwd_inner_microstep: 1381.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418 [2024-06-10 15:10:54,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.40 | bwd_microstep: 1181.25 | bwd_inner_microstep: 1181.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-10 15:10:55,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.40 | bwd_microstep: 1149.89 | bwd_inner_microstep: 1149.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433 [2024-06-10 15:10:57,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.92 | bwd_microstep: 1283.18 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488 [2024-06-10 15:10:59,529] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.66 | bwd_microstep: 1347.49 | bwd_inner_microstep: 1347.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128 [2024-06-10 15:11:00,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.86 | bwd_microstep: 924.58 | bwd_inner_microstep: 924.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 15:11:02,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1377.95 | bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 15:11:04,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1492.16 | bwd_inner_microstep: 1492.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 15:11:06,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.15 | bwd_microstep: 1480.24 | bwd_inner_microstep: 1480.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3675 [2024-06-10 15:11:09,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.72 | bwd_microstep: 1715.40 | bwd_inner_microstep: 1715.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3521 [2024-06-10 15:11:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.70 | bwd_microstep: 1191.93 | bwd_inner_microstep: 1191.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 
3520 [2024-06-10 15:11:12,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.53 | bwd_microstep: 1433.93 | bwd_inner_microstep: 1433.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629 [2024-06-10 15:11:15,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.48 | bwd_microstep: 1711.06 | bwd_inner_microstep: 1711.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3731 [2024-06-10 15:11:16,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.74 | bwd_microstep: 1334.25 | bwd_inner_microstep: 1334.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3512 [2024-06-10 15:11:18,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1347.50 | bwd_inner_microstep: 1347.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497 [2024-06-10 15:11:20,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.49 | bwd_microstep: 1187.05 | bwd_inner_microstep: 1187.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723 [2024-06-10 15:11:22,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1331.53 | bwd_inner_microstep: 1331.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913 [2024-06-10 15:11:23,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.85 | bwd_microstep: 686.43 | bwd_inner_microstep: 686.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 3481 [2024-06-10 15:11:25,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.93 | bwd_microstep: 1313.36 | bwd_inner_microstep: 1313.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 15:11:27,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1376.30 | bwd_inner_microstep: 1376.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1611 [2024-06-10 15:11:27,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 244.58 | bwd_microstep: 644.20 | bwd_inner_microstep: 644.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007 [2024-06-10 15:11:29,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.32 | bwd_microstep: 899.91 | bwd_inner_microstep: 899.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3818 [2024-06-10 15:11:31,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.84 | bwd_microstep: 1752.32 | bwd_inner_microstep: 1752.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 15:11:33,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1244.47 | bwd_inner_microstep: 1244.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575 [2024-06-10 15:11:35,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.93 | bwd_microstep: 1590.20 | bwd_inner_microstep: 1590.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239 [2024-06-10 15:11:40,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.61 [2024-06-10 15:11:40,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.27 | bwd_microstep: 4648.00 | bwd_inner_microstep: 1089.34 | bwd_allreduce_microstep: 3558.59 | step_microstep: 39.05 [2024-06-10 15:11:40,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15490.30 | bwd: 44963.08 | bwd_inner: 41403.57 | bwd_allreduce: 3558.84 | step: 40.61 {'loss': 1.2329, 'learning_rate': 2.1724418523116534e-05, 'epoch': 0.49} 49%|████▊ | 838/1726 [14:29:14<15:04:24, 61.11s/it] 49%|████▊ | 839/1726 [14:30:14<14:57:42, 60.72s/it] 49%|████▊ | 840/1726 [14:31:15<14:56:19, 60.70s/it] 49%|████▊ | 841/1726 [14:32:15<14:54:18, 60.63s/it] 49%|████▉ | 842/1726 [14:33:16<14:53:24, 60.64s/it] 49%|████▉ | 843/1726 [14:34:17<14:53:03, 60.68s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 15:11:42,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.27 | bwd_microstep: 1385.45 | bwd_inner_microstep: 1385.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3969 [2024-06-10 15:11:44,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.40 | bwd_microstep: 1497.94 | bwd_inner_microstep: 1497.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-10 15:11:46,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.16 | bwd_microstep: 
1547.78 | bwd_inner_microstep: 1547.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-10 15:11:47,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.38 | bwd_microstep: 787.75 | bwd_inner_microstep: 787.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 15:11:49,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.63 | bwd_microstep: 1479.77 | bwd_inner_microstep: 1479.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475 [2024-06-10 15:11:51,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1407.99 | bwd_inner_microstep: 1407.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-10 15:11:52,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.31 | bwd_microstep: 808.24 | bwd_inner_microstep: 808.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-10 15:11:54,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1402.02 | bwd_inner_microstep: 1401.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2482 [2024-06-10 15:11:56,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.62 | bwd_microstep: 926.02 | bwd_inner_microstep: 926.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2394 [2024-06-10 15:11:57,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 416.42 | bwd_microstep: 1119.80 | bwd_inner_microstep: 1119.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1977 [2024-06-10 15:11:58,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.97 | bwd_microstep: 894.54 | bwd_inner_microstep: 894.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3839 [2024-06-10 15:12:01,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.87 | bwd_microstep: 1715.01 | bwd_inner_microstep: 1714.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3443 [2024-06-10 15:12:03,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.78 | bwd_microstep: 1405.28 | bwd_inner_microstep: 1405.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3506 [2024-06-10 15:12:05,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1335.08 | bwd_inner_microstep: 1335.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667 [2024-06-10 15:12:06,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1414.69 | bwd_inner_microstep: 1414.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 15:12:08,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1386.55 | bwd_inner_microstep: 1386.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3645 [2024-06-10 15:12:10,837] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.63 | bwd_microstep: 1417.78 | bwd_inner_microstep: 1417.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525 [2024-06-10 15:12:12,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.72 | bwd_microstep: 1320.29 | bwd_inner_microstep: 1320.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 15:12:14,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.83 | bwd_microstep: 1354.62 | bwd_inner_microstep: 1354.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629 [2024-06-10 15:12:16,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.25 | bwd_microstep: 1609.91 | bwd_inner_microstep: 1609.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289 [2024-06-10 15:12:17,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.71 | bwd_microstep: 876.34 | bwd_inner_microstep: 876.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-10 15:12:19,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.26 | bwd_microstep: 1451.14 | bwd_inner_microstep: 1451.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3805 [2024-06-10 15:12:22,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.54 | bwd_microstep: 1686.93 | bwd_inner_microstep: 1686.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.00 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3822 [2024-06-10 15:12:24,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2192 [2024-06-10 15:12:25,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.70 | bwd_microstep: 1051.26 | bwd_inner_microstep: 1051.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 15:12:27,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.33 | bwd_microstep: 1295.50 | bwd_inner_microstep: 1295.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 15:12:29,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.45 | bwd_microstep: 1284.98 | bwd_inner_microstep: 1284.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775 [2024-06-10 15:12:31,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.52 | bwd_microstep: 1482.73 | bwd_inner_microstep: 1482.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 15:12:33,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.62 | bwd_microstep: 1653.50 | bwd_inner_microstep: 1653.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-10 15:12:35,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.10 | bwd_microstep: 1499.25 | bwd_inner_microstep: 1499.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3459 [2024-06-10 15:12:37,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.21 | bwd_microstep: 1275.31 | bwd_inner_microstep: 1275.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-10 15:12:40,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.15 | optimizer_step: 6.62 [2024-06-10 15:12:40,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.43 | bwd_microstep: 1846.87 | bwd_inner_microstep: 1573.45 | bwd_allreduce_microstep: 273.37 | step_microstep: 37.69 [2024-06-10 15:12:40,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16013.91 | bwd: 43174.26 | bwd_inner: 42899.99 | bwd_allreduce: 273.60 | step: 41.16 {'loss': 1.2126, 'learning_rate': 2.1687021320860893e-05, 'epoch': 0.49} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407 [2024-06-10 15:12:41,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.96 | bwd_microstep: 1267.99 | bwd_inner_microstep: 1267.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3895 [2024-06-10 15:12:43,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1385.42 | bwd_inner_microstep: 1385.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3890 [2024-06-10 15:12:45,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1384.93 | bwd_inner_microstep: 1384.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 15:12:47,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
505.79 | bwd_microstep: 1349.06 | bwd_inner_microstep: 1349.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 15:12:48,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.07 | bwd_microstep: 873.41 | bwd_inner_microstep: 873.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411 [2024-06-10 15:12:50,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.71 | bwd_microstep: 1180.33 | bwd_inner_microstep: 1180.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2464 [2024-06-10 15:12:51,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.80 | bwd_microstep: 948.70 | bwd_inner_microstep: 948.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1901 [2024-06-10 15:12:52,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.10 | bwd_microstep: 712.14 | bwd_inner_microstep: 712.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403 [2024-06-10 15:12:54,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.19 | bwd_microstep: 1181.07 | bwd_inner_microstep: 1181.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3718 [2024-06-10 15:12:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1332.25 | bwd_inner_microstep: 1332.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-10 15:12:57,312] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 305.01 | bwd_microstep: 812.50 | bwd_inner_microstep: 812.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 15:12:59,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.77 | bwd_microstep: 1339.66 | bwd_inner_microstep: 1339.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 15:13:00,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1292.40 | bwd_inner_microstep: 1292.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3521 [2024-06-10 15:13:02,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1415.36 | bwd_inner_microstep: 1415.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 15:13:04,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.78 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 15:13:06,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1245.86 | bwd_inner_microstep: 1245.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 15:13:08,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.63 | bwd_microstep: 1247.20 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000 [2024-06-10 
15:13:09,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.17 | bwd_microstep: 801.94 | bwd_inner_microstep: 801.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 15:13:11,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828 [2024-06-10 15:13:13,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.35 | bwd_microstep: 1358.03 | bwd_inner_microstep: 1358.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 15:13:15,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.71 | bwd_microstep: 1452.31 | bwd_inner_microstep: 1452.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940 [2024-06-10 15:13:16,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.71 | bwd_microstep: 696.42 | bwd_inner_microstep: 696.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 15:13:18,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.83 | bwd_microstep: 1482.89 | bwd_inner_microstep: 1482.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3545 [2024-06-10 15:13:20,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1374.93 | bwd_inner_microstep: 1374.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3552 [2024-06-10 15:13:21,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1296.68 | bwd_inner_microstep: 1296.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-10 15:13:24,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1651.41 | bwd_inner_microstep: 1651.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-10 15:13:25,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.00 | bwd_microstep: 1181.71 | bwd_inner_microstep: 1181.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844 [2024-06-10 15:13:27,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.09 | bwd_microstep: 1468.42 | bwd_inner_microstep: 1468.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3725 [2024-06-10 15:13:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.31 | bwd_microstep: 1416.61 | bwd_inner_microstep: 1416.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 15:13:31,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.41 | bwd_microstep: 1450.64 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 15:13:33,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1510.33 | bwd_inner_microstep: 1510.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 2066 [2024-06-10 15:13:40,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.64 | optimizer_gradients: 4.29 | optimizer_step: 6.61 [2024-06-10 15:13:40,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.07 | bwd_microstep: 5940.13 | bwd_inner_microstep: 938.30 | bwd_allreduce_microstep: 5001.76 | step_microstep: 40.21 [2024-06-10 15:13:40,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14967.09 | bwd: 44785.07 | bwd_inner: 39782.40 | bwd_allreduce: 5002.00 | step: 41.68 {'loss': 1.2146, 'learning_rate': 2.164961817692494e-05, 'epoch': 0.49} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 15:13:41,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.35 | bwd_microstep: 1273.43 | bwd_inner_microstep: 1273.35 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3943 [2024-06-10 15:13:44,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.58 | bwd_microstep: 1592.52 | bwd_inner_microstep: 1592.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2323 [2024-06-10 15:13:45,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.33 | bwd_microstep: 981.26 | bwd_inner_microstep: 981.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797 [2024-06-10 15:13:47,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1544.96 | bwd_inner_microstep: 1544.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2047 [2024-06-10 15:13:48,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 296.10 | bwd_microstep: 780.90 | bwd_inner_microstep: 780.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 15:13:50,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.16 | bwd_microstep: 1376.33 | bwd_inner_microstep: 1376.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492 [2024-06-10 15:13:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.14 | bwd_microstep: 1218.37 | bwd_inner_microstep: 1218.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 15:13:54,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1279.65 | bwd_inner_microstep: 1279.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 15:13:55,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1353.53 | bwd_inner_microstep: 1353.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3986 [2024-06-10 15:13:58,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.03 | bwd_microstep: 1536.87 | bwd_inner_microstep: 1536.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440 [2024-06-10 15:13:59,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1311.86 | bwd_inner_microstep: 1311.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488 [2024-06-10 15:14:01,834] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.07 | bwd_microstep: 1434.67 | bwd_inner_microstep: 1434.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3633 [2024-06-10 15:14:04,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1653.56 | bwd_inner_microstep: 1653.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3818 [2024-06-10 15:14:06,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.54 | bwd_microstep: 1856.24 | bwd_inner_microstep: 1856.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650 [2024-06-10 15:14:08,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.67 | bwd_microstep: 1522.10 | bwd_inner_microstep: 1522.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 15:14:10,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 1514.09 | bwd_inner_microstep: 1514.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-10 15:14:13,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.95 | bwd_microstep: 1624.22 | bwd_inner_microstep: 1624.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-10 15:14:15,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1509.86 | bwd_inner_microstep: 1509.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 1989 [2024-06-10 15:14:16,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.96 | bwd_microstep: 801.73 | bwd_inner_microstep: 801.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 15:14:18,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.19 | bwd_microstep: 1493.72 | bwd_inner_microstep: 1493.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2282 [2024-06-10 15:14:19,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.27 | bwd_microstep: 812.85 | bwd_inner_microstep: 812.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 15:14:21,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 1494.96 | bwd_inner_microstep: 1494.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 15:14:23,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1355.10 | bwd_inner_microstep: 1355.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609 [2024-06-10 15:14:25,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1606.73 | bwd_inner_microstep: 1606.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 15:14:26,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.45 | bwd_microstep: 697.15 | bwd_inner_microstep: 697.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3817 [2024-06-10 15:14:28,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1553.35 | bwd_inner_microstep: 1553.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3587 [2024-06-10 15:14:30,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1240.68 | bwd_inner_microstep: 1240.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 15:14:32,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.89 | bwd_microstep: 1505.49 | bwd_inner_microstep: 1505.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3716 [2024-06-10 15:14:34,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.60 | bwd_microstep: 1732.54 | bwd_inner_microstep: 1732.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-10 15:14:36,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.43 | bwd_microstep: 1533.11 | bwd_inner_microstep: 1533.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3384 [2024-06-10 15:14:38,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.70 | bwd_microstep: 1366.96 | bwd_inner_microstep: 1366.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 15:14:42,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 15:14:42,248] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 2833.61 | bwd_inner_microstep: 1527.30 | bwd_allreduce_microstep: 1306.26 | step_microstep: 37.81 [2024-06-10 15:14:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16372.80 | bwd: 45392.42 | bwd_inner: 44085.20 | bwd_allreduce: 1306.53 | step: 39.35 {'loss': 1.2301, 'learning_rate': 2.1612209223042346e-05, 'epoch': 0.49} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 15:14:44,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1450.60 | bwd_inner_microstep: 1450.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904 [2024-06-10 15:14:46,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.09 | bwd_microstep: 1585.99 | bwd_inner_microstep: 1585.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780 [2024-06-10 15:14:48,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.58 | bwd_microstep: 1443.23 | bwd_inner_microstep: 1443.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 15:14:50,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1280.50 | bwd_inner_microstep: 1280.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424 [2024-06-10 15:14:51,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.10 | bwd_microstep: 1150.57 | bwd_inner_microstep: 1150.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-10 15:14:53,614] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1302.78 | bwd_inner_microstep: 1302.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 15:14:55,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1284.19 | bwd_inner_microstep: 1284.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2672 [2024-06-10 15:14:56,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.56 | bwd_microstep: 1118.95 | bwd_inner_microstep: 1118.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 15:14:58,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1443.72 | bwd_inner_microstep: 1443.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2315 [2024-06-10 15:15:00,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.10 | bwd_microstep: 917.00 | bwd_inner_microstep: 916.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2993 [2024-06-10 15:15:01,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.00 | bwd_microstep: 1201.14 | bwd_inner_microstep: 1201.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 15:15:03,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1341.92 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3509 [2024-06-10 15:15:05,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1313.56 | bwd_inner_microstep: 1313.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1980 [2024-06-10 15:15:06,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.93 | bwd_microstep: 831.30 | bwd_inner_microstep: 831.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2902 [2024-06-10 15:15:08,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.96 | bwd_microstep: 1154.74 | bwd_inner_microstep: 1154.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2102 [2024-06-10 15:15:09,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.38 | bwd_microstep: 1018.13 | bwd_inner_microstep: 1018.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513 [2024-06-10 15:15:11,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.14 | bwd_microstep: 1317.60 | bwd_inner_microstep: 1317.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676 [2024-06-10 15:15:13,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.81 | bwd_microstep: 1423.34 | bwd_inner_microstep: 1423.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 15:15:15,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1486.49 | bwd_inner_microstep: 1486.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3534 [2024-06-10 15:15:17,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 1395.70 | bwd_inner_microstep: 1395.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 15:15:19,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1248.17 | bwd_inner_microstep: 1248.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 15:15:21,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1379.91 | bwd_inner_microstep: 1379.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2214 [2024-06-10 15:15:22,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.43 | bwd_microstep: 860.70 | bwd_inner_microstep: 860.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 15:15:24,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1398.58 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 15:15:25,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1281.45 | bwd_inner_microstep: 1281.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4000 [2024-06-10 15:15:28,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.40 | bwd_microstep: 1642.23 | bwd_inner_microstep: 1642.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 15:15:30,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1608.39 | bwd_inner_microstep: 1608.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1400 [2024-06-10 15:15:31,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.48 | bwd_microstep: 525.83 | bwd_inner_microstep: 525.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602 [2024-06-10 15:15:33,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.25 | bwd_microstep: 1438.31 | bwd_inner_microstep: 1438.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279 [2024-06-10 15:15:34,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.42 | bwd_microstep: 977.33 | bwd_inner_microstep: 977.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774 [2024-06-10 15:15:36,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.41 | bwd_microstep: 1644.60 | bwd_inner_microstep: 1644.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3604 [2024-06-10 15:15:41,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.19 | optimizer_step: 6.59 [2024-06-10 15:15:41,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 4570.84 | bwd_inner_microstep: 1923.80 | bwd_allreduce_microstep: 2646.99 | step_microstep: 37.84 [2024-06-10 15:15:41,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15335.01 | 
bwd: 44037.82 | bwd_inner: 41389.92 | bwd_allreduce: 2647.22 | step: 39.45 {'loss': 1.2385, 'learning_rate': 2.157479459096724e-05, 'epoch': 0.49} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515 [2024-06-10 15:15:44,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.48 | bwd_microstep: 1579.64 | bwd_inner_microstep: 1579.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3933 [2024-06-10 15:15:46,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.33 | bwd_microstep: 1455.93 | bwd_inner_microstep: 1455.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3486 [2024-06-10 15:15:48,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.03 | bwd_microstep: 1410.48 | bwd_inner_microstep: 1410.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786 [2024-06-10 15:15:50,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.84 | bwd_microstep: 1546.83 | bwd_inner_microstep: 1546.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2020 [2024-06-10 15:15:51,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.79 | bwd_microstep: 741.33 | bwd_inner_microstep: 741.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4121 [2024-06-10 15:15:53,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.60 | bwd_microstep: 1737.89 | bwd_inner_microstep: 1737.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 15:15:55,414] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.31 | bwd_microstep: 1281.93 | bwd_inner_microstep: 1281.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3496 [2024-06-10 15:15:57,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.33 | bwd_microstep: 1219.44 | bwd_inner_microstep: 1219.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604 [2024-06-10 15:15:58,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.05 | bwd_microstep: 1309.67 | bwd_inner_microstep: 1309.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4068 [2024-06-10 15:16:01,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.71 | bwd_microstep: 1726.00 | bwd_inner_microstep: 1725.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2635 [2024-06-10 15:16:02,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.17 | bwd_microstep: 1048.72 | bwd_inner_microstep: 1048.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 15:16:04,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.56 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 15:16:06,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1482.06 | bwd_inner_microstep: 1482.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3435 [2024-06-10 15:16:08,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1351.49 | bwd_inner_microstep: 1351.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 15:16:10,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1377.42 | bwd_inner_microstep: 1377.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 15:16:12,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1297.39 | bwd_inner_microstep: 1297.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 15:16:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.56 | bwd_microstep: 1451.80 | bwd_inner_microstep: 1451.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 15:16:16,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-10 15:16:18,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1511.09 | bwd_inner_microstep: 1511.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 15:16:20,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.49 | bwd_microstep: 1656.32 | bwd_inner_microstep: 1656.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3827 [2024-06-10 15:16:22,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1555.31 | bwd_inner_microstep: 1555.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2676 [2024-06-10 15:16:23,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.54 | bwd_microstep: 1027.23 | bwd_inner_microstep: 1027.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714 [2024-06-10 15:16:25,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1433.55 | bwd_inner_microstep: 1433.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3687 [2024-06-10 15:16:27,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.04 | bwd_microstep: 1477.94 | bwd_inner_microstep: 1477.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071 [2024-06-10 15:16:29,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.02 | bwd_microstep: 914.66 | bwd_inner_microstep: 914.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2677 [2024-06-10 15:16:30,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.65 | bwd_microstep: 1122.98 | bwd_inner_microstep: 1122.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 15:16:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1318.28 | bwd_inner_microstep: 1318.25 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473 [2024-06-10 15:16:34,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.42 | bwd_microstep: 1325.46 | bwd_inner_microstep: 1325.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442 [2024-06-10 15:16:36,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1447.94 | bwd_inner_microstep: 1447.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 15:16:38,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588 [2024-06-10 15:16:40,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.80 | bwd_microstep: 1603.09 | bwd_inner_microstep: 1603.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3731 [2024-06-10 15:16:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.13 | optimizer_step: 6.60 [2024-06-10 15:16:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1653.04 | bwd_inner_microstep: 1406.98 | bwd_allreduce_microstep: 246.01 | step_microstep: 37.77 [2024-06-10 15:16:42,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16402.18 | bwd: 44203.45 | bwd_inner: 43956.55 | bwd_allreduce: 246.24 | step: 39.21 {'loss': 1.2467, 'learning_rate': 2.1537374412473773e-05, 'epoch': 0.49} 49%|████▉ | 843/1726 [14:34:17<14:53:03, 60.68s/it] 49%|████▉ | 844/1726 [14:35:16<14:46:56, 60.34s/it] 49%|████▉ | 845/1726 [14:36:16<14:44:48, 60.26s/it] 49%|████▉ | 846/1726 [14:37:18<14:51:54, 60.81s/it] 49%|████▉ | 847/1726 [14:38:18<14:45:59, 60.48s/it] 49%|████▉ | 848/1726 [14:39:19<14:46:59, 60.61s/it]
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3473 [2024-06-10 15:16:45,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.27 | bwd_microstep: 1575.96 | bwd_inner_microstep: 1575.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 15:16:46,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1378.09 | bwd_inner_microstep: 1378.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-10 15:16:49,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.66 | bwd_microstep: 1552.88 | bwd_inner_microstep: 1552.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 15:16:51,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1407.44 | bwd_inner_microstep: 1407.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 15:16:52,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1349.71 | bwd_inner_microstep: 1349.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722 
[2024-06-10 15:16:54,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1439.78 | bwd_inner_microstep: 1439.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 15:16:56,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.78 | bwd_microstep: 1283.89 | bwd_inner_microstep: 1283.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3705 [2024-06-10 15:16:58,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.73 | bwd_microstep: 1460.23 | bwd_inner_microstep: 1460.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 15:16:59,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 790.52 | bwd_inner_microstep: 790.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2982 [2024-06-10 15:17:01,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.67 | bwd_microstep: 1104.82 | bwd_inner_microstep: 1104.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 15:17:03,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1345.31 | bwd_inner_microstep: 1345.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 15:17:05,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.65 | bwd_microstep: 1477.30 | bwd_inner_microstep: 1477.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per 
sample: 6.5, dynamic token length: 3471 [2024-06-10 15:17:07,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.21 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856 [2024-06-10 15:17:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.75 | bwd_microstep: 1666.07 | bwd_inner_microstep: 1666.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638 [2024-06-10 15:17:11,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.49 | bwd_microstep: 1610.82 | bwd_inner_microstep: 1610.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454 [2024-06-10 15:17:13,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.02 | bwd_microstep: 1156.93 | bwd_inner_microstep: 1156.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629 [2024-06-10 15:17:14,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1315.17 | bwd_inner_microstep: 1315.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-10 15:17:16,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1418.88 | bwd_inner_microstep: 1418.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469 [2024-06-10 15:17:18,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.28 | bwd_microstep: 1437.36 | bwd_inner_microstep: 1437.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-10 15:17:20,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1414.09 | bwd_inner_microstep: 1414.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 15:17:22,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.76 | bwd_microstep: 1278.86 | bwd_inner_microstep: 1278.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 15:17:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.20 | bwd_microstep: 1450.88 | bwd_inner_microstep: 1450.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-10 15:17:26,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1495.63 | bwd_inner_microstep: 1495.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-10 15:17:28,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.57 | bwd_microstep: 1350.55 | bwd_inner_microstep: 1350.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1931 [2024-06-10 15:17:29,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.97 | bwd_microstep: 728.32 | bwd_inner_microstep: 728.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2293 [2024-06-10 15:17:30,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.36 | bwd_microstep: 846.88 | bwd_inner_microstep: 846.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 15:17:32,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.95 | bwd_microstep: 1246.50 | bwd_inner_microstep: 1246.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 15:17:34,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1256.07 | bwd_inner_microstep: 1256.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1897 [2024-06-10 15:17:35,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.79 | bwd_microstep: 715.07 | bwd_inner_microstep: 715.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 15:17:37,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.74 | bwd_microstep: 1551.34 | bwd_inner_microstep: 1551.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 15:17:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1497.18 | bwd_inner_microstep: 1497.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 15:17:43,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.57 | optimizer_gradients: 4.29 | optimizer_step: 6.61 [2024-06-10 15:17:43,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.02 | bwd_microstep: 4061.53 | bwd_inner_microstep: 895.58 | bwd_allreduce_microstep: 3165.90 | step_microstep: 38.09 [2024-06-10 15:17:43,851] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.96 | bwd: 45008.04 | bwd_inner: 41841.23 | bwd_allreduce: 3166.13 | step: 39.56 {'loss': 1.2339, 'learning_rate': 2.1499948819355626e-05, 'epoch': 0.49} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 15:17:45,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.00 | bwd_microstep: 1474.96 | bwd_inner_microstep: 1474.81 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898 [2024-06-10 15:17:48,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.10 | bwd_microstep: 1581.79 | bwd_inner_microstep: 1581.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2367 [2024-06-10 15:17:49,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.98 | bwd_microstep: 892.69 | bwd_inner_microstep: 892.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3797 [2024-06-10 15:17:51,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.90 | bwd_microstep: 1510.18 | bwd_inner_microstep: 1510.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 15:17:53,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.74 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 15:17:54,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1245.48 | bwd_inner_microstep: 1245.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3506 [2024-06-10 15:17:56,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.78 | bwd_microstep: 1386.68 | bwd_inner_microstep: 1386.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 15:17:58,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.74 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 15:18:00,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1283.11 | bwd_inner_microstep: 1283.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 15:18:02,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.20 | bwd_microstep: 1502.48 | bwd_inner_microstep: 1502.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3718 [2024-06-10 15:18:04,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1236.77 | bwd_inner_microstep: 1236.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 15:18:05,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.66 | bwd_microstep: 798.38 | bwd_inner_microstep: 798.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-10 15:18:07,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1530.54 | bwd_inner_microstep: 1530.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 15:18:09,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.93 | bwd_microstep: 1340.51 | bwd_inner_microstep: 1340.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2448 [2024-06-10 15:18:10,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.24 | bwd_microstep: 1014.80 | bwd_inner_microstep: 1014.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-10 15:18:12,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.55 | bwd_microstep: 1343.49 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 15:18:14,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.90 | bwd_microstep: 1370.22 | bwd_inner_microstep: 1370.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2023 [2024-06-10 15:18:15,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.70 | bwd_microstep: 963.86 | bwd_inner_microstep: 963.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-10 15:18:17,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1452.32 | bwd_inner_microstep: 1452.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3675 [2024-06-10 15:18:19,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.24 | bwd_microstep: 1324.71 | bwd_inner_microstep: 1324.68 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910 [2024-06-10 15:18:20,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.35 | bwd_microstep: 685.38 | bwd_inner_microstep: 685.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082 [2024-06-10 15:18:21,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.54 | bwd_microstep: 822.72 | bwd_inner_microstep: 822.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572 [2024-06-10 15:18:23,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.80 | bwd_microstep: 1301.39 | bwd_inner_microstep: 1301.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-10 15:18:25,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1533.40 | bwd_inner_microstep: 1533.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 15:18:27,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1295.32 | bwd_inner_microstep: 1295.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3593 [2024-06-10 15:18:29,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.44 | bwd_microstep: 1533.59 | bwd_inner_microstep: 1533.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812 [2024-06-10 15:18:31,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.36 | bwd_microstep: 1575.68 
| bwd_inner_microstep: 1575.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3571 [2024-06-10 15:18:33,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.05 | bwd_microstep: 1265.36 | bwd_inner_microstep: 1265.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3777 [2024-06-10 15:18:35,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.49 | bwd_microstep: 1794.88 | bwd_inner_microstep: 1794.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2724 [2024-06-10 15:18:37,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.48 | bwd_microstep: 1149.35 | bwd_inner_microstep: 1149.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 15:18:39,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.44 | bwd_microstep: 1590.99 | bwd_inner_microstep: 1590.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-10 15:18:44,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.20 | optimizer_step: 6.61 [2024-06-10 15:18:44,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 3831.53 | bwd_inner_microstep: 1564.54 | bwd_allreduce_microstep: 2266.93 | step_microstep: 37.95 [2024-06-10 15:18:44,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15658.22 | bwd: 44200.29 | bwd_inner: 41932.33 | bwd_allreduce: 2267.23 | step: 39.48 {'loss': 1.2289, 'learning_rate': 2.1462517943425523e-05, 'epoch': 0.49} dynamic ViT batch size: 10, images per sample: 2.5, 
dynamic token length: 1862 [2024-06-10 15:18:44,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.80 | bwd_microstep: 668.71 | bwd_inner_microstep: 668.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 15:18:46,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1272.78 | bwd_inner_microstep: 1272.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 15:18:48,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.34 | bwd_microstep: 1148.71 | bwd_inner_microstep: 1148.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 15:18:50,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 15:18:52,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1293.52 | bwd_inner_microstep: 1293.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2638 [2024-06-10 15:18:53,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.68 | bwd_microstep: 1017.71 | bwd_inner_microstep: 1017.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 15:18:55,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1245.59 | bwd_inner_microstep: 1245.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3720 [2024-06-10 15:18:57,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1332.63 | bwd_inner_microstep: 1332.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 15:18:58,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1248.52 | bwd_inner_microstep: 1248.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 15:19:00,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.39 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 15:19:02,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.23 | bwd_microstep: 1487.54 | bwd_inner_microstep: 1487.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3496 [2024-06-10 15:19:04,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1514.24 | bwd_inner_microstep: 1514.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384 [2024-06-10 15:19:06,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.54 | bwd_microstep: 1237.48 | bwd_inner_microstep: 1237.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-10 15:19:08,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.64 | bwd_microstep: 1582.27 | bwd_inner_microstep: 1582.24 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 15:19:10,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.59 | bwd_microstep: 1486.49 | bwd_inner_microstep: 1486.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 15:19:12,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.31 | bwd_microstep: 1473.43 | bwd_inner_microstep: 1473.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 15:19:14,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1254.32 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977 [2024-06-10 15:19:15,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.20 | bwd_microstep: 826.17 | bwd_inner_microstep: 826.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 15:19:17,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.34 | bwd_microstep: 1287.06 | bwd_inner_microstep: 1287.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658 [2024-06-10 15:19:19,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.83 | bwd_microstep: 1521.94 | bwd_inner_microstep: 1521.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788 [2024-06-10 15:19:21,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1548.72 | bwd_inner_microstep: 
1548.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821 [2024-06-10 15:19:23,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1384.56 | bwd_inner_microstep: 1384.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-10 15:19:24,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.76 | bwd_microstep: 802.45 | bwd_inner_microstep: 802.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 15:19:26,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1376.72 | bwd_inner_microstep: 1376.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812 [2024-06-10 15:19:28,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.64 | bwd_microstep: 1354.98 | bwd_inner_microstep: 1354.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3539 [2024-06-10 15:19:30,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1360.33 | bwd_inner_microstep: 1360.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 15:19:32,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.16 | bwd_microstep: 1463.57 | bwd_inner_microstep: 1463.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 15:19:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | 
bwd_microstep: 1395.00 | bwd_inner_microstep: 1394.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3775 [2024-06-10 15:19:36,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.68 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3425 [2024-06-10 15:19:37,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.27 | bwd_microstep: 1200.64 | bwd_inner_microstep: 1200.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766 [2024-06-10 15:19:40,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.64 | bwd_microstep: 1609.51 | bwd_inner_microstep: 1609.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 15:19:46,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.62 [2024-06-10 15:19:46,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.83 | bwd_microstep: 5740.56 | bwd_inner_microstep: 2176.29 | bwd_allreduce_microstep: 3564.22 | step_microstep: 37.91 [2024-06-10 15:19:46,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15841.48 | bwd: 46285.12 | bwd_inner: 42719.97 | bwd_allreduce: 3564.46 | step: 39.44 {'loss': 1.2425, 'learning_rate': 2.1425081916514827e-05, 'epoch': 0.49} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505 [2024-06-10 15:19:48,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1338.10 | bwd_inner_microstep: 1338.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3401 [2024-06-10 15:19:50,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.17 | bwd_microstep: 1242.14 | bwd_inner_microstep: 1242.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 15:19:51,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1390.18 | bwd_inner_microstep: 1390.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 15:19:54,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1653.30 | bwd_inner_microstep: 1653.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783 [2024-06-10 15:19:56,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.13 | bwd_microstep: 1646.30 | bwd_inner_microstep: 1646.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868 [2024-06-10 15:19:57,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 709.09 | bwd_inner_microstep: 709.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3483 [2024-06-10 15:19:59,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.35 | bwd_microstep: 1244.87 | bwd_inner_microstep: 1244.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030 [2024-06-10 15:20:00,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.11 | bwd_microstep: 809.72 | bwd_inner_microstep: 809.69 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 15:20:02,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1249.13 | bwd_inner_microstep: 1249.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749 [2024-06-10 15:20:04,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.69 | bwd_microstep: 1542.15 | bwd_inner_microstep: 1542.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 15:20:05,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 794.42 | bwd_inner_microstep: 794.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 15:20:07,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1343.91 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3991 [2024-06-10 15:20:09,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.17 | bwd_microstep: 1543.64 | bwd_inner_microstep: 1543.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 15:20:11,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.26 | bwd_microstep: 1287.92 | bwd_inner_microstep: 1287.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3428 [2024-06-10 15:20:12,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.31 | bwd_microstep: 1187.20 | bwd_inner_microstep: 
1187.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572 [2024-06-10 15:20:14,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.47 | bwd_microstep: 1300.41 | bwd_inner_microstep: 1300.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 15:20:16,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1292.23 | bwd_inner_microstep: 1292.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-10 15:20:17,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.04 | bwd_microstep: 820.11 | bwd_inner_microstep: 820.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 15:20:19,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1389.28 | bwd_inner_microstep: 1389.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 15:20:21,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1289.58 | bwd_inner_microstep: 1289.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546 [2024-06-10 15:20:22,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.24 | bwd_microstep: 1202.28 | bwd_inner_microstep: 1202.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3624 [2024-06-10 15:20:24,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.84 | 
bwd_microstep: 1215.95 | bwd_inner_microstep: 1215.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 15:20:26,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 1293.13 | bwd_inner_microstep: 1293.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3572 [2024-06-10 15:20:28,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.17 | bwd_microstep: 1235.87 | bwd_inner_microstep: 1235.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 15:20:29,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.11 | bwd_microstep: 1288.41 | bwd_inner_microstep: 1288.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413 [2024-06-10 15:20:31,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.31 | bwd_microstep: 1373.08 | bwd_inner_microstep: 1373.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2067 [2024-06-10 15:20:32,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.32 | bwd_microstep: 910.83 | bwd_inner_microstep: 910.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 15:20:34,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1344.06 | bwd_inner_microstep: 1344.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 15:20:36,749] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1373.84 | bwd_inner_microstep: 1373.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-10 15:20:38,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.92 | bwd_microstep: 1500.15 | bwd_inner_microstep: 1500.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3628 [2024-06-10 15:20:41,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.65 | bwd_microstep: 1708.08 | bwd_inner_microstep: 1708.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2460 [2024-06-10 15:20:46,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.31 | optimizer_step: 6.59 [2024-06-10 15:20:46,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.15 | bwd_microstep: 4615.78 | bwd_inner_microstep: 1190.23 | bwd_allreduce_microstep: 3425.50 | step_microstep: 38.32 [2024-06-10 15:20:46,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15264.93 | bwd: 44135.17 | bwd_inner: 40708.77 | bwd_allreduce: 3425.72 | step: 39.82 {'loss': 1.2225, 'learning_rate': 2.1387640870473033e-05, 'epoch': 0.49} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-10 15:20:47,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.01 | bwd_microstep: 1239.11 | bwd_inner_microstep: 1239.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3983 [2024-06-10 15:20:49,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1370.00 | bwd_inner_microstep: 1369.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3864 [2024-06-10 15:20:52,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.34 | bwd_microstep: 1590.84 | bwd_inner_microstep: 1590.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275 [2024-06-10 15:20:53,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.68 | bwd_microstep: 970.74 | bwd_inner_microstep: 970.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733 [2024-06-10 15:20:55,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.11 | bwd_microstep: 1629.38 | bwd_inner_microstep: 1629.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 15:20:57,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1250.29 | bwd_inner_microstep: 1250.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1887 [2024-06-10 15:20:58,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.56 | bwd_microstep: 712.20 | bwd_inner_microstep: 712.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3725 [2024-06-10 15:21:00,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1268.20 | bwd_inner_microstep: 1268.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 15:21:02,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1390.71 | bwd_inner_microstep: 1390.68 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 15:21:03,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.52 | bwd_microstep: 1380.17 | bwd_inner_microstep: 1380.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416 [2024-06-10 15:21:05,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.55 | bwd_microstep: 1307.45 | bwd_inner_microstep: 1307.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 15:21:07,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1406.78 | bwd_inner_microstep: 1406.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-10 15:21:09,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.26 | bwd_microstep: 1625.15 | bwd_inner_microstep: 1625.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3483 [2024-06-10 15:21:11,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1365.32 | bwd_inner_microstep: 1365.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3449 [2024-06-10 15:21:13,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.84 | bwd_microstep: 1498.07 | bwd_inner_microstep: 1498.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 15:21:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.60 | bwd_microstep: 
1483.67 | bwd_inner_microstep: 1483.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668 [2024-06-10 15:21:17,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.61 | bwd_microstep: 1456.33 | bwd_inner_microstep: 1456.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637 [2024-06-10 15:21:20,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.00 | bwd_microstep: 1613.88 | bwd_inner_microstep: 1613.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 15:21:22,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1510.39 | bwd_inner_microstep: 1510.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 15:21:24,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1395.21 | bwd_inner_microstep: 1395.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-10 15:21:26,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.92 | bwd_microstep: 1478.51 | bwd_inner_microstep: 1478.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015 [2024-06-10 15:21:27,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.72 | bwd_microstep: 806.60 | bwd_inner_microstep: 806.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 15:21:29,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 577.69 | bwd_microstep: 1554.89 | bwd_inner_microstep: 1554.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 15:21:31,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1284.64 | bwd_inner_microstep: 1284.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2078 [2024-06-10 15:21:32,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.78 | bwd_microstep: 818.73 | bwd_inner_microstep: 818.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2050 [2024-06-10 15:21:33,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.30 | bwd_microstep: 942.34 | bwd_inner_microstep: 942.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2227 [2024-06-10 15:21:35,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.05 | bwd_microstep: 992.34 | bwd_inner_microstep: 992.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3606 [2024-06-10 15:21:36,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.78 | bwd_microstep: 1371.19 | bwd_inner_microstep: 1371.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569 [2024-06-10 15:21:38,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1423.07 | bwd_inner_microstep: 1423.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032 [2024-06-10 15:21:40,115] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.07 | bwd_microstep: 901.46 | bwd_inner_microstep: 901.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788 [2024-06-10 15:21:42,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.53 | bwd_microstep: 1643.87 | bwd_inner_microstep: 1643.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779 [2024-06-10 15:21:47,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-10 15:21:47,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.61 | bwd_microstep: 4165.19 | bwd_inner_microstep: 1866.12 | bwd_allreduce_microstep: 2299.02 | step_microstep: 38.01 [2024-06-10 15:21:47,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15800.67 | bwd: 44846.72 | bwd_inner: 42546.80 | bwd_allreduce: 2299.25 | step: 39.49 {'loss': 1.2137, 'learning_rate': 2.1350194937167307e-05, 'epoch': 0.49} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 15:21:49,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1474.07 | bwd_inner_microstep: 1474.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3966 [2024-06-10 15:21:51,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.89 | bwd_microstep: 1597.22 | bwd_inner_microstep: 1597.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 15:21:52,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.96 | bwd_microstep: 792.12 | bwd_inner_microstep: 792.10 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3843 [2024-06-10 15:21:54,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1493.15 | bwd_inner_microstep: 1493.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 15:21:56,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.65 | bwd_microstep: 1247.55 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 15:21:58,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.66 | bwd_microstep: 1351.64 | bwd_inner_microstep: 1351.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 15:22:00,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1349.43 | bwd_inner_microstep: 1349.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 15:22:01,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1254.75 | bwd_inner_microstep: 1254.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 15:22:03,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1395.64 | bwd_inner_microstep: 1395.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 15:22:05,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 
1253.59 | bwd_inner_microstep: 1253.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495 [2024-06-10 15:22:07,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.81 | bwd_microstep: 1217.62 | bwd_inner_microstep: 1217.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 15:22:08,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 796.02 | bwd_inner_microstep: 795.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495 [2024-06-10 15:22:10,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1575.77 | bwd_inner_microstep: 1575.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3713 [2024-06-10 15:22:12,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.57 | bwd_microstep: 1727.62 | bwd_inner_microstep: 1727.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1574 [2024-06-10 15:22:13,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 220.65 | bwd_microstep: 574.33 | bwd_inner_microstep: 574.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2017 [2024-06-10 15:22:14,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.37 | bwd_microstep: 900.80 | bwd_inner_microstep: 900.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683 [2024-06-10 15:22:17,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 629.43 | bwd_microstep: 1722.82 | bwd_inner_microstep: 1722.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637 [2024-06-10 15:22:19,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1407.26 | bwd_inner_microstep: 1407.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3595 [2024-06-10 15:22:20,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1246.06 | bwd_inner_microstep: 1246.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 15:22:22,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1511.72 | bwd_inner_microstep: 1511.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622 [2024-06-10 15:22:25,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.28 | bwd_microstep: 1613.16 | bwd_inner_microstep: 1613.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789 [2024-06-10 15:22:27,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.69 | bwd_microstep: 1542.20 | bwd_inner_microstep: 1542.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2019 [2024-06-10 15:22:28,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.15 | bwd_microstep: 839.33 | bwd_inner_microstep: 839.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998 [2024-06-10 15:22:29,431] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.98 | bwd_microstep: 709.04 | bwd_inner_microstep: 709.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693 [2024-06-10 15:22:31,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.30 | bwd_microstep: 1727.87 | bwd_inner_microstep: 1727.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3811 [2024-06-10 15:22:34,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.83 | bwd_microstep: 1685.49 | bwd_inner_microstep: 1685.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-10 15:22:35,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1299.48 | bwd_inner_microstep: 1299.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3601 [2024-06-10 15:22:38,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.51 | bwd_microstep: 1805.10 | bwd_inner_microstep: 1805.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811 [2024-06-10 15:22:40,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.30 | bwd_microstep: 1748.65 | bwd_inner_microstep: 1748.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2246 [2024-06-10 15:22:42,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.62 | bwd_microstep: 932.32 | bwd_inner_microstep: 932.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3587 [2024-06-10 15:22:44,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.29 | bwd_microstep: 1504.24 | bwd_inner_microstep: 1504.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589 [2024-06-10 15:22:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.44 | optimizer_step: 6.63 [2024-06-10 15:22:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1643.79 | bwd_inner_microstep: 1635.70 | bwd_allreduce_microstep: 8.03 | step_microstep: 42.77 [2024-06-10 15:22:46,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15961.43 | bwd: 42939.88 | bwd_inner: 42930.92 | bwd_allreduce: 8.26 | step: 44.20 49%|████▉ | 849/1726 [14:40:20<14:47:32, 60.72s/it] 49%|████▉ | 850/1726 [14:41:20<14:44:09, 60.56s/it] 49%|████▉ | 851/1726 [14:42:23<14:51:28, 61.13s/it] 49%|████▉ | 852/1726 [14:43:22<14:44:19, 60.71s/it] 49%|████▉ | 853/1726 [14:44:23<14:44:28, 60.79s/it] 49%|████▉ | 854/1726 [14:45:23<14:36:43, 60.33s/it] {'loss': 1.2028, 'learning_rate': 2.1312744248482035e-05, 'epoch': 0.49} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 15:22:48,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1284.64 | bwd_inner_microstep: 1284.48 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3928 [2024-06-10 15:22:50,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.63 | bwd_microstep: 1692.51 | bwd_inner_microstep: 
1692.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892 [2024-06-10 15:22:52,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.35 | bwd_microstep: 1586.20 | bwd_inner_microstep: 1586.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787 [2024-06-10 15:22:54,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.02 | bwd_microstep: 1443.46 | bwd_inner_microstep: 1443.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768 [2024-06-10 15:22:56,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1473.11 | bwd_inner_microstep: 1473.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 15:22:58,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.03 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 15:23:00,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3936 [2024-06-10 15:23:02,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.84 | bwd_microstep: 1591.82 | bwd_inner_microstep: 1591.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716 [2024-06-10 15:23:04,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | 
bwd_microstep: 1463.54 | bwd_inner_microstep: 1463.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3704 [2024-06-10 15:23:06,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.38 | bwd_microstep: 1626.33 | bwd_inner_microstep: 1626.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445 [2024-06-10 15:23:08,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.67 | bwd_microstep: 1158.08 | bwd_inner_microstep: 1158.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 15:23:10,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.21 | bwd_microstep: 1386.75 | bwd_inner_microstep: 1386.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 15:23:12,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.05 | bwd_microstep: 1378.94 | bwd_inner_microstep: 1378.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3537 [2024-06-10 15:23:14,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.51 | bwd_microstep: 1520.52 | bwd_inner_microstep: 1520.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3679 [2024-06-10 15:23:16,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1486.66 | bwd_inner_microstep: 1486.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1925 [2024-06-10 15:23:17,474] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 308.99 | bwd_microstep: 819.16 | bwd_inner_microstep: 819.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3641 [2024-06-10 15:23:19,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1579.67 | bwd_inner_microstep: 1579.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-10 15:23:20,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.68 | bwd_microstep: 796.59 | bwd_inner_microstep: 796.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 15:23:22,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1283.16 | bwd_inner_microstep: 1283.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2091 [2024-06-10 15:23:23,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.36 | bwd_microstep: 880.81 | bwd_inner_microstep: 880.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2688 [2024-06-10 15:23:25,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.99 | bwd_microstep: 1026.63 | bwd_inner_microstep: 1026.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827 [2024-06-10 15:23:27,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.42 | bwd_microstep: 1389.11 | bwd_inner_microstep: 1389.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 15:23:28,882] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1294.77 | bwd_inner_microstep: 1294.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3800 [2024-06-10 15:23:31,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.50 | bwd_microstep: 1578.97 | bwd_inner_microstep: 1578.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1985 [2024-06-10 15:23:32,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.73 | bwd_microstep: 891.33 | bwd_inner_microstep: 891.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3396 [2024-06-10 15:23:34,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.43 | bwd_microstep: 1276.01 | bwd_inner_microstep: 1275.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456 [2024-06-10 15:23:36,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.23 | bwd_microstep: 1449.04 | bwd_inner_microstep: 1449.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-10 15:23:38,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1497.73 | bwd_inner_microstep: 1497.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 15:23:40,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.53 | bwd_microstep: 1659.47 | bwd_inner_microstep: 1659.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 7, images per sample: 1.75, dynamic token 
length: 1150 [2024-06-10 15:23:41,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 173.31 | bwd_microstep: 448.40 | bwd_inner_microstep: 448.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-10 15:23:43,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.56 | bwd_microstep: 1539.65 | bwd_inner_microstep: 1539.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009 [2024-06-10 15:23:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-10 15:23:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.06 | bwd_microstep: 3780.42 | bwd_inner_microstep: 839.47 | bwd_allreduce_microstep: 2940.89 | step_microstep: 37.96 [2024-06-10 15:23:47,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15639.94 | bwd: 44844.86 | bwd_inner: 41902.92 | bwd_allreduce: 2941.20 | step: 39.50 {'loss': 1.2404, 'learning_rate': 2.1275288936318334e-05, 'epoch': 0.5} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 15:23:49,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1466.45 | bwd_inner_microstep: 1466.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3911 [2024-06-10 15:23:51,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.05 | bwd_microstep: 1583.85 | bwd_inner_microstep: 1583.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 15:23:52,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 676.06 | 
bwd_inner_microstep: 676.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3840 [2024-06-10 15:23:54,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.96 | bwd_microstep: 1457.65 | bwd_inner_microstep: 1457.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 15:23:56,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.13 | bwd_microstep: 1639.50 | bwd_inner_microstep: 1639.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 15:23:58,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1383.29 | bwd_inner_microstep: 1383.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485 [2024-06-10 15:24:00,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.56 | bwd_microstep: 1186.93 | bwd_inner_microstep: 1186.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1958 [2024-06-10 15:24:01,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.43 | bwd_microstep: 824.90 | bwd_inner_microstep: 824.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 902 [2024-06-10 15:24:01,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 154.83 | bwd_microstep: 403.62 | bwd_inner_microstep: 403.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3379 [2024-06-10 15:24:03,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
487.60 | bwd_microstep: 1302.64 | bwd_inner_microstep: 1302.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2188 [2024-06-10 15:24:05,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.82 | bwd_microstep: 955.50 | bwd_inner_microstep: 955.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1882 [2024-06-10 15:24:06,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.61 | bwd_microstep: 773.21 | bwd_inner_microstep: 773.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 15:24:07,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.61 | bwd_microstep: 1288.76 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 15:24:09,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.63 | bwd_microstep: 1387.86 | bwd_inner_microstep: 1387.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 15:24:11,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1345.58 | bwd_inner_microstep: 1345.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2147 [2024-06-10 15:24:13,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.05 | bwd_microstep: 1054.96 | bwd_inner_microstep: 1054.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-10 15:24:15,002] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1338.90 | bwd_inner_microstep: 1338.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3496 [2024-06-10 15:24:16,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1345.48 | bwd_inner_microstep: 1345.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 15:24:18,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.45 | bwd_microstep: 1342.01 | bwd_inner_microstep: 1341.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 15:24:20,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1250.30 | bwd_inner_microstep: 1250.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3683 [2024-06-10 15:24:22,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1329.52 | bwd_inner_microstep: 1329.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 15:24:24,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1298.20 | bwd_inner_microstep: 1298.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 15:24:26,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1390.13 | bwd_inner_microstep: 1390.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 
15:24:28,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.22 | bwd_microstep: 1460.15 | bwd_inner_microstep: 1460.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 15:24:29,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.61 | bwd_microstep: 1299.78 | bwd_inner_microstep: 1299.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2300 [2024-06-10 15:24:31,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.58 | bwd_microstep: 979.55 | bwd_inner_microstep: 979.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292 [2024-06-10 15:24:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.86 | bwd_microstep: 976.15 | bwd_inner_microstep: 976.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 15:24:34,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1279.33 | bwd_inner_microstep: 1279.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 15:24:36,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.45 | bwd_microstep: 1254.25 | bwd_inner_microstep: 1254.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2268 [2024-06-10 15:24:37,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.72 | bwd_microstep: 876.20 | bwd_inner_microstep: 876.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic 
token length: 3810 [2024-06-10 15:24:39,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.72 | bwd_microstep: 1356.85 | bwd_inner_microstep: 1356.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585 [2024-06-10 15:24:50,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.38 | optimizer_step: 6.61 [2024-06-10 15:24:50,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.79 | bwd_microstep: 10947.31 | bwd_inner_microstep: 1725.22 | bwd_allreduce_microstep: 9222.02 | step_microstep: 39.01 [2024-06-10 15:24:50,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14621.34 | bwd: 48454.89 | bwd_inner: 39231.94 | bwd_allreduce: 9222.26 | step: 40.52 {'loss': 1.2363, 'learning_rate': 2.123782913259364e-05, 'epoch': 0.5} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 15:24:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.96 | bwd_microstep: 1370.02 | bwd_inner_microstep: 1369.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 15:24:54,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.57 | bwd_microstep: 1268.45 | bwd_inner_microstep: 1268.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-10 15:24:56,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1287.68 | bwd_inner_microstep: 1287.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2432 [2024-06-10 15:24:57,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.07 | bwd_microstep: 
1031.78 | bwd_inner_microstep: 1031.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1866 [2024-06-10 15:24:58,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.32 | bwd_microstep: 706.26 | bwd_inner_microstep: 706.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 15:25:00,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1390.69 | bwd_inner_microstep: 1390.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219 [2024-06-10 15:25:01,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.86 | bwd_microstep: 955.92 | bwd_inner_microstep: 955.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 15:25:03,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.14 | bwd_microstep: 1340.62 | bwd_inner_microstep: 1340.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 15:25:05,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1381.26 | bwd_inner_microstep: 1381.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2728 [2024-06-10 15:25:07,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.60 | bwd_microstep: 1132.30 | bwd_inner_microstep: 1132.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158 [2024-06-10 15:25:08,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 352.60 | bwd_microstep: 946.44 | bwd_inner_microstep: 946.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428 [2024-06-10 15:25:10,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.34 | bwd_microstep: 1280.04 | bwd_inner_microstep: 1280.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957 [2024-06-10 15:25:11,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.79 | bwd_microstep: 890.68 | bwd_inner_microstep: 890.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3461 [2024-06-10 15:25:13,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.56 | bwd_microstep: 1497.45 | bwd_inner_microstep: 1497.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 15:25:15,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1242.68 | bwd_inner_microstep: 1242.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 15:25:17,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1347.93 | bwd_inner_microstep: 1347.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 15:25:19,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.36 | bwd_microstep: 1477.23 | bwd_inner_microstep: 1477.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3510 [2024-06-10 15:25:21,019] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1410.41 | bwd_inner_microstep: 1410.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569 [2024-06-10 15:25:23,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.71 | bwd_microstep: 1455.47 | bwd_inner_microstep: 1455.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2664 [2024-06-10 15:25:24,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.06 | bwd_microstep: 1023.26 | bwd_inner_microstep: 1023.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603 [2024-06-10 15:25:26,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.16 | bwd_microstep: 1439.91 | bwd_inner_microstep: 1439.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 15:25:28,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.20 | bwd_microstep: 1294.03 | bwd_inner_microstep: 1294.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 15:25:30,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1352.66 | bwd_inner_microstep: 1352.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-10 15:25:31,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.67 | bwd_microstep: 1280.33 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3451 [2024-06-10 15:25:33,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1353.06 | bwd_inner_microstep: 1353.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2047 [2024-06-10 15:25:34,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.39 | bwd_microstep: 868.55 | bwd_inner_microstep: 868.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2064 [2024-06-10 15:25:35,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.60 | bwd_microstep: 723.83 | bwd_inner_microstep: 723.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3916 [2024-06-10 15:25:38,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.99 | bwd_microstep: 1897.50 | bwd_inner_microstep: 1897.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678 [2024-06-10 15:25:40,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.85 | bwd_microstep: 1613.75 | bwd_inner_microstep: 1613.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2039 [2024-06-10 15:25:41,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.36 | bwd_microstep: 720.22 | bwd_inner_microstep: 720.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-10 15:25:43,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.97 | bwd_inner_microstep: 1508.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3565 [2024-06-10 15:25:50,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 15:25:50,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 5698.73 | bwd_inner_microstep: 1676.02 | bwd_allreduce_microstep: 4022.65 | step_microstep: 38.01 [2024-06-10 15:25:50,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14986.55 | bwd: 44188.12 | bwd_inner: 40164.57 | bwd_allreduce: 4022.88 | step: 39.46 {'loss': 1.3007, 'learning_rate': 2.120036496924117e-05, 'epoch': 0.5} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3418 [2024-06-10 15:25:52,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.44 | bwd_microstep: 1365.44 | bwd_inner_microstep: 1365.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3991 [2024-06-10 15:25:54,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.33 | bwd_microstep: 1501.77 | bwd_inner_microstep: 1501.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 15:25:56,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1379.66 | bwd_inner_microstep: 1379.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 831 [2024-06-10 15:25:56,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 123.68 | bwd_microstep: 314.82 | bwd_inner_microstep: 314.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 15:25:58,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | 
bwd_microstep: 1386.35 | bwd_inner_microstep: 1386.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2758 [2024-06-10 15:25:59,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.20 | bwd_microstep: 1079.36 | bwd_inner_microstep: 1079.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805 [2024-06-10 15:26:02,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.82 | bwd_microstep: 1548.20 | bwd_inner_microstep: 1548.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2220 [2024-06-10 15:26:03,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.62 | bwd_microstep: 893.05 | bwd_inner_microstep: 893.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-10 15:26:05,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.55 | bwd_microstep: 1288.13 | bwd_inner_microstep: 1288.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 15:26:06,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.69 | bwd_microstep: 789.52 | bwd_inner_microstep: 789.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 15:26:07,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.79 | bwd_microstep: 789.46 | bwd_inner_microstep: 789.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2456 [2024-06-10 15:26:08,575] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 371.40 | bwd_microstep: 977.66 | bwd_inner_microstep: 977.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3501 [2024-06-10 15:26:10,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.42 | bwd_microstep: 1532.88 | bwd_inner_microstep: 1532.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691 [2024-06-10 15:26:12,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.76 | bwd_microstep: 1516.49 | bwd_inner_microstep: 1516.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 15:26:14,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-10 15:26:16,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1474.89 | bwd_inner_microstep: 1474.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3486 [2024-06-10 15:26:18,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.14 | bwd_microstep: 1225.92 | bwd_inner_microstep: 1225.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761 [2024-06-10 15:26:20,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.49 | bwd_microstep: 1540.18 | bwd_inner_microstep: 1540.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-10 15:26:22,357] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1287.12 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 15:26:24,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.49 | bwd_microstep: 1393.37 | bwd_inner_microstep: 1393.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 15:26:26,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1291.71 | bwd_inner_microstep: 1291.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 15:26:27,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.87 | bwd_microstep: 1255.92 | bwd_inner_microstep: 1255.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547 [2024-06-10 15:26:29,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.91 | bwd_microstep: 1454.93 | bwd_inner_microstep: 1454.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 15:26:31,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1508.08 | bwd_inner_microstep: 1508.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667 [2024-06-10 15:26:33,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.68 | bwd_microstep: 1325.04 | bwd_inner_microstep: 1325.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token 
length: 3555 [2024-06-10 15:26:35,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.95 | bwd_microstep: 1561.95 | bwd_inner_microstep: 1561.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 15:26:37,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.02 | bwd_microstep: 1297.09 | bwd_inner_microstep: 1297.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3470 [2024-06-10 15:26:39,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.72 | bwd_microstep: 1574.65 | bwd_inner_microstep: 1574.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3815 [2024-06-10 15:26:42,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.35 | bwd_microstep: 1690.47 | bwd_inner_microstep: 1690.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 15:26:44,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1655.41 | bwd_inner_microstep: 1655.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597 [2024-06-10 15:26:46,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.18 | bwd_microstep: 1432.10 | bwd_inner_microstep: 1432.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 15:26:52,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 15:26:52,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 560.66 | bwd_microstep: 5817.45 | bwd_inner_microstep: 1685.75 | bwd_allreduce_microstep: 4131.64 | step_microstep: 38.45 [2024-06-10 15:26:52,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15814.41 | bwd: 46555.11 | bwd_inner: 42422.56 | bwd_allreduce: 4131.87 | step: 39.93 {'loss': 1.2305, 'learning_rate': 2.1162896578209517e-05, 'epoch': 0.5} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2023 [2024-06-10 15:26:54,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.51 | bwd_microstep: 892.53 | bwd_inner_microstep: 892.39 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4510 [2024-06-10 15:26:56,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.97 | bwd_microstep: 1738.65 | bwd_inner_microstep: 1738.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 15:26:58,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.59 | bwd_microstep: 1547.44 | bwd_inner_microstep: 1547.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 15:27:00,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.60 | bwd_microstep: 1277.48 | bwd_inner_microstep: 1277.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 15:27:02,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.44 | bwd_microstep: 1276.73 | bwd_inner_microstep: 1276.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 15:27:03,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 487.50 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 15:27:05,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.94 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 15:27:07,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1248.80 | bwd_inner_microstep: 1248.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3766 [2024-06-10 15:27:09,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.72 | bwd_microstep: 1344.29 | bwd_inner_microstep: 1344.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3419 [2024-06-10 15:27:11,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.99 | bwd_microstep: 1307.69 | bwd_inner_microstep: 1307.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548 [2024-06-10 15:27:12,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.41 | bwd_microstep: 1326.78 | bwd_inner_microstep: 1326.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 15:27:15,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1481.77 | bwd_inner_microstep: 1481.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-10 15:27:17,105] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.79 | bwd_microstep: 1521.64 | bwd_inner_microstep: 1521.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971 [2024-06-10 15:27:18,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.06 | bwd_microstep: 890.06 | bwd_inner_microstep: 890.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3521 [2024-06-10 15:27:20,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.36 | bwd_microstep: 1583.92 | bwd_inner_microstep: 1583.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2501 [2024-06-10 15:27:22,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.76 | bwd_microstep: 1083.10 | bwd_inner_microstep: 1083.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982 [2024-06-10 15:27:23,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.70 | bwd_microstep: 826.76 | bwd_inner_microstep: 826.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3530 [2024-06-10 15:27:24,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.01 | bwd_microstep: 1196.48 | bwd_inner_microstep: 1196.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 15:27:26,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1397.46 | bwd_inner_microstep: 1397.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 637 
[2024-06-10 15:27:27,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.50 | bwd_microstep: 264.01 | bwd_inner_microstep: 263.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2469 [2024-06-10 15:27:28,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.96 | bwd_microstep: 955.58 | bwd_inner_microstep: 955.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2619 [2024-06-10 15:27:30,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 424.53 | bwd_microstep: 1141.21 | bwd_inner_microstep: 1141.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 15:27:31,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.20 | bwd_microstep: 1285.45 | bwd_inner_microstep: 1285.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 15:27:33,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-10 15:27:35,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.44 | bwd_microstep: 1497.58 | bwd_inner_microstep: 1497.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902 [2024-06-10 15:27:36,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.13 | bwd_microstep: 684.87 | bwd_inner_microstep: 684.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 
6.0, dynamic token length: 3557 [2024-06-10 15:27:38,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.18 | bwd_microstep: 1330.76 | bwd_inner_microstep: 1330.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597 [2024-06-10 15:27:40,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.10 | bwd_microstep: 1304.37 | bwd_inner_microstep: 1304.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3734 [2024-06-10 15:27:42,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.03 | bwd_microstep: 1242.54 | bwd_inner_microstep: 1242.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576 [2024-06-10 15:27:44,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.40 | bwd_microstep: 1334.15 | bwd_inner_microstep: 1334.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601 [2024-06-10 15:27:46,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.80 | bwd_microstep: 1598.19 | bwd_inner_microstep: 1598.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3571 [2024-06-10 15:27:52,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 15:27:52,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.62 | bwd_microstep: 5247.16 | bwd_inner_microstep: 1920.22 | bwd_allreduce_microstep: 3326.88 | step_microstep: 38.10 [2024-06-10 15:27:52,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15152.64 | bwd: 43791.71 | bwd_inner: 40463.82 | 
bwd_allreduce: 3327.16 | step: 39.64 {'loss': 1.251, 'learning_rate': 2.112542409146217e-05, 'epoch': 0.5} 49%|████▉ | 854/1726 [14:45:23<14:36:43, 60.33s/it] 50%|████▉ | 855/1726 [14:46:23<14:37:51, 60.47s/it] 50%|████▉ | 856/1726 [14:47:27<14:49:34, 61.35s/it] 50%|████▉ | 857/1726 [14:48:26<14:40:30, 60.79s/it] 50%|████▉ | 858/1726 [14:49:29<14:47:46, 61.37s/it] 50%|████▉ | 859/1726 [14:50:28<14:37:41, 60.74s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 15:27:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1385.28 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3972 [2024-06-10 15:27:56,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.44 | bwd_microstep: 1597.61 | bwd_inner_microstep: 1597.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3895 [2024-06-10 15:27:58,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.84 | bwd_microstep: 1581.90 | bwd_inner_microstep: 1581.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896 [2024-06-10 15:28:00,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.18 | bwd_microstep: 1583.10 | bwd_inner_microstep: 1583.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 15:28:02,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.96 | bwd_microstep:
1452.31 | bwd_inner_microstep: 1452.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783 [2024-06-10 15:28:04,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.00 | bwd_microstep: 1645.57 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3521 [2024-06-10 15:28:06,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1255.17 | bwd_inner_microstep: 1255.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 15:28:07,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.66 | bwd_microstep: 787.76 | bwd_inner_microstep: 787.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 15:28:09,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.93 | bwd_microstep: 1382.70 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 15:28:11,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.96 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2929 [2024-06-10 15:28:12,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.22 | bwd_microstep: 1093.71 | bwd_inner_microstep: 1093.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438 [2024-06-10 15:28:14,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 483.15 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 15:28:16,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.72 | bwd_microstep: 1488.32 | bwd_inner_microstep: 1488.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 5458 [2024-06-10 15:28:19,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 716.48 | bwd_microstep: 1891.43 | bwd_inner_microstep: 1891.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3387 [2024-06-10 15:28:21,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.08 | bwd_microstep: 1368.65 | bwd_inner_microstep: 1368.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2441 [2024-06-10 15:28:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.38 | bwd_microstep: 945.79 | bwd_inner_microstep: 945.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-10 15:28:23,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.37 | bwd_microstep: 795.20 | bwd_inner_microstep: 795.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 15:28:25,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.71 | bwd_microstep: 1382.14 | bwd_inner_microstep: 1382.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602 [2024-06-10 15:28:27,747] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.04 | bwd_microstep: 1608.81 | bwd_inner_microstep: 1608.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 15:28:29,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 15:28:31,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1408.04 | bwd_inner_microstep: 1408.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-10 15:28:32,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.45 | bwd_microstep: 800.04 | bwd_inner_microstep: 800.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3673 [2024-06-10 15:28:34,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.63 | bwd_microstep: 1590.73 | bwd_inner_microstep: 1590.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3588 [2024-06-10 15:28:36,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.82 | bwd_microstep: 1432.70 | bwd_inner_microstep: 1432.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-10 15:28:37,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.55 | bwd_microstep: 791.55 | bwd_inner_microstep: 791.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753 
[2024-06-10 15:28:39,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1436.64 | bwd_inner_microstep: 1436.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 15:28:42,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.85 | bwd_microstep: 1645.29 | bwd_inner_microstep: 1645.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3570 [2024-06-10 15:28:44,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.53 | bwd_microstep: 1457.43 | bwd_inner_microstep: 1457.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 15:28:46,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1444.25 | bwd_inner_microstep: 1444.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3561 [2024-06-10 15:28:48,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.37 | bwd_microstep: 1562.53 | bwd_inner_microstep: 1562.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3405 [2024-06-10 15:28:50,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.96 | bwd_microstep: 1371.72 | bwd_inner_microstep: 1371.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3598 [2024-06-10 15:28:52,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 15:28:52,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
605.47 | bwd_microstep: 1689.79 | bwd_inner_microstep: 1681.70 | bwd_allreduce_microstep: 8.03 | step_microstep: 38.84 [2024-06-10 15:28:52,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16323.68 | bwd: 43691.18 | bwd_inner: 43682.21 | bwd_allreduce: 8.27 | step: 40.33 {'loss': 1.228, 'learning_rate': 2.1087947640977015e-05, 'epoch': 0.5} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511 [2024-06-10 15:28:54,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.31 | bwd_microstep: 1580.82 | bwd_inner_microstep: 1580.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 15:28:56,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.51 | bwd_microstep: 1478.43 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 15:28:58,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1396.34 | bwd_inner_microstep: 1396.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812 [2024-06-10 15:29:00,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.95 | bwd_microstep: 1353.07 | bwd_inner_microstep: 1353.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548 [2024-06-10 15:29:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.15 | bwd_microstep: 1291.02 | bwd_inner_microstep: 1290.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 15:29:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.58 | 
bwd_microstep: 1281.28 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 15:29:05,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1284.02 | bwd_inner_microstep: 1283.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 934 [2024-06-10 15:29:06,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 156.76 | bwd_microstep: 412.35 | bwd_inner_microstep: 412.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3701 [2024-06-10 15:29:08,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.41 | bwd_microstep: 1480.25 | bwd_inner_microstep: 1480.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 15:29:10,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1378.66 | bwd_inner_microstep: 1378.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 15:29:12,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1243.58 | bwd_inner_microstep: 1243.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 15:29:14,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3640 [2024-06-10 15:29:16,206] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 550.96 | bwd_microstep: 1486.28 | bwd_inner_microstep: 1486.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489 [2024-06-10 15:29:18,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1408.96 | bwd_inner_microstep: 1408.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 15:29:20,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1395.83 | bwd_inner_microstep: 1395.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3519 [2024-06-10 15:29:21,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1348.50 | bwd_inner_microstep: 1348.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 15:29:24,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.35 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 15:29:26,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1510.96 | bwd_inner_microstep: 1510.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2084 [2024-06-10 15:29:27,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.02 | bwd_microstep: 849.91 | bwd_inner_microstep: 849.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 15:29:29,481] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.61 | bwd_microstep: 1554.77 | bwd_inner_microstep: 1554.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 15:29:31,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1394.37 | bwd_inner_microstep: 1394.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 15:29:33,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.55 | bwd_microstep: 1462.69 | bwd_inner_microstep: 1462.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-10 15:29:35,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.12 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 15:29:37,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1490.46 | bwd_inner_microstep: 1490.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 15:29:39,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1457.67 | bwd_inner_microstep: 1457.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3706 [2024-06-10 15:29:41,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1236.05 | bwd_inner_microstep: 1236.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3846 [2024-06-10 15:29:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.34 | bwd_microstep: 1662.00 | bwd_inner_microstep: 1661.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549 [2024-06-10 15:29:45,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1329.42 | bwd_inner_microstep: 1329.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774 [2024-06-10 15:29:47,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.20 | bwd_microstep: 1542.35 | bwd_inner_microstep: 1542.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-10 15:29:49,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.41 | bwd_microstep: 1548.96 | bwd_inner_microstep: 1548.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3455 [2024-06-10 15:29:51,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.75 | bwd_microstep: 1381.26 | bwd_inner_microstep: 1381.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 15:29:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.01 | optimizer_gradients: 4.15 | optimizer_step: 6.58 [2024-06-10 15:29:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.49 | bwd_microstep: 2149.66 | bwd_inner_microstep: 1574.78 | bwd_allreduce_microstep: 574.83 | step_microstep: 37.78 [2024-06-10 15:29:54,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16488.74 | bwd: 44729.04 | bwd_inner: 44153.31 | bwd_allreduce: 575.06 | 
step: 39.25 {'loss': 1.2358, 'learning_rate': 2.105046735874592e-05, 'epoch': 0.5} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466 [2024-06-10 15:29:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.21 | bwd_microstep: 1562.33 | bwd_inner_microstep: 1562.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3968 [2024-06-10 15:29:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.09 | bwd_microstep: 1598.72 | bwd_inner_microstep: 1598.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3865 [2024-06-10 15:30:00,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.43 | bwd_microstep: 1661.28 | bwd_inner_microstep: 1661.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792 [2024-06-10 15:30:02,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1646.23 | bwd_inner_microstep: 1646.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 15:30:04,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1278.31 | bwd_inner_microstep: 1278.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 15:30:06,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1245.03 | bwd_inner_microstep: 1245.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 15:30:08,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 473.40 | bwd_microstep: 1257.30 | bwd_inner_microstep: 1257.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722 [2024-06-10 15:30:10,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.44 | bwd_microstep: 1435.09 | bwd_inner_microstep: 1435.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 15:30:12,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.51 | bwd_microstep: 1484.69 | bwd_inner_microstep: 1484.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3781 [2024-06-10 15:30:14,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.54 | bwd_microstep: 1607.06 | bwd_inner_microstep: 1607.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3745 [2024-06-10 15:30:16,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.12 | bwd_microstep: 1669.96 | bwd_inner_microstep: 1669.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 15:30:18,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1354.29 | bwd_inner_microstep: 1354.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380 [2024-06-10 15:30:20,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.26 | bwd_microstep: 1335.37 | bwd_inner_microstep: 1335.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 15:30:22,263] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1339.06 | bwd_inner_microstep: 1339.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 15:30:24,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.98 | bwd_microstep: 1506.76 | bwd_inner_microstep: 1506.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675 [2024-06-10 15:30:26,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.06 | bwd_microstep: 1417.78 | bwd_inner_microstep: 1417.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515 [2024-06-10 15:30:28,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.59 | bwd_microstep: 1584.45 | bwd_inner_microstep: 1584.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3631 [2024-06-10 15:30:30,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.71 | bwd_microstep: 1708.59 | bwd_inner_microstep: 1708.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641 [2024-06-10 15:30:32,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1417.04 | bwd_inner_microstep: 1417.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 15:30:34,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.69 | bwd_microstep: 1387.77 | bwd_inner_microstep: 1387.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3543 [2024-06-10 15:30:36,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1395.55 | bwd_inner_microstep: 1395.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604 [2024-06-10 15:30:38,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1307.81 | bwd_inner_microstep: 1307.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1931 [2024-06-10 15:30:39,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.61 | bwd_microstep: 819.87 | bwd_inner_microstep: 819.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3967 [2024-06-10 15:30:41,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.25 | bwd_microstep: 1605.31 | bwd_inner_microstep: 1605.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 15:30:43,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.47 | bwd_microstep: 1514.04 | bwd_inner_microstep: 1514.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3543 [2024-06-10 15:30:45,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.46 | bwd_microstep: 1229.84 | bwd_inner_microstep: 1229.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 15:30:47,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1287.53 | bwd_inner_microstep: 1287.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, 
images per sample: 6.0, dynamic token length: 2516 [2024-06-10 15:30:48,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.47 | bwd_microstep: 1059.40 | bwd_inner_microstep: 1059.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433 [2024-06-10 15:30:50,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.66 | bwd_microstep: 1188.10 | bwd_inner_microstep: 1188.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3811 [2024-06-10 15:30:52,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.38 | bwd_microstep: 1413.98 | bwd_inner_microstep: 1413.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599 [2024-06-10 15:30:54,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.40 | bwd_microstep: 1600.39 | bwd_inner_microstep: 1600.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2211 [2024-06-10 15:30:57,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.64 [2024-06-10 15:30:57,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.70 | bwd_microstep: 2029.33 | bwd_inner_microstep: 1084.92 | bwd_allreduce_microstep: 944.37 | step_microstep: 38.05 [2024-06-10 15:30:57,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16736.24 | bwd: 45948.26 | bwd_inner: 45002.99 | bwd_allreduce: 944.59 | step: 39.54 {'loss': 1.2098, 'learning_rate': 2.1012983376774255e-05, 'epoch': 0.5} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 15:30:58,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 297.62 | bwd_microstep: 785.75 | bwd_inner_microstep: 785.62 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414 [2024-06-10 15:30:59,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.37 | bwd_microstep: 1307.17 | bwd_inner_microstep: 1307.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 15:31:01,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1379.50 | bwd_inner_microstep: 1379.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 15:31:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.99 | bwd_microstep: 1551.46 | bwd_inner_microstep: 1551.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2705 [2024-06-10 15:31:05,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.80 | bwd_microstep: 1084.65 | bwd_inner_microstep: 1084.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 15:31:07,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.62 | bwd_microstep: 1148.36 | bwd_inner_microstep: 1148.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3406 [2024-06-10 15:31:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.34 | bwd_microstep: 1405.28 | bwd_inner_microstep: 1405.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3571 [2024-06-10 15:31:10,882] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1333.35 | bwd_inner_microstep: 1333.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-10 15:31:12,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3683 [2024-06-10 15:31:14,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1385.73 | bwd_inner_microstep: 1385.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3511 [2024-06-10 15:31:16,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.51 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 15:31:18,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.52 | bwd_microstep: 1489.82 | bwd_inner_microstep: 1489.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659 [2024-06-10 15:31:20,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.88 | bwd_microstep: 1616.65 | bwd_inner_microstep: 1616.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-10 15:31:21,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 799.42 | bwd_inner_microstep: 799.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3532 [2024-06-10 15:31:23,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1422.85 | bwd_inner_microstep: 1422.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3541 [2024-06-10 15:31:26,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.11 | bwd_microstep: 1541.87 | bwd_inner_microstep: 1541.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013 [2024-06-10 15:31:27,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.84 | bwd_microstep: 801.06 | bwd_inner_microstep: 801.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3416 [2024-06-10 15:31:28,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.31 | bwd_microstep: 1180.33 | bwd_inner_microstep: 1180.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505 [2024-06-10 15:31:30,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.97 | bwd_microstep: 1410.94 | bwd_inner_microstep: 1410.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2288 [2024-06-10 15:31:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.45 | bwd_microstep: 939.35 | bwd_inner_microstep: 939.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-10 15:31:34,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.78 | bwd_microstep: 1626.93 | bwd_inner_microstep: 1626.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3828 [2024-06-10 15:31:36,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.88 | bwd_microstep: 1359.93 | bwd_inner_microstep: 1359.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2234 [2024-06-10 15:31:37,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.18 | bwd_microstep: 807.37 | bwd_inner_microstep: 807.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 15:31:39,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.88 | bwd_microstep: 1657.16 | bwd_inner_microstep: 1657.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481 [2024-06-10 15:31:41,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.82 | bwd_microstep: 1184.32 | bwd_inner_microstep: 1184.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 15:31:43,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3813 [2024-06-10 15:31:45,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.92 | bwd_microstep: 1509.36 | bwd_inner_microstep: 1509.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 15:31:47,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1507.69 | bwd_inner_microstep: 1507.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 15:31:49,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1396.17 | bwd_inner_microstep: 1396.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-10 15:31:51,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.20 | bwd_microstep: 1645.29 | bwd_inner_microstep: 1645.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 15:31:53,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.42 | bwd_microstep: 1542.72 | bwd_inner_microstep: 1542.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3596 [2024-06-10 15:31:59,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.30 | optimizer_step: 6.59 [2024-06-10 15:31:59,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.89 | bwd_microstep: 5303.34 | bwd_inner_microstep: 1930.81 | bwd_allreduce_microstep: 3372.46 | step_microstep: 38.40 [2024-06-10 15:31:59,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15925.38 | bwd: 46161.61 | bwd_inner: 42788.14 | bwd_allreduce: 3372.74 | step: 39.90 {'loss': 1.2033, 'learning_rate': 2.0975495827080404e-05, 'epoch': 0.5} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403 [2024-06-10 15:32:01,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1299.65 | bwd_inner_microstep: 1299.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 15:32:02,989] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 469.15 | bwd_microstep: 1240.70 | bwd_inner_microstep: 1240.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809 [2024-06-10 15:32:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.50 | bwd_microstep: 1348.97 | bwd_inner_microstep: 1348.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2890 [2024-06-10 15:32:06,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.12 | bwd_microstep: 1181.34 | bwd_inner_microstep: 1181.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 15:32:08,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1245.37 | bwd_inner_microstep: 1245.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 15:32:09,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1280.23 | bwd_inner_microstep: 1280.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 15:32:11,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1245.99 | bwd_inner_microstep: 1245.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3717 [2024-06-10 15:32:13,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1461.35 | bwd_inner_microstep: 1461.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 
15:32:14,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.85 | bwd_microstep: 790.92 | bwd_inner_microstep: 790.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2660 [2024-06-10 15:32:16,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.08 | bwd_microstep: 1025.56 | bwd_inner_microstep: 1025.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 15:32:18,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.11 | bwd_microstep: 1411.52 | bwd_inner_microstep: 1411.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831 [2024-06-10 15:32:20,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1647.71 | bwd_inner_microstep: 1647.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 15:32:22,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1348.61 | bwd_inner_microstep: 1348.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3976 [2024-06-10 15:32:24,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.18 | bwd_microstep: 1802.37 | bwd_inner_microstep: 1802.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 15:32:26,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.07 | bwd_microstep: 1286.19 | bwd_inner_microstep: 1286.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 3539 [2024-06-10 15:32:28,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.14 | bwd_microstep: 1199.27 | bwd_inner_microstep: 1199.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-10 15:32:30,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1612.13 | bwd_inner_microstep: 1612.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 15:32:32,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 15:32:34,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1393.95 | bwd_inner_microstep: 1393.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-10 15:32:36,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.03 | bwd_microstep: 1524.36 | bwd_inner_microstep: 1524.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-10 15:32:38,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3848 [2024-06-10 15:32:40,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1396.91 | bwd_inner_microstep: 1396.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3870 [2024-06-10 15:32:42,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.05 | bwd_microstep: 1672.03 | bwd_inner_microstep: 1672.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4157 [2024-06-10 15:32:45,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 677.04 | bwd_microstep: 1847.74 | bwd_inner_microstep: 1847.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507 [2024-06-10 15:32:47,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1318.57 | bwd_inner_microstep: 1318.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3550 [2024-06-10 15:32:49,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.63 | bwd_microstep: 1535.39 | bwd_inner_microstep: 1535.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-10 15:32:51,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.98 | bwd_microstep: 1477.45 | bwd_inner_microstep: 1477.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 15:32:53,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.04 | bwd_microstep: 1476.12 | bwd_inner_microstep: 1476.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786 [2024-06-10 15:32:55,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.82 | bwd_microstep: 1644.22 | bwd_inner_microstep: 1644.19 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2094 [2024-06-10 15:32:56,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.69 | bwd_microstep: 917.87 | bwd_inner_microstep: 917.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 15:32:58,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1376.63 | bwd_inner_microstep: 1376.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2962 [2024-06-10 15:33:00,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-10 15:33:00,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.41 | bwd_microstep: 1223.27 | bwd_inner_microstep: 1214.41 | bwd_allreduce_microstep: 8.81 | step_microstep: 37.60 [2024-06-10 15:33:00,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16490.72 | bwd: 44118.36 | bwd_inner: 44108.65 | bwd_allreduce: 9.03 | step: 39.11 {'loss': 1.1835, 'learning_rate': 2.093800484169532e-05, 'epoch': 0.5} 0%|████▉ | 859/1726 [14:50:28<14:37:41, 60.74s/it] 50%|████▉ | 860/1726 [14:51:29<14:34:59, 60.62s/it] 50%|████▉ | 861/1726 [14:52:30<14:37:58, 60.90s/it] 50%|████▉ | 862/1726 [14:53:33<14:46:07, 61.54s/it] 50%|█████ | 863/1726 [14:54:36<14:48:54, 61.80s/it] 50%|█████ | 864/1726 [14:55:37<14:44:11, 61.54s/it] dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1896 [2024-06-10 15:33:01,533]
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.91 | bwd_microstep: 804.24 | bwd_inner_microstep: 804.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867 [2024-06-10 15:33:03,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.28 | bwd_microstep: 1563.76 | bwd_inner_microstep: 1563.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-10 15:33:05,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.54 | bwd_microstep: 1455.24 | bwd_inner_microstep: 1455.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3798 [2024-06-10 15:33:07,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.68 | bwd_microstep: 1550.03 | bwd_inner_microstep: 1550.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 15:33:09,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.37 | bwd_microstep: 1185.73 | bwd_inner_microstep: 1185.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 15:33:11,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.38 | bwd_microstep: 1245.17 | bwd_inner_microstep: 1245.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 15:33:12,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.32 | bwd_microstep: 1287.55 | bwd_inner_microstep: 1287.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 2134 [2024-06-10 15:33:14,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.13 | bwd_microstep: 928.35 | bwd_inner_microstep: 928.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428 [2024-06-10 15:33:15,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.74 | bwd_microstep: 1157.28 | bwd_inner_microstep: 1157.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3950 [2024-06-10 15:33:18,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.84 | bwd_microstep: 1699.82 | bwd_inner_microstep: 1699.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662 [2024-06-10 15:33:20,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.37 | bwd_microstep: 1614.73 | bwd_inner_microstep: 1614.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495 [2024-06-10 15:33:22,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.35 | bwd_microstep: 1430.11 | bwd_inner_microstep: 1430.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 15:33:24,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.01 | bwd_microstep: 1499.07 | bwd_inner_microstep: 1499.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-10 15:33:26,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.82 | bwd_inner_microstep: 1508.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3486 [2024-06-10 15:33:28,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.99 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2898 [2024-06-10 15:33:29,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.08 | bwd_microstep: 1088.46 | bwd_inner_microstep: 1088.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948 [2024-06-10 15:33:31,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.09 | bwd_microstep: 821.73 | bwd_inner_microstep: 821.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814 [2024-06-10 15:33:33,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.55 | bwd_microstep: 1753.81 | bwd_inner_microstep: 1753.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-10 15:33:34,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.44 | bwd_microstep: 977.72 | bwd_inner_microstep: 977.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-10 15:33:37,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.56 | bwd_microstep: 1602.52 | bwd_inner_microstep: 1602.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489 [2024-06-10 15:33:38,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1318.09 | bwd_inner_microstep: 1318.07 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 15:33:41,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.02 | bwd_microstep: 1659.57 | bwd_inner_microstep: 1659.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3398 [2024-06-10 15:33:43,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1467.78 | bwd_inner_microstep: 1467.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2068 [2024-06-10 15:33:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.82 | bwd_microstep: 756.44 | bwd_inner_microstep: 756.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2061 [2024-06-10 15:33:45,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.62 | bwd_microstep: 817.14 | bwd_inner_microstep: 817.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086 [2024-06-10 15:33:46,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.54 | bwd_microstep: 823.01 | bwd_inner_microstep: 822.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 15:33:48,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1397.61 | bwd_inner_microstep: 1397.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286 [2024-06-10 15:33:49,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.40 | bwd_microstep: 879.64 | bwd_inner_microstep: 879.62 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 15:33:51,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1295.36 | bwd_inner_microstep: 1295.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 15:33:53,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1283.59 | bwd_inner_microstep: 1283.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2167 [2024-06-10 15:33:54,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.08 | bwd_microstep: 951.64 | bwd_inner_microstep: 951.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 15:34:03,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.33 | optimizer_step: 6.58 [2024-06-10 15:34:03,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 8556.02 | bwd_inner_microstep: 1544.77 | bwd_allreduce_microstep: 7011.18 | step_microstep: 38.47 [2024-06-10 15:34:03,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15179.79 | bwd: 47756.98 | bwd_inner: 40744.83 | bwd_allreduce: 7011.45 | step: 39.98 {'loss': 1.2617, 'learning_rate': 2.0900510552662057e-05, 'epoch': 0.5} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3460 [2024-06-10 15:34:05,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.17 | bwd_microstep: 1564.78 | bwd_inner_microstep: 1564.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542 
[2024-06-10 15:34:07,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1322.15 | bwd_inner_microstep: 1322.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2376 [2024-06-10 15:34:08,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.40 | bwd_microstep: 928.63 | bwd_inner_microstep: 928.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 15:34:10,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1345.93 | bwd_inner_microstep: 1345.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3842 [2024-06-10 15:34:13,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.09 | bwd_microstep: 1683.80 | bwd_inner_microstep: 1683.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 15:34:14,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1337.31 | bwd_inner_microstep: 1337.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 15:34:16,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1383.50 | bwd_inner_microstep: 1383.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3484 [2024-06-10 15:34:18,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.55 | bwd_microstep: 1504.45 | bwd_inner_microstep: 1504.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3936 [2024-06-10 15:34:21,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1583.53 | bwd_inner_microstep: 1583.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 15:34:23,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 1515.39 | bwd_inner_microstep: 1515.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3663 [2024-06-10 15:34:25,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.49 | bwd_microstep: 1814.75 | bwd_inner_microstep: 1814.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484 [2024-06-10 15:34:27,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.56 | bwd_microstep: 1573.53 | bwd_inner_microstep: 1573.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3651 [2024-06-10 15:34:30,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.75 | bwd_microstep: 1709.93 | bwd_inner_microstep: 1709.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3636 [2024-06-10 15:34:32,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.73 | bwd_microstep: 1779.47 | bwd_inner_microstep: 1779.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508 [2024-06-10 15:34:34,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1411.13 | bwd_inner_microstep: 1411.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 15:34:36,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.31 | bwd_microstep: 1285.15 | bwd_inner_microstep: 1285.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666 [2024-06-10 15:34:38,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1426.29 | bwd_inner_microstep: 1426.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3920 [2024-06-10 15:34:40,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1393.42 | bwd_inner_microstep: 1393.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 15:34:41,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.55 | bwd_microstep: 796.72 | bwd_inner_microstep: 796.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 15:34:43,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1378.85 | bwd_inner_microstep: 1378.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 15:34:45,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1345.11 | bwd_inner_microstep: 1345.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3688 [2024-06-10 15:34:47,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.09 | bwd_microstep: 1328.12 | bwd_inner_microstep: 1328.10 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-10 15:34:49,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.33 | bwd_microstep: 1552.29 | bwd_inner_microstep: 1552.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2272 [2024-06-10 15:34:50,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.16 | bwd_microstep: 1032.78 | bwd_inner_microstep: 1032.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3809 [2024-06-10 15:34:52,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.90 | bwd_microstep: 1594.25 | bwd_inner_microstep: 1594.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3819 [2024-06-10 15:34:55,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.87 | bwd_microstep: 1803.33 | bwd_inner_microstep: 1803.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3526 [2024-06-10 15:34:57,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.05 | bwd_microstep: 1534.35 | bwd_inner_microstep: 1534.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815 [2024-06-10 15:34:59,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.13 | bwd_microstep: 1592.24 | bwd_inner_microstep: 1592.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768 [2024-06-10 15:35:01,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | 
bwd_microstep: 1473.93 | bwd_inner_microstep: 1473.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2023 [2024-06-10 15:35:02,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.37 | bwd_microstep: 868.86 | bwd_inner_microstep: 868.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079 [2024-06-10 15:35:04,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.24 | bwd_microstep: 914.48 | bwd_inner_microstep: 914.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2227 [2024-06-10 15:35:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.16 | optimizer_step: 6.63 [2024-06-10 15:35:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.43 | bwd_microstep: 999.97 | bwd_inner_microstep: 991.73 | bwd_allreduce_microstep: 8.19 | step_microstep: 38.00 [2024-06-10 15:35:05,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16643.50 | bwd: 44778.42 | bwd_inner: 44769.33 | bwd_allreduce: 8.41 | step: 39.46 {'loss': 1.2107, 'learning_rate': 2.0863013092035312e-05, 'epoch': 0.5} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 15:35:07,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.44 | bwd_microstep: 1474.29 | bwd_inner_microstep: 1474.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904 [2024-06-10 15:35:09,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.87 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3889 [2024-06-10 15:35:11,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.97 | bwd_microstep: 1580.93 | bwd_inner_microstep: 1580.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 15:35:13,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.22 | bwd_microstep: 1647.56 | bwd_inner_microstep: 1647.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 15:35:15,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1243.30 | bwd_inner_microstep: 1243.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488 [2024-06-10 15:35:17,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.82 | bwd_microstep: 1311.84 | bwd_inner_microstep: 1311.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 15:35:19,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1243.57 | bwd_inner_microstep: 1243.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-10 15:35:20,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.92 | bwd_microstep: 791.39 | bwd_inner_microstep: 791.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3573 [2024-06-10 15:35:22,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.38 | bwd_microstep: 1263.28 | bwd_inner_microstep: 1263.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3483 [2024-06-10 15:35:23,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1290.45 | bwd_inner_microstep: 1290.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 15:35:25,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1283.09 | bwd_inner_microstep: 1283.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523 [2024-06-10 15:35:27,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1421.54 | bwd_inner_microstep: 1421.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3683 [2024-06-10 15:35:29,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.72 | bwd_microstep: 1445.64 | bwd_inner_microstep: 1445.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3653 [2024-06-10 15:35:31,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.20 | bwd_microstep: 1677.58 | bwd_inner_microstep: 1677.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 15:35:33,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1381.93 | bwd_inner_microstep: 1381.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 15:35:35,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.35 | bwd_microstep: 1455.81 | bwd_inner_microstep: 1455.78 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3526 [2024-06-10 15:35:38,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1588.04 | bwd_inner_microstep: 1588.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843 [2024-06-10 15:35:40,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.78 | bwd_microstep: 1455.46 | bwd_inner_microstep: 1455.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3607 [2024-06-10 15:35:42,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1635.49 | bwd_inner_microstep: 1635.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 15:35:44,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1374.16 | bwd_inner_microstep: 1374.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072 [2024-06-10 15:35:45,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.56 | bwd_microstep: 912.72 | bwd_inner_microstep: 912.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 15:35:47,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.73 | bwd_microstep: 1652.30 | bwd_inner_microstep: 1652.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622 [2024-06-10 15:35:49,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 
1341.96 | bwd_inner_microstep: 1341.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2202 [2024-06-10 15:35:50,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.83 | bwd_microstep: 907.78 | bwd_inner_microstep: 907.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-10 15:35:52,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 1407.86 | bwd_inner_microstep: 1407.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-10 15:35:54,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.15 | bwd_microstep: 972.26 | bwd_inner_microstep: 972.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639 [2024-06-10 15:35:56,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.45 | bwd_microstep: 1611.76 | bwd_inner_microstep: 1611.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 15:35:58,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.00 | bwd_microstep: 1255.37 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562 [2024-06-10 15:36:00,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.47 | bwd_microstep: 1591.19 | bwd_inner_microstep: 1591.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568 [2024-06-10 15:36:02,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 589.27 | bwd_microstep: 1585.39 | bwd_inner_microstep: 1585.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1431 [2024-06-10 15:36:03,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 218.33 | bwd_microstep: 565.69 | bwd_inner_microstep: 565.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3817 [2024-06-10 15:36:07,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 15:36:07,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.90 | bwd_microstep: 3217.98 | bwd_inner_microstep: 1786.51 | bwd_allreduce_microstep: 1431.42 | step_microstep: 37.93 [2024-06-10 15:36:07,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16252.53 | bwd: 45070.29 | bwd_inner: 43637.98 | bwd_allreduce: 1431.64 | step: 39.40 {'loss': 1.2016, 'learning_rate': 2.082551259188094e-05, 'epoch': 0.5} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 15:36:09,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.15 | bwd_microstep: 1467.65 | bwd_inner_microstep: 1467.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975 [2024-06-10 15:36:10,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.78 | bwd_microstep: 828.44 | bwd_inner_microstep: 828.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 15:36:12,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1551.68 | bwd_inner_microstep: 1551.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 18, images per sample: 4.5, dynamic token length: 1945 [2024-06-10 15:36:13,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 820.46 | bwd_inner_microstep: 820.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 15:36:15,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.04 | bwd_microstep: 1242.27 | bwd_inner_microstep: 1242.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 15:36:17,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716 [2024-06-10 15:36:19,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.15 | bwd_microstep: 1527.44 | bwd_inner_microstep: 1527.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 15:36:21,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1384.82 | bwd_inner_microstep: 1384.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3659 [2024-06-10 15:36:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.10 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2149 [2024-06-10 15:36:24,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.90 | bwd_microstep: 880.03 | bwd_inner_microstep: 880.01 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2096 [2024-06-10 15:36:25,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.95 | bwd_microstep: 917.66 | bwd_inner_microstep: 917.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 15:36:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.03 | bwd_microstep: 1474.39 | bwd_inner_microstep: 1474.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703 [2024-06-10 15:36:29,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.08 | bwd_microstep: 1619.91 | bwd_inner_microstep: 1619.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3222 [2024-06-10 15:36:31,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.12 | bwd_microstep: 1208.90 | bwd_inner_microstep: 1208.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3647 [2024-06-10 15:36:33,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.21 | bwd_microstep: 1575.75 | bwd_inner_microstep: 1575.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3543 [2024-06-10 15:36:35,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.90 | bwd_microstep: 1590.34 | bwd_inner_microstep: 1590.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3537 [2024-06-10 15:36:37,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.83 | bwd_microstep: 1526.09 | bwd_inner_microstep: 
1526.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624 [2024-06-10 15:36:39,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.02 | bwd_microstep: 1432.72 | bwd_inner_microstep: 1432.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1995 [2024-06-10 15:36:40,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.85 | bwd_microstep: 832.30 | bwd_inner_microstep: 832.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 639 [2024-06-10 15:36:41,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.03 | bwd_microstep: 265.28 | bwd_inner_microstep: 265.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 15:36:43,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.66 | bwd_microstep: 1512.14 | bwd_inner_microstep: 1512.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 15:36:45,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1557.86 | bwd_inner_microstep: 1557.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 15:36:47,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1548.36 | bwd_inner_microstep: 1548.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3716 [2024-06-10 15:36:49,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.68 | 
bwd_microstep: 1731.26 | bwd_inner_microstep: 1731.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604 [2024-06-10 15:36:52,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.22 | bwd_microstep: 1600.51 | bwd_inner_microstep: 1600.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 15:36:53,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.87 | bwd_microstep: 1255.59 | bwd_inner_microstep: 1255.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 15:36:56,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1507.42 | bwd_inner_microstep: 1507.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2056 [2024-06-10 15:36:57,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.96 | bwd_microstep: 722.32 | bwd_inner_microstep: 722.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3811 [2024-06-10 15:36:59,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.21 | bwd_microstep: 1603.48 | bwd_inner_microstep: 1603.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2270 [2024-06-10 15:37:00,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.92 | bwd_microstep: 934.62 | bwd_inner_microstep: 934.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3591 [2024-06-10 15:37:02,195] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 462.91 | bwd_microstep: 1210.13 | bwd_inner_microstep: 1210.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 15:37:12,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.34 | optimizer_step: 6.59 [2024-06-10 15:37:12,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 9535.99 | bwd_inner_microstep: 1548.85 | bwd_allreduce_microstep: 7987.08 | step_microstep: 38.94 [2024-06-10 15:37:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15422.31 | bwd: 49430.18 | bwd_inner: 41442.19 | bwd_allreduce: 7987.32 | step: 40.46 {'loss': 1.2657, 'learning_rate': 2.0788009184275514e-05, 'epoch': 0.5} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 15:37:14,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1266.72 | bwd_inner_microstep: 1266.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 15:37:16,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.68 | bwd_microstep: 1473.98 | bwd_inner_microstep: 1473.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3856 [2024-06-10 15:37:18,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.54 | bwd_microstep: 1557.87 | bwd_inner_microstep: 1557.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 15:37:20,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.86 | bwd_microstep: 1646.97 | bwd_inner_microstep: 1646.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2303 [2024-06-10 15:37:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.96 | bwd_microstep: 972.38 | bwd_inner_microstep: 972.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502 [2024-06-10 15:37:23,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.09 | bwd_microstep: 1188.98 | bwd_inner_microstep: 1188.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740 [2024-06-10 15:37:25,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.69 | bwd_microstep: 1529.78 | bwd_inner_microstep: 1529.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502 [2024-06-10 15:37:27,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.87 | bwd_microstep: 1188.15 | bwd_inner_microstep: 1188.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 15:37:29,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1279.37 | bwd_inner_microstep: 1279.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 15:37:30,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1246.59 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3455 [2024-06-10 15:37:32,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.53 | bwd_microstep: 1211.85 | bwd_inner_microstep: 1211.82 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963 [2024-06-10 15:37:33,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.72 | bwd_microstep: 827.38 | bwd_inner_microstep: 827.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 15:37:35,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.31 | bwd_microstep: 1379.52 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3644 [2024-06-10 15:37:37,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1469.91 | bwd_inner_microstep: 1469.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3643 [2024-06-10 15:37:39,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.94 | bwd_microstep: 1654.90 | bwd_inner_microstep: 1654.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 15:37:41,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1342.62 | bwd_inner_microstep: 1342.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 15:37:43,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1392.21 | bwd_inner_microstep: 1392.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3020 [2024-06-10 15:37:45,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 431.34 | bwd_microstep: 
1130.31 | bwd_inner_microstep: 1130.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 15:37:47,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1480.01 | bwd_inner_microstep: 1479.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 15:37:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1460.41 | bwd_inner_microstep: 1460.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 15:37:51,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.53 | bwd_microstep: 1653.35 | bwd_inner_microstep: 1653.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3548 [2024-06-10 15:37:53,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1230.02 | bwd_inner_microstep: 1229.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-10 15:37:55,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1532.48 | bwd_inner_microstep: 1532.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 [2024-06-10 15:37:57,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1438.49 | bwd_inner_microstep: 1438.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3502 [2024-06-10 15:37:58,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 463.23 | bwd_microstep: 1219.57 | bwd_inner_microstep: 1219.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 15:38:01,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1595.48 | bwd_inner_microstep: 1595.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 15:38:03,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.26 | bwd_microstep: 1605.66 | bwd_inner_microstep: 1605.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 15:38:05,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.29 | bwd_microstep: 1275.21 | bwd_inner_microstep: 1275.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3754 [2024-06-10 15:38:07,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.02 | bwd_microstep: 1672.05 | bwd_inner_microstep: 1672.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 15:38:09,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1585.65 | bwd_inner_microstep: 1585.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612 [2024-06-10 15:38:11,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1600.74 | bwd_inner_microstep: 1600.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3745 [2024-06-10 15:38:14,271] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.18 | optimizer_step: 6.63 [2024-06-10 15:38:14,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.15 | bwd_microstep: 1770.66 | bwd_inner_microstep: 1762.94 | bwd_allreduce_microstep: 7.68 | step_microstep: 37.69 [2024-06-10 15:38:14,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16755.96 | bwd: 44879.28 | bwd_inner: 44870.72 | bwd_allreduce: 7.91 | step: 39.15 {'loss': 1.2419, 'learning_rate': 2.0750503001305832e-05, 'epoch': 0.5} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 15:38:16,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794 [2024-06-10 15:38:18,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1444.79 | bwd_inner_microstep: 1444.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 15:38:20,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1482.60 | bwd_inner_microstep: 1482.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 15:38:22,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1538.88 | bwd_inner_microstep: 1538.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 840 [2024-06-10 15:38:22,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.19 | bwd_microstep: 344.92 | bwd_inner_microstep: 344.89 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-10 15:38:24,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.77 | bwd_microstep: 1422.87 | bwd_inner_microstep: 1422.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3409 [2024-06-10 15:38:26,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.71 | bwd_microstep: 1281.25 | bwd_inner_microstep: 1281.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1956 [2024-06-10 15:38:27,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.96 | bwd_microstep: 760.81 | bwd_inner_microstep: 760.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 15:38:29,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1485.73 | bwd_inner_microstep: 1485.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3426 [2024-06-10 15:38:31,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1394.39 | bwd_inner_microstep: 1394.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410 [2024-06-10 15:38:33,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1444.13 | bwd_inner_microstep: 1444.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-10 15:38:35,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1317.68 | 
bwd_inner_microstep: 1317.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-10 15:38:37,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.40 | bwd_microstep: 1506.63 | bwd_inner_microstep: 1506.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503 [2024-06-10 15:38:39,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1576.91 | bwd_inner_microstep: 1576.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947 [2024-06-10 15:38:40,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.94 | bwd_microstep: 698.63 | bwd_inner_microstep: 698.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3873 [2024-06-10 15:38:42,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.61 | bwd_microstep: 1580.91 | bwd_inner_microstep: 1580.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 15:38:44,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.30 | bwd_microstep: 1280.04 | bwd_inner_microstep: 1280.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3613 [2024-06-10 15:38:46,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1340.54 | bwd_inner_microstep: 1340.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3869 [2024-06-10 15:38:48,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 518.89 | bwd_microstep: 1366.81 | bwd_inner_microstep: 1366.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140 [2024-06-10 15:38:49,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.30 | bwd_microstep: 832.09 | bwd_inner_microstep: 832.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3720 [2024-06-10 15:38:51,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1237.72 | bwd_inner_microstep: 1237.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 15:38:53,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.62 | bwd_inner_microstep: 1348.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 15:38:54,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.55 | bwd_microstep: 1255.73 | bwd_inner_microstep: 1255.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079 [2024-06-10 15:38:56,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.12 | bwd_microstep: 919.63 | bwd_inner_microstep: 919.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2194 [2024-06-10 15:38:57,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.86 | bwd_microstep: 829.54 | bwd_inner_microstep: 829.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1905 [2024-06-10 15:38:58,192] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.20 | bwd_microstep: 689.14 | bwd_inner_microstep: 689.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2080 [2024-06-10 15:38:59,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.45 | bwd_microstep: 896.13 | bwd_inner_microstep: 896.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454 [2024-06-10 15:39:01,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.92 | bwd_microstep: 1314.33 | bwd_inner_microstep: 1314.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572 [2024-06-10 15:39:03,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.28 | bwd_microstep: 1299.77 | bwd_inner_microstep: 1299.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3554 [2024-06-10 15:39:05,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.75 | bwd_microstep: 1660.68 | bwd_inner_microstep: 1660.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 15:39:07,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1493.73 | bwd_inner_microstep: 1493.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587 [2024-06-10 15:39:16,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.22 | optimizer_step: 6.57 [2024-06-10 15:39:16,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.17 | bwd_microstep: 8023.32 | 
bwd_inner_microstep: 1802.08 | bwd_allreduce_microstep: 6221.19 | step_microstep: 38.00 [2024-06-10 15:39:16,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15003.39 | bwd: 46457.13 | bwd_inner: 40235.04 | bwd_allreduce: 6221.41 | step: 39.41 50%|█████ | 865/1726 [14:56:40<14:50:36, 62.06s/it] 50%|█████ | 866/1726 [14:57:42<14:48:17, 61.97s/it] 50%|█████ | 867/1726 [14:58:43<14:45:54, 61.88s/it] 50%|█████ | 868/1726 [14:59:49<14:59:04, 62.87s/it] 50%|█████ | 869/1726 [15:00:51<14:54:10, 62.60s/it] 50%|█████ | 870/1726 [15:01:52<14:49:36, 62.36s/ {'loss': 1.2339, 'learning_rate': 2.071299417506849e-05, 'epoch': 0.5} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1932 [2024-06-10 15:39:17,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.80 | bwd_microstep: 844.04 | bwd_inner_microstep: 843.94 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3956 [2024-06-10 15:39:19,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1496.99 | bwd_inner_microstep: 1496.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-10 15:39:20,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 792.60 | bwd_inner_microstep: 792.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 15:39:22,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.48 | bwd_microstep: 1341.27 | bwd_inner_microstep: 
1341.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-10 15:39:24,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1439.72 | bwd_inner_microstep: 1439.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 15:39:26,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.27 | bwd_microstep: 1384.13 | bwd_inner_microstep: 1384.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 15:39:28,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1386.29 | bwd_inner_microstep: 1386.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-10 15:39:29,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.69 | bwd_microstep: 797.02 | bwd_inner_microstep: 796.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693 [2024-06-10 15:39:31,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.59 | bwd_microstep: 1520.03 | bwd_inner_microstep: 1520.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695 [2024-06-10 15:39:33,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1422.03 | bwd_inner_microstep: 1422.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-10 15:39:34,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | 
bwd_microstep: 797.40 | bwd_inner_microstep: 797.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3439 [2024-06-10 15:39:36,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1393.20 | bwd_inner_microstep: 1393.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2105 [2024-06-10 15:39:37,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.07 | bwd_microstep: 1012.02 | bwd_inner_microstep: 1011.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3216 [2024-06-10 15:39:39,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1400.24 | bwd_inner_microstep: 1400.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 15:39:41,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.99 | bwd_microstep: 1293.70 | bwd_inner_microstep: 1293.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972 [2024-06-10 15:39:42,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 733.04 | bwd_inner_microstep: 733.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517 [2024-06-10 15:39:44,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1317.84 | bwd_inner_microstep: 1317.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 15:39:46,133] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1391.09 | bwd_inner_microstep: 1391.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2404 [2024-06-10 15:39:47,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.58 | bwd_microstep: 899.67 | bwd_inner_microstep: 899.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2183 [2024-06-10 15:39:48,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.04 | bwd_microstep: 858.71 | bwd_inner_microstep: 858.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3830 [2024-06-10 15:39:50,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.61 | bwd_microstep: 1585.94 | bwd_inner_microstep: 1585.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 15:39:52,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539 [2024-06-10 15:39:55,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.28 | bwd_microstep: 1590.34 | bwd_inner_microstep: 1590.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 15:39:56,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1256.02 | bwd_inner_microstep: 1255.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 15:39:59,100] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.80 | bwd_microstep: 1652.96 | bwd_inner_microstep: 1652.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 15:40:01,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.62 | bwd_microstep: 1458.78 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735 [2024-06-10 15:40:03,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.08 | bwd_microstep: 1628.22 | bwd_inner_microstep: 1628.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584 [2024-06-10 15:40:05,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1399.58 | bwd_inner_microstep: 1399.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766 [2024-06-10 15:40:07,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.63 | bwd_microstep: 1543.67 | bwd_inner_microstep: 1543.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 15:40:09,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.40 | bwd_microstep: 1487.12 | bwd_inner_microstep: 1487.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561 [2024-06-10 15:40:11,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1328.84 | bwd_inner_microstep: 1328.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3445 [2024-06-10 15:40:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.58 [2024-06-10 15:40:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 5408.65 | bwd_inner_microstep: 1532.09 | bwd_allreduce_microstep: 3876.51 | step_microstep: 38.15 [2024-06-10 15:40:17,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15471.40 | bwd: 45418.87 | bwd_inner: 41541.38 | bwd_allreduce: 3876.78 | step: 39.62 {'loss': 1.2369, 'learning_rate': 2.0675482837669367e-05, 'epoch': 0.5} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 15:40:19,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1467.58 | bwd_inner_microstep: 1467.42 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 15:40:21,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.87 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3838 [2024-06-10 15:40:23,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.27 | bwd_microstep: 1290.61 | bwd_inner_microstep: 1290.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3860 [2024-06-10 15:40:25,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.87 | bwd_microstep: 1464.44 | bwd_inner_microstep: 1464.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4148 [2024-06-10 15:40:27,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.96 | bwd_microstep: 1479.19 | 
bwd_inner_microstep: 1479.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956 [2024-06-10 15:40:28,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.94 | bwd_microstep: 701.82 | bwd_inner_microstep: 701.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 15:40:29,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.67 | bwd_microstep: 1248.76 | bwd_inner_microstep: 1248.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 15:40:31,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 15:40:32,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.81 | bwd_microstep: 793.04 | bwd_inner_microstep: 793.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2022 [2024-06-10 15:40:34,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.44 | bwd_microstep: 838.62 | bwd_inner_microstep: 838.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 15:40:36,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1485.66 | bwd_inner_microstep: 1485.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394 [2024-06-10 15:40:38,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
503.56 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092 [2024-06-10 15:40:39,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.93 | bwd_microstep: 919.65 | bwd_inner_microstep: 919.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3417 [2024-06-10 15:40:41,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.67 | bwd_microstep: 1537.94 | bwd_inner_microstep: 1537.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3645 [2024-06-10 15:40:43,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.97 | bwd_microstep: 1573.30 | bwd_inner_microstep: 1573.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648 [2024-06-10 15:40:45,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.05 | bwd_microstep: 1613.65 | bwd_inner_microstep: 1613.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 15:40:47,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1407.61 | bwd_inner_microstep: 1407.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 15:40:49,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1254.88 | bwd_inner_microstep: 1254.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972 [2024-06-10 15:40:50,429] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.28 | bwd_microstep: 703.55 | bwd_inner_microstep: 703.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 15:40:52,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676 [2024-06-10 15:40:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1429.99 | bwd_inner_microstep: 1429.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466 [2024-06-10 15:40:56,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1310.38 | bwd_inner_microstep: 1310.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067 [2024-06-10 15:40:57,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.71 | bwd_microstep: 820.80 | bwd_inner_microstep: 820.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 15:40:59,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1411.40 | bwd_inner_microstep: 1411.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 15:41:01,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.38 | bwd_microstep: 1292.99 | bwd_inner_microstep: 1292.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 
[2024-06-10 15:41:03,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.09 | bwd_microstep: 1506.53 | bwd_inner_microstep: 1506.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3598 [2024-06-10 15:41:04,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.83 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3559 [2024-06-10 15:41:07,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.95 | bwd_microstep: 1527.94 | bwd_inner_microstep: 1527.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434 [2024-06-10 15:41:08,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1292.33 | bwd_inner_microstep: 1292.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588 [2024-06-10 15:41:11,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.52 | bwd_microstep: 1597.16 | bwd_inner_microstep: 1597.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 15:41:12,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1405.61 | bwd_inner_microstep: 1405.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 15:41:18,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.36 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 15:41:18,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
607.09 | bwd_microstep: 5231.42 | bwd_inner_microstep: 1869.69 | bwd_allreduce_microstep: 3361.67 | step_microstep: 38.90 [2024-06-10 15:41:18,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15689.82 | bwd: 45530.52 | bwd_inner: 42167.82 | bwd_allreduce: 3361.96 | step: 40.39 {'loss': 1.2485, 'learning_rate': 2.06379691212232e-05, 'epoch': 0.51} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410 [2024-06-10 15:41:20,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.78 | bwd_microstep: 1436.70 | bwd_inner_microstep: 1436.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2396 [2024-06-10 15:41:22,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.38 | bwd_microstep: 997.51 | bwd_inner_microstep: 997.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 15:41:23,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.19 | bwd_microstep: 1275.47 | bwd_inner_microstep: 1275.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4273 [2024-06-10 15:41:26,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.16 | bwd_microstep: 1665.57 | bwd_inner_microstep: 1665.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-10 15:41:27,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.99 | bwd_microstep: 798.88 | bwd_inner_microstep: 798.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-10 15:41:29,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | 
bwd_microstep: 1422.42 | bwd_inner_microstep: 1422.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431 [2024-06-10 15:41:30,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.33 | bwd_microstep: 1183.53 | bwd_inner_microstep: 1183.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 15:41:32,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 15:41:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1389.69 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974 [2024-06-10 15:41:35,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.98 | bwd_microstep: 765.01 | bwd_inner_microstep: 764.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154 [2024-06-10 15:41:36,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.08 | bwd_microstep: 853.03 | bwd_inner_microstep: 853.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461 [2024-06-10 15:41:38,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.34 | bwd_microstep: 1212.34 | bwd_inner_microstep: 1212.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1986 [2024-06-10 15:41:39,754] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 311.20 | bwd_microstep: 831.84 | bwd_inner_microstep: 831.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3525 [2024-06-10 15:41:41,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1349.83 | bwd_inner_microstep: 1349.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 15:41:43,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.24 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 15:41:45,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3686 [2024-06-10 15:41:48,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.52 | bwd_microstep: 1760.40 | bwd_inner_microstep: 1760.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610 [2024-06-10 15:41:50,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.81 | bwd_microstep: 1598.14 | bwd_inner_microstep: 1598.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 15:41:52,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1393.49 | bwd_inner_microstep: 1393.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 15:41:54,192] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.67 | bwd_microstep: 1489.41 | bwd_inner_microstep: 1489.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3455 [2024-06-10 15:41:55,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.65 | bwd_microstep: 1190.30 | bwd_inner_microstep: 1190.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-10 15:41:57,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.17 | bwd_microstep: 1504.85 | bwd_inner_microstep: 1504.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 15:41:59,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1398.55 | bwd_inner_microstep: 1398.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080 [2024-06-10 15:42:00,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.52 | bwd_microstep: 818.00 | bwd_inner_microstep: 817.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566 [2024-06-10 15:42:02,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.78 | bwd_microstep: 1202.77 | bwd_inner_microstep: 1202.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-10 15:42:03,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.59 | bwd_microstep: 801.43 | bwd_inner_microstep: 801.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token 
length: 3493 [2024-06-10 15:42:05,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.39 | bwd_microstep: 1313.56 | bwd_inner_microstep: 1313.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 15:42:07,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1413.26 | bwd_inner_microstep: 1413.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-10 15:42:08,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.90 | bwd_microstep: 698.19 | bwd_inner_microstep: 698.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462 [2024-06-10 15:42:10,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.55 | bwd_microstep: 1568.88 | bwd_inner_microstep: 1568.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 15:42:11,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.57 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615 [2024-06-10 15:42:19,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-10 15:42:19,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.88 | bwd_microstep: 7349.15 | bwd_inner_microstep: 1817.51 | bwd_allreduce_microstep: 5531.59 | step_microstep: 37.92 [2024-06-10 15:42:19,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14979.66 | bwd: 45625.55 | bwd_inner: 40093.02 | bwd_allreduce: 5531.83 | 
step: 39.43 {'loss': 1.2182, 'learning_rate': 2.0600453157853103e-05, 'epoch': 0.51} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 15:42:21,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.11 | bwd_microstep: 1476.29 | bwd_inner_microstep: 1476.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3872 [2024-06-10 15:42:23,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.36 | bwd_microstep: 1556.90 | bwd_inner_microstep: 1556.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 15:42:25,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1243.15 | bwd_inner_microstep: 1243.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936 [2024-06-10 15:42:26,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.98 | bwd_microstep: 723.35 | bwd_inner_microstep: 723.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-10 15:42:27,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.95 | bwd_microstep: 786.83 | bwd_inner_microstep: 786.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396 [2024-06-10 15:42:29,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1335.13 | bwd_inner_microstep: 1335.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 15:42:31,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
470.69 | bwd_microstep: 1243.50 | bwd_inner_microstep: 1243.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 15:42:33,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1380.12 | bwd_inner_microstep: 1380.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 15:42:34,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1247.90 | bwd_inner_microstep: 1247.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 15:42:36,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.82 | bwd_microstep: 1278.35 | bwd_inner_microstep: 1278.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969 [2024-06-10 15:42:37,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 823.59 | bwd_inner_microstep: 823.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3500 [2024-06-10 15:42:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1524.40 | bwd_inner_microstep: 1524.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3477 [2024-06-10 15:42:42,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.69 | bwd_microstep: 1568.81 | bwd_inner_microstep: 1568.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 15:42:44,131] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.65 | bwd_microstep: 1445.42 | bwd_inner_microstep: 1445.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3652 [2024-06-10 15:42:46,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.93 | bwd_microstep: 1713.08 | bwd_inner_microstep: 1713.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3688 [2024-06-10 15:42:48,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1384.73 | bwd_inner_microstep: 1384.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 15:42:50,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1556.11 | bwd_inner_microstep: 1556.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105 [2024-06-10 15:42:51,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.60 | bwd_microstep: 918.89 | bwd_inner_microstep: 918.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654 [2024-06-10 15:42:53,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.77 | bwd_microstep: 1426.04 | bwd_inner_microstep: 1426.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910 [2024-06-10 15:42:54,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.35 | bwd_microstep: 686.31 | bwd_inner_microstep: 686.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3688 [2024-06-10 15:42:56,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.82 | bwd_microstep: 1521.60 | bwd_inner_microstep: 1521.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 [2024-06-10 15:42:58,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1438.30 | bwd_inner_microstep: 1438.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2159 [2024-06-10 15:43:00,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.18 | bwd_microstep: 949.62 | bwd_inner_microstep: 949.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999 [2024-06-10 15:43:01,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.60 | bwd_microstep: 739.87 | bwd_inner_microstep: 739.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 15:43:03,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.27 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-10 15:43:05,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.17 | bwd_microstep: 1198.96 | bwd_inner_microstep: 1198.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 15:43:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1391.88 | bwd_inner_microstep: 1391.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3594 [2024-06-10 15:43:09,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.11 | bwd_microstep: 1609.70 | bwd_inner_microstep: 1609.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 15:43:11,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1494.73 | bwd_inner_microstep: 1494.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3400 [2024-06-10 15:43:13,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1469.65 | bwd_inner_microstep: 1469.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-10 15:43:15,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.08 | bwd_microstep: 1539.48 | bwd_inner_microstep: 1539.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3779 [2024-06-10 15:43:20,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 15:43:20,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.49 | bwd_microstep: 4263.55 | bwd_inner_microstep: 1979.03 | bwd_allreduce_microstep: 2284.47 | step_microstep: 37.98 [2024-06-10 15:43:20,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15693.69 | bwd: 44542.95 | bwd_inner: 42257.58 | bwd_allreduce: 2284.70 | step: 39.44 {'loss': 1.2741, 'learning_rate': 2.05629350796901e-05, 'epoch': 0.51} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-10 15:43:22,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 
| bwd_microstep: 1393.65 | bwd_inner_microstep: 1393.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 15:43:24,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.96 | bwd_microstep: 1346.15 | bwd_inner_microstep: 1346.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 15:43:26,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.51 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 15:43:27,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.21 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 15:43:29,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1477.93 | bwd_inner_microstep: 1477.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740 [2024-06-10 15:43:32,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1633.33 | bwd_inner_microstep: 1633.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897 [2024-06-10 15:43:33,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.87 | bwd_microstep: 682.37 | bwd_inner_microstep: 682.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189 [2024-06-10 15:43:34,190] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 323.71 | bwd_microstep: 856.59 | bwd_inner_microstep: 856.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706 [2024-06-10 15:43:36,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.55 | bwd_microstep: 1627.56 | bwd_inner_microstep: 1627.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 15:43:38,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.97 | bwd_microstep: 1287.11 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 15:43:39,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1289.58 | bwd_inner_microstep: 1289.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2004 [2024-06-10 15:43:41,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.63 | bwd_microstep: 835.36 | bwd_inner_microstep: 835.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278 [2024-06-10 15:43:42,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.52 | bwd_microstep: 876.53 | bwd_inner_microstep: 876.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520 [2024-06-10 15:43:44,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.89 | bwd_microstep: 1447.63 | bwd_inner_microstep: 1447.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3495 [2024-06-10 15:43:46,195] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.48 | bwd_microstep: 1331.24 | bwd_inner_microstep: 1331.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3838 [2024-06-10 15:43:48,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.03 | bwd_microstep: 1660.06 | bwd_inner_microstep: 1660.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2144 [2024-06-10 15:43:49,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.14 | bwd_microstep: 1027.24 | bwd_inner_microstep: 1027.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3530 [2024-06-10 15:43:52,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.09 | bwd_microstep: 1536.49 | bwd_inner_microstep: 1536.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 15:43:54,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.92 | bwd_microstep: 1647.82 | bwd_inner_microstep: 1647.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-10 15:43:55,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.34 | bwd_microstep: 800.62 | bwd_inner_microstep: 800.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 15:43:56,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 797.10 | bwd_inner_microstep: 797.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token 
length: 2280 [2024-06-10 15:43:57,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.73 | bwd_microstep: 811.08 | bwd_inner_microstep: 811.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 15:43:59,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1256.45 | bwd_inner_microstep: 1256.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 15:44:01,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1352.00 | bwd_inner_microstep: 1351.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-10 15:44:03,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1506.82 | bwd_inner_microstep: 1506.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 15:44:05,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1354.15 | bwd_inner_microstep: 1354.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775 [2024-06-10 15:44:07,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.66 | bwd_microstep: 1544.69 | bwd_inner_microstep: 1544.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2285 [2024-06-10 15:44:08,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.68 | bwd_microstep: 942.09 | bwd_inner_microstep: 942.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3799 [2024-06-10 15:44:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.78 | bwd_microstep: 1640.53 | bwd_inner_microstep: 1640.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3735 [2024-06-10 15:44:12,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1486.54 | bwd_inner_microstep: 1486.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2043 [2024-06-10 15:44:14,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.66 | bwd_microstep: 905.13 | bwd_inner_microstep: 905.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578 [2024-06-10 15:44:20,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.37 | optimizer_step: 6.56 [2024-06-10 15:44:20,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.68 | bwd_microstep: 5858.52 | bwd_inner_microstep: 1804.02 | bwd_allreduce_microstep: 4054.44 | step_microstep: 38.22 [2024-06-10 15:44:20,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15152.03 | bwd: 44846.87 | bwd_inner: 40791.52 | bwd_allreduce: 4054.68 | step: 39.70 {'loss': 1.2496, 'learning_rate': 2.0525415018872686e-05, 'epoch': 0.51} 50%|█████ | 870/1726 [15:01:52<14:49:36, 62.36s/it] 50%|█████ | 871/1726 [15:02:54<14:43:42, 62.01s/it] 51%|█████ | 872/1726 [15:03:55<14:40:43, 61.88s/it] 51%|█████ | 873/1726 [15:04:56<14:35:37, 61.59s/it] 51%|█████ | 874/1726 [15:05:57<14:30:13, 61.28s/it] 51%|█████ | 875/1726 [15:06:57<14:25:09, 61.00s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 15:44:22,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.02 | bwd_microstep: 1471.10 | bwd_inner_microstep: 1471.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 15:44:24,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1377.70 | bwd_inner_microstep: 1377.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 15:44:26,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1275.89 | bwd_inner_microstep: 1275.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 15:44:28,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1482.44 | bwd_inner_microstep: 1482.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4179 [2024-06-10 15:44:30,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.21 | bwd_microstep: 1648.88 | bwd_inner_microstep: 1648.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 846 [2024-06-10 15:44:31,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.42 | bwd_microstep: 345.75 | bwd_inner_microstep: 345.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 15:44:33,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
553.17 | bwd_microstep: 1485.45 | bwd_inner_microstep: 1485.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-10 15:44:34,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.50 | bwd_microstep: 794.20 | bwd_inner_microstep: 794.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 15:44:36,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 1390.01 | bwd_inner_microstep: 1389.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 15:44:37,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.84 | bwd_microstep: 1151.12 | bwd_inner_microstep: 1151.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2118 [2024-06-10 15:44:38,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.50 | bwd_microstep: 829.66 | bwd_inner_microstep: 829.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506 [2024-06-10 15:44:40,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1444.67 | bwd_inner_microstep: 1444.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-10 15:44:43,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.69 | bwd_inner_microstep: 1508.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 15:44:44,787] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-10 15:44:47,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.46 | bwd_microstep: 1617.20 | bwd_inner_microstep: 1617.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 15:44:48,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1391.27 | bwd_inner_microstep: 1391.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950 [2024-06-10 15:44:49,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.59 | bwd_microstep: 698.70 | bwd_inner_microstep: 698.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-10 15:44:52,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.20 | bwd_microstep: 1518.83 | bwd_inner_microstep: 1518.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171 [2024-06-10 15:44:53,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.20 | bwd_microstep: 855.43 | bwd_inner_microstep: 855.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090 [2024-06-10 15:44:54,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.82 | bwd_microstep: 917.00 | bwd_inner_microstep: 916.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-10 
15:44:56,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.25 | bwd_microstep: 1621.58 | bwd_inner_microstep: 1621.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921 [2024-06-10 15:44:57,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.60 | bwd_microstep: 695.56 | bwd_inner_microstep: 695.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 15:44:59,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1281.28 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548 [2024-06-10 15:45:01,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 1420.58 | bwd_inner_microstep: 1420.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-10 15:45:03,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.41 | bwd_microstep: 1425.76 | bwd_inner_microstep: 1425.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 15:45:05,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.11 | bwd_microstep: 1647.35 | bwd_inner_microstep: 1647.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2282 [2024-06-10 15:45:07,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.14 | bwd_microstep: 1004.69 | bwd_inner_microstep: 1004.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, 
dynamic token length: 3832 [2024-06-10 15:45:09,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.76 | bwd_microstep: 1621.86 | bwd_inner_microstep: 1621.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618 [2024-06-10 15:45:11,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.63 | bwd_microstep: 1599.37 | bwd_inner_microstep: 1599.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733 [2024-06-10 15:45:13,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.49 | bwd_microstep: 1441.56 | bwd_inner_microstep: 1441.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810 [2024-06-10 15:45:15,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.67 | bwd_microstep: 1751.64 | bwd_inner_microstep: 1751.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3812 [2024-06-10 15:45:21,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.39 | optimizer_step: 6.59 [2024-06-10 15:45:21,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.29 | bwd_microstep: 4562.40 | bwd_inner_microstep: 2099.95 | bwd_allreduce_microstep: 2462.39 | step_microstep: 39.58 [2024-06-10 15:45:21,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15609.89 | bwd: 44522.70 | bwd_inner: 42059.40 | bwd_allreduce: 2462.62 | step: 41.04 {'loss': 1.1974, 'learning_rate': 2.0487893107546298e-05, 'epoch': 0.51} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 15:45:22,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.90 | 
bwd_microstep: 1298.24 | bwd_inner_microstep: 1298.07 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 15:45:24,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1278.60 | bwd_inner_microstep: 1278.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3586 [2024-06-10 15:45:26,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1338.89 | bwd_inner_microstep: 1338.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1734 [2024-06-10 15:45:27,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.94 | bwd_microstep: 680.42 | bwd_inner_microstep: 680.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763 [2024-06-10 15:45:29,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.14 | bwd_microstep: 1537.90 | bwd_inner_microstep: 1537.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3483 [2024-06-10 15:45:31,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1330.38 | bwd_inner_microstep: 1330.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729 [2024-06-10 15:45:33,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.85 | bwd_microstep: 1428.16 | bwd_inner_microstep: 1428.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3718 [2024-06-10 15:45:35,464] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1462.37 | bwd_inner_microstep: 1462.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 15:45:37,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1395.70 | bwd_inner_microstep: 1395.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 15:45:39,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1295.05 | bwd_inner_microstep: 1295.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970 [2024-06-10 15:45:40,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.91 | bwd_microstep: 828.89 | bwd_inner_microstep: 828.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 15:45:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.63 | bwd_microstep: 1489.12 | bwd_inner_microstep: 1489.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515 [2024-06-10 15:45:44,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.75 | bwd_microstep: 1584.55 | bwd_inner_microstep: 1584.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 15:45:46,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1380.05 | bwd_inner_microstep: 1380.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908 [2024-06-10 15:45:47,442] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.02 | bwd_microstep: 686.61 | bwd_inner_microstep: 686.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555 [2024-06-10 15:45:49,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.05 | bwd_microstep: 1294.41 | bwd_inner_microstep: 1294.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3745 [2024-06-10 15:45:51,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1442.61 | bwd_inner_microstep: 1442.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143 [2024-06-10 15:45:52,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.72 | bwd_microstep: 830.74 | bwd_inner_microstep: 830.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519 [2024-06-10 15:45:54,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1318.80 | bwd_inner_microstep: 1318.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 15:45:56,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1391.76 | bwd_inner_microstep: 1391.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2325 [2024-06-10 15:45:57,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.06 | bwd_microstep: 918.85 | bwd_inner_microstep: 918.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token 
length: 2185 [2024-06-10 15:45:58,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.53 | bwd_microstep: 903.42 | bwd_inner_microstep: 903.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 15:46:00,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1399.97 | bwd_inner_microstep: 1399.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 15:46:02,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-10 15:46:03,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.98 | bwd_microstep: 971.73 | bwd_inner_microstep: 971.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750 [2024-06-10 15:46:05,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.12 | bwd_microstep: 1341.19 | bwd_inner_microstep: 1341.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3601 [2024-06-10 15:46:07,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.05 | bwd_microstep: 1741.04 | bwd_inner_microstep: 1741.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417 [2024-06-10 15:46:09,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1441.07 | bwd_inner_microstep: 1441.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3470 [2024-06-10 15:46:11,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1380.41 | bwd_inner_microstep: 1380.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 15:46:14,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.98 | bwd_microstep: 1646.69 | bwd_inner_microstep: 1646.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3462 [2024-06-10 15:46:15,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.22 | bwd_microstep: 1341.38 | bwd_inner_microstep: 1341.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 15:46:21,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.22 | optimizer_step: 6.62 [2024-06-10 15:46:21,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.13 | bwd_microstep: 4818.44 | bwd_inner_microstep: 1866.34 | bwd_allreduce_microstep: 2952.05 | step_microstep: 37.94 [2024-06-10 15:46:21,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15505.27 | bwd: 44484.40 | bwd_inner: 41531.31 | bwd_allreduce: 2952.35 | step: 39.49 {'loss': 1.2136, 'learning_rate': 2.0450369477862922e-05, 'epoch': 0.51} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-10 15:46:23,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1329.91 | bwd_inner_microstep: 1329.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 15:46:24,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 438.65 | bwd_microstep: 1149.11 | bwd_inner_microstep: 1149.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 15:46:26,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.97 | bwd_microstep: 1294.54 | bwd_inner_microstep: 1294.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3873 [2024-06-10 15:46:28,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1514.12 | bwd_inner_microstep: 1514.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489 [2024-06-10 15:46:30,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.78 | bwd_microstep: 1315.96 | bwd_inner_microstep: 1315.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258 [2024-06-10 15:46:31,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.71 | bwd_microstep: 966.39 | bwd_inner_microstep: 966.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3779 [2024-06-10 15:46:33,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.85 | bwd_microstep: 1332.19 | bwd_inner_microstep: 1332.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889 [2024-06-10 15:46:34,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.64 | bwd_microstep: 683.24 | bwd_inner_microstep: 683.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490 [2024-06-10 15:46:36,378] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.15 | bwd_microstep: 1187.00 | bwd_inner_microstep: 1186.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 15:46:38,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1249.78 | bwd_inner_microstep: 1249.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 15:46:39,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.81 | bwd_microstep: 1290.24 | bwd_inner_microstep: 1290.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3699 [2024-06-10 15:46:42,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.45 | bwd_microstep: 1560.40 | bwd_inner_microstep: 1560.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 15:46:44,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1487.00 | bwd_inner_microstep: 1486.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 15:46:45,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.06 | bwd_microstep: 1280.96 | bwd_inner_microstep: 1280.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 15:46:47,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1388.91 | bwd_inner_microstep: 1388.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token 
length: 3719 [2024-06-10 15:46:50,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.08 | bwd_microstep: 1780.96 | bwd_inner_microstep: 1780.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2158 [2024-06-10 15:46:51,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.43 | bwd_microstep: 1048.97 | bwd_inner_microstep: 1048.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537 [2024-06-10 15:46:53,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.37 | bwd_microstep: 1583.75 | bwd_inner_microstep: 1583.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-10 15:46:55,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1509.11 | bwd_inner_microstep: 1509.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 15:46:57,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 1397.14 | bwd_inner_microstep: 1397.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-10 15:46:59,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.71 | bwd_microstep: 1397.94 | bwd_inner_microstep: 1397.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623 [2024-06-10 15:47:01,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.13 | bwd_microstep: 1342.12 | bwd_inner_microstep: 1342.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3590 [2024-06-10 15:47:03,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1411.21 | bwd_inner_microstep: 1411.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-10 15:47:05,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.65 | bwd_microstep: 1659.46 | bwd_inner_microstep: 1659.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3639 [2024-06-10 15:47:07,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1476.22 | bwd_inner_microstep: 1476.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799 [2024-06-10 15:47:20,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.94 | bwd_microstep: 1451.88 | bwd_inner_microstep: 1451.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 15:47:22,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1393.24 | bwd_inner_microstep: 1393.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3776 [2024-06-10 15:47:24,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.67 | bwd_microstep: 1469.94 | bwd_inner_microstep: 1469.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803 [2024-06-10 15:47:26,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.47 | bwd_microstep: 1448.10 | bwd_inner_microstep: 1448.07 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 15:47:28,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1347.54 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-10 15:47:30,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1282.88 | bwd_inner_microstep: 1282.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3454 [2024-06-10 15:47:34,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 15:47:34,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 3114.18 | bwd_inner_microstep: 1761.50 | bwd_allreduce_microstep: 1352.64 | step_microstep: 37.77 [2024-06-10 15:47:34,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16336.48 | bwd: 45144.43 | bwd_inner: 43790.89 | bwd_allreduce: 1352.86 | step: 39.30 {'loss': 1.2547, 'learning_rate': 2.0412844261980588e-05, 'epoch': 0.51} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457 [2024-06-10 15:47:36,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.01 | bwd_microstep: 1563.08 | bwd_inner_microstep: 1563.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 15:47:38,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.09 | bwd_microstep: 1494.95 | bwd_inner_microstep: 1494.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 15:47:40,661] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.24 | bwd_microstep: 1551.46 | bwd_inner_microstep: 1551.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461 [2024-06-10 15:47:42,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.07 | bwd_microstep: 1214.88 | bwd_inner_microstep: 1214.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3759 [2024-06-10 15:47:44,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.81 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879 [2024-06-10 15:47:45,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.64 | bwd_microstep: 679.15 | bwd_inner_microstep: 679.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3404 [2024-06-10 15:47:47,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.57 | bwd_microstep: 1371.00 | bwd_inner_microstep: 1370.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 15:47:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1283.94 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 15:47:50,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1250.96 | bwd_inner_microstep: 1250.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3446 [2024-06-10 15:47:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.80 | bwd_microstep: 1442.18 | bwd_inner_microstep: 1442.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661 [2024-06-10 15:47:54,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.29 | bwd_microstep: 1612.57 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3620 [2024-06-10 15:47:56,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.10 | bwd_microstep: 1578.66 | bwd_inner_microstep: 1578.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3517 [2024-06-10 15:47:59,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.85 | bwd_microstep: 1689.07 | bwd_inner_microstep: 1689.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494 [2024-06-10 15:48:01,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1414.67 | bwd_inner_microstep: 1414.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3517 [2024-06-10 15:48:03,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.69 | bwd_microstep: 1607.50 | bwd_inner_microstep: 1607.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 15:48:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1485.26 | bwd_inner_microstep: 1485.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, 
images per sample: 11.5, dynamic token length: 3645 [2024-06-10 15:48:07,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.01 | bwd_microstep: 1712.60 | bwd_inner_microstep: 1712.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2071 [2024-06-10 15:48:08,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.94 | bwd_microstep: 866.96 | bwd_inner_microstep: 866.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 15:48:10,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1382.25 | bwd_inner_microstep: 1382.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3443 [2024-06-10 15:48:12,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.23 | bwd_microstep: 1378.80 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427 [2024-06-10 15:48:14,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.85 | bwd_microstep: 1280.60 | bwd_inner_microstep: 1280.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3606 [2024-06-10 15:48:16,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2291 [2024-06-10 15:48:17,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.72 | bwd_microstep: 942.48 | bwd_inner_microstep: 942.46 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 15:48:19,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1491.18 | bwd_inner_microstep: 1491.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3784 [2024-06-10 15:48:21,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.70 | bwd_microstep: 1351.53 | bwd_inner_microstep: 1351.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 15:48:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1487.78 | bwd_inner_microstep: 1487.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629 [2024-06-10 15:48:25,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.23 | bwd_microstep: 1311.83 | bwd_inner_microstep: 1311.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 15:48:27,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1488.96 | bwd_inner_microstep: 1488.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004 [2024-06-10 15:48:28,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.11 | bwd_microstep: 896.13 | bwd_inner_microstep: 896.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 15:48:30,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1394.67 | bwd_inner_microstep: 
1394.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 15:48:32,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.56 | bwd_microstep: 1542.47 | bwd_inner_microstep: 1542.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721 [2024-06-10 15:48:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-10 15:48:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.37 | bwd_microstep: 2110.84 | bwd_inner_microstep: 1790.88 | bwd_allreduce_microstep: 319.91 | step_microstep: 37.61 [2024-06-10 15:48:35,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16425.75 | bwd: 44563.74 | bwd_inner: 44242.93 | bwd_allreduce: 320.13 | step: 39.04 {'loss': 1.2721, 'learning_rate': 2.0375317592062912e-05, 'epoch': 0.51} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-10 15:48:36,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.49 | bwd_microstep: 791.24 | bwd_inner_microstep: 791.13 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 15:48:38,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.25 | bwd_microstep: 1276.05 | bwd_inner_microstep: 1276.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2453 [2024-06-10 15:48:39,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.32 | bwd_microstep: 1015.89 | bwd_inner_microstep: 1015.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278 
[2024-06-10 15:48:41,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.52 | bwd_microstep: 875.63 | bwd_inner_microstep: 875.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 15:48:43,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1450.47 | bwd_inner_microstep: 1450.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-10 15:48:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.80 | bwd_microstep: 1148.47 | bwd_inner_microstep: 1148.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 15:48:46,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.34 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 15:48:49,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 2014.88 | bwd_inner_microstep: 2014.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 15:48:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400 [2024-06-10 15:48:52,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.57 | bwd_microstep: 1304.50 | bwd_inner_microstep: 1304.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1956 [2024-06-10 15:48:53,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.97 | bwd_microstep: 795.02 | bwd_inner_microstep: 794.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 15:48:55,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.86 | bwd_microstep: 1478.58 | bwd_inner_microstep: 1478.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 15:48:57,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.88 | bwd_microstep: 1478.70 | bwd_inner_microstep: 1478.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001 [2024-06-10 15:48:59,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.08 | bwd_microstep: 901.02 | bwd_inner_microstep: 901.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 15:49:01,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1379.02 | bwd_inner_microstep: 1379.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3947 [2024-06-10 15:49:03,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.61 | bwd_microstep: 1602.28 | bwd_inner_microstep: 1602.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 15:49:05,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.40 | bwd_microstep: 1276.46 | bwd_inner_microstep: 1276.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084 [2024-06-10 15:49:06,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.49 | bwd_microstep: 822.28 | bwd_inner_microstep: 822.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-10 15:49:08,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.56 | bwd_microstep: 1657.38 | bwd_inner_microstep: 1657.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-10 15:49:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.12 | bwd_microstep: 1186.15 | bwd_inner_microstep: 1186.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 15:49:12,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.56 | bwd_microstep: 1512.59 | bwd_inner_microstep: 1512.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 15:49:13,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1285.32 | bwd_inner_microstep: 1285.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 15:49:15,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1397.28 | bwd_inner_microstep: 1397.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492 [2024-06-10 15:49:17,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.76 | bwd_microstep: 1218.17 | bwd_inner_microstep: 1218.14 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 15:49:19,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.74 | bwd_microstep: 1562.09 | bwd_inner_microstep: 1562.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061 [2024-06-10 15:49:21,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.06 | bwd_microstep: 914.83 | bwd_inner_microstep: 914.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2252 [2024-06-10 15:49:22,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.30 | bwd_microstep: 939.59 | bwd_inner_microstep: 939.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3485 [2024-06-10 15:49:24,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1334.06 | bwd_inner_microstep: 1334.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799 [2024-06-10 15:49:26,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.19 | bwd_microstep: 1457.28 | bwd_inner_microstep: 1457.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740 [2024-06-10 15:49:28,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1346.90 | bwd_inner_microstep: 1346.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-10 15:49:30,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 
1446.11 | bwd_inner_microstep: 1446.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 15:49:37,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.35 | optimizer_step: 6.63 [2024-06-10 15:49:37,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 7402.60 | bwd_inner_microstep: 1411.21 | bwd_allreduce_microstep: 5991.31 | step_microstep: 39.10 [2024-06-10 15:49:37,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15069.33 | bwd: 46939.69 | bwd_inner: 40947.35 | bwd_allreduce: 5991.60 | step: 40.60 {'loss': 1.2537, 'learning_rate': 2.0337789600278623e-05, 'epoch': 0.51} 51%|█████ | 875/1726 [15:06:57<14:25:09, 61.00s/it] 51%|█████ | 876/1726 [15:07:57<14:21:53, 60.84s/it] 51%|█████ | 877/1726 [15:08:58<14:18:41, 60.69s/it] 51%|█████ | 878/1726 [15:10:11<15:09:15, 64.33s/it] 51%|█████ | 879/1726 [15:11:12<14:55:28, 63.43s/it] 51%|█████ | 880/1726 [15:12:14<14:49:45, 63.10s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-10 15:49:39,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1334.97 | bwd_inner_microstep: 1334.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 15:49:41,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1335.98 | bwd_inner_microstep: 1335.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0,
dynamic token length: 3831 [2024-06-10 15:49:43,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1318.74 | bwd_inner_microstep: 1318.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2355 [2024-06-10 15:49:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.80 | bwd_microstep: 891.76 | bwd_inner_microstep: 891.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853 [2024-06-10 15:49:46,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.60 | bwd_microstep: 1562.41 | bwd_inner_microstep: 1562.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898 [2024-06-10 15:49:47,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.95 | bwd_microstep: 683.78 | bwd_inner_microstep: 683.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 15:49:49,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.52 | bwd_microstep: 1279.59 | bwd_inner_microstep: 1279.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-10 15:49:51,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.65 | bwd_microstep: 1546.39 | bwd_inner_microstep: 1546.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555 [2024-06-10 15:49:53,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1297.59 | bwd_inner_microstep: 1297.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 15:49:55,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.09 | bwd_microstep: 1383.01 | bwd_inner_microstep: 1382.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 15:49:57,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1252.75 | bwd_inner_microstep: 1252.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126 [2024-06-10 15:49:58,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.96 | bwd_microstep: 927.21 | bwd_inner_microstep: 927.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 15:50:00,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1396.44 | bwd_inner_microstep: 1396.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438 [2024-06-10 15:50:02,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.23 | bwd_microstep: 1153.56 | bwd_inner_microstep: 1153.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3610 [2024-06-10 15:50:04,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.42 | bwd_microstep: 1555.75 | bwd_inner_microstep: 1555.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3076 [2024-06-10 15:50:05,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1308.54 | bwd_inner_microstep: 1308.51 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491 [2024-06-10 15:50:07,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1317.91 | bwd_inner_microstep: 1317.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 15:50:09,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.66 | bwd_microstep: 1291.67 | bwd_inner_microstep: 1291.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533 [2024-06-10 15:50:11,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1426.69 | bwd_inner_microstep: 1426.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-10 15:50:13,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.77 | bwd_microstep: 1157.17 | bwd_inner_microstep: 1157.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831 [2024-06-10 15:50:15,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1388.60 | bwd_inner_microstep: 1388.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020 [2024-06-10 15:50:16,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.91 | bwd_microstep: 810.71 | bwd_inner_microstep: 810.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3816 [2024-06-10 15:50:18,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 1579.45 | bwd_inner_microstep: 
1579.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 15:50:20,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1454.84 | bwd_inner_microstep: 1454.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3808 [2024-06-10 15:50:22,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1613.84 | bwd_inner_microstep: 1613.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3552 [2024-06-10 15:50:24,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1588.08 | bwd_inner_microstep: 1588.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432 [2024-06-10 15:50:26,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1448.38 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3389 [2024-06-10 15:50:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.59 | bwd_microstep: 1436.49 | bwd_inner_microstep: 1436.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3574 [2024-06-10 15:50:30,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.30 | bwd_microstep: 1591.76 | bwd_inner_microstep: 1591.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804 [2024-06-10 15:50:32,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.92 | 
bwd_microstep: 1354.43 | bwd_inner_microstep: 1354.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 15:50:34,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1374.54 | bwd_inner_microstep: 1374.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 15:50:39,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 15:50:39,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 3781.01 | bwd_inner_microstep: 1638.32 | bwd_allreduce_microstep: 2142.65 | step_microstep: 38.09 [2024-06-10 15:50:39,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15963.62 | bwd: 44844.04 | bwd_inner: 42700.49 | bwd_allreduce: 2142.87 | step: 39.56 {'loss': 1.2584, 'learning_rate': 2.0300260418801123e-05, 'epoch': 0.51} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-10 15:50:41,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.20 | bwd_microstep: 1490.69 | bwd_inner_microstep: 1490.51 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 15:50:42,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.93 | bwd_microstep: 1271.25 | bwd_inner_microstep: 1271.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 15:50:44,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.57 | bwd_microstep: 1401.58 | bwd_inner_microstep: 1401.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3488 [2024-06-10 15:50:46,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.27 | bwd_microstep: 1479.06 | bwd_inner_microstep: 1479.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3752 [2024-06-10 15:50:49,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.20 | bwd_microstep: 1636.12 | bwd_inner_microstep: 1636.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 15:50:50,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 791.01 | bwd_inner_microstep: 790.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 15:50:51,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1246.23 | bwd_inner_microstep: 1246.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 15:50:54,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1480.43 | bwd_inner_microstep: 1480.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-10 15:50:56,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1506.71 | bwd_inner_microstep: 1506.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420 [2024-06-10 15:50:58,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.13 | bwd_microstep: 1442.44 | bwd_inner_microstep: 1442.41 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3663 [2024-06-10 15:51:00,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1654.74 | bwd_inner_microstep: 1654.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3645 [2024-06-10 15:51:02,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.93 | bwd_microstep: 1813.88 | bwd_inner_microstep: 1813.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3486 [2024-06-10 15:51:04,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.89 | bwd_microstep: 1435.39 | bwd_inner_microstep: 1435.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 15:51:06,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1283.92 | bwd_inner_microstep: 1283.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3484 [2024-06-10 15:51:08,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.12 | bwd_microstep: 1335.66 | bwd_inner_microstep: 1335.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-10 15:51:10,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.94 | bwd_microstep: 1316.51 | bwd_inner_microstep: 1316.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3856 [2024-06-10 15:51:12,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.40 | bwd_microstep: 1366.77 | bwd_inner_microstep: 
1366.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 647 [2024-06-10 15:51:12,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.09 | bwd_microstep: 275.17 | bwd_inner_microstep: 275.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3782 [2024-06-10 15:51:14,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.44 | bwd_microstep: 1578.10 | bwd_inner_microstep: 1578.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2023 [2024-06-10 15:51:15,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.62 | bwd_microstep: 715.85 | bwd_inner_microstep: 715.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2162 [2024-06-10 15:51:17,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.77 | bwd_microstep: 952.08 | bwd_inner_microstep: 952.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824 [2024-06-10 15:51:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.82 | bwd_microstep: 1757.53 | bwd_inner_microstep: 1757.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 15:51:21,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1387.04 | bwd_inner_microstep: 1387.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811 [2024-06-10 15:51:23,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.37 | bwd_microstep: 
1486.73 | bwd_inner_microstep: 1486.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462 [2024-06-10 15:51:25,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.23 | bwd_microstep: 1311.96 | bwd_inner_microstep: 1311.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3782 [2024-06-10 15:51:27,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.02 | bwd_microstep: 1578.75 | bwd_inner_microstep: 1578.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-10 15:51:29,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.13 | bwd_microstep: 1596.02 | bwd_inner_microstep: 1595.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270 [2024-06-10 15:51:30,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.64 | bwd_microstep: 969.22 | bwd_inner_microstep: 969.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 15:51:32,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.66 | bwd_microstep: 1392.70 | bwd_inner_microstep: 1392.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2271 [2024-06-10 15:51:34,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.85 | bwd_microstep: 1003.18 | bwd_inner_microstep: 1003.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3377 [2024-06-10 15:51:36,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 530.72 | bwd_microstep: 1432.31 | bwd_inner_microstep: 1432.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3579 [2024-06-10 15:51:39,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-10 15:51:39,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.50 | bwd_microstep: 2842.54 | bwd_inner_microstep: 2032.13 | bwd_allreduce_microstep: 810.37 | step_microstep: 37.73 [2024-06-10 15:51:39,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16124.74 | bwd: 44231.60 | bwd_inner: 43420.19 | bwd_allreduce: 810.67 | step: 39.25 {'loss': 1.23, 'learning_rate': 2.026273017980798e-05, 'epoch': 0.51} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401 [2024-06-10 15:51:41,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.83 | bwd_microstep: 1431.96 | bwd_inner_microstep: 1431.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915 [2024-06-10 15:51:43,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1588.72 | bwd_inner_microstep: 1588.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 15:51:45,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1390.45 | bwd_inner_microstep: 1390.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-10 15:51:48,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.04 | bwd_microstep: 1645.76 | bwd_inner_microstep: 1645.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 16, images per sample: 4.0, dynamic token length: 2274 [2024-06-10 15:51:49,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.29 | bwd_microstep: 873.65 | bwd_inner_microstep: 873.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3801 [2024-06-10 15:51:51,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.44 | bwd_microstep: 1598.81 | bwd_inner_microstep: 1598.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 15:51:53,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.71 | bwd_microstep: 1282.53 | bwd_inner_microstep: 1282.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3712 [2024-06-10 15:51:55,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.70 | bwd_microstep: 1359.25 | bwd_inner_microstep: 1359.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504 [2024-06-10 15:51:57,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.96 | bwd_microstep: 1411.46 | bwd_inner_microstep: 1411.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-10 15:51:59,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1507.45 | bwd_inner_microstep: 1507.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 15:52:01,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.54 | bwd_microstep: 1290.49 | bwd_inner_microstep: 1290.47 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 15:52:02,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.08 | bwd_microstep: 1382.18 | bwd_inner_microstep: 1382.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 15:52:04,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1375.57 | bwd_inner_microstep: 1375.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3520 [2024-06-10 15:52:06,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.49 | bwd_microstep: 1559.40 | bwd_inner_microstep: 1559.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 15:52:08,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1244.05 | bwd_inner_microstep: 1244.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 15:52:10,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.04 | bwd_microstep: 1386.34 | bwd_inner_microstep: 1386.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447 [2024-06-10 15:52:12,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1283.21 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499 [2024-06-10 15:52:14,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.01 | bwd_microstep: 1192.52 | 
bwd_inner_microstep: 1192.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3965 [2024-06-10 15:52:16,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.01 | bwd_microstep: 1665.40 | bwd_inner_microstep: 1665.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 15:52:18,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1407.20 | bwd_inner_microstep: 1407.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082 [2024-06-10 15:52:19,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.43 | bwd_microstep: 818.96 | bwd_inner_microstep: 818.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 15:52:21,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.85 | bwd_microstep: 1379.94 | bwd_inner_microstep: 1379.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 15:52:23,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.05 | bwd_microstep: 1279.31 | bwd_inner_microstep: 1279.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3808 [2024-06-10 15:52:25,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.28 | bwd_microstep: 1513.96 | bwd_inner_microstep: 1513.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 15:52:27,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 545.32 | bwd_microstep: 1460.19 | bwd_inner_microstep: 1460.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566 [2024-06-10 15:52:29,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.62 | bwd_microstep: 1301.19 | bwd_inner_microstep: 1301.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861 [2024-06-10 15:52:31,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.98 | bwd_microstep: 1562.96 | bwd_inner_microstep: 1562.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 15:52:32,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.73 | bwd_microstep: 1255.42 | bwd_inner_microstep: 1255.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 15:52:34,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1373.93 | bwd_inner_microstep: 1373.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-10 15:52:36,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1399.06 | bwd_inner_microstep: 1399.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-10 15:52:38,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.39 | bwd_microstep: 975.43 | bwd_inner_microstep: 975.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 15:52:41,602] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.16 | optimizer_step: 6.59 [2024-06-10 15:52:41,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 2937.22 | bwd_inner_microstep: 1640.85 | bwd_allreduce_microstep: 1296.32 | step_microstep: 37.73 [2024-06-10 15:52:41,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16351.30 | bwd: 45133.99 | bwd_inner: 43836.77 | bwd_allreduce: 1296.55 | step: 39.19 {'loss': 1.2172, 'learning_rate': 2.0225199015480518e-05, 'epoch': 0.51} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 15:52:43,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.80 | bwd_microstep: 1495.30 | bwd_inner_microstep: 1495.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 15:52:45,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1892 [2024-06-10 15:52:46,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.69 | bwd_microstep: 747.00 | bwd_inner_microstep: 746.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3776 [2024-06-10 15:52:48,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1372.14 | bwd_inner_microstep: 1372.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 15:52:50,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.17 | bwd_microstep: 1244.38 | bwd_inner_microstep: 1244.35 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 15:52:51,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.15 | bwd_microstep: 1247.54 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 15:52:53,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1412.02 | bwd_inner_microstep: 1411.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440 [2024-06-10 15:52:55,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.97 | bwd_microstep: 1157.75 | bwd_inner_microstep: 1157.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3732 [2024-06-10 15:52:57,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.70 | bwd_microstep: 1633.03 | bwd_inner_microstep: 1633.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-10 15:52:58,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.06 | bwd_microstep: 804.66 | bwd_inner_microstep: 804.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 15:53:00,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1343.59 | bwd_inner_microstep: 1343.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2249 [2024-06-10 15:53:01,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.21 | bwd_microstep: 
842.05 | bwd_inner_microstep: 842.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039 [2024-06-10 15:53:02,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.26 | bwd_microstep: 809.61 | bwd_inner_microstep: 809.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 15:53:04,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.59 | bwd_microstep: 1482.06 | bwd_inner_microstep: 1482.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 15:53:06,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.42 | bwd_microstep: 1476.17 | bwd_inner_microstep: 1476.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3641 [2024-06-10 15:53:09,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.98 | bwd_microstep: 1710.10 | bwd_inner_microstep: 1710.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061 [2024-06-10 15:53:10,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.06 | bwd_microstep: 910.82 | bwd_inner_microstep: 910.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3543 [2024-06-10 15:53:12,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1450.21 | bwd_inner_microstep: 1450.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-10 15:53:14,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 497.72 | bwd_microstep: 1301.49 | bwd_inner_microstep: 1301.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 15:53:16,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.06 | bwd_microstep: 1392.63 | bwd_inner_microstep: 1392.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3616 [2024-06-10 15:53:18,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.33 | bwd_microstep: 1537.90 | bwd_inner_microstep: 1537.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 15:53:20,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1374.95 | bwd_inner_microstep: 1374.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 15:53:22,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1411.30 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2542 [2024-06-10 15:53:23,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.99 | bwd_microstep: 969.50 | bwd_inner_microstep: 969.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531 [2024-06-10 15:53:25,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1324.60 | bwd_inner_microstep: 1324.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519 [2024-06-10 15:53:27,360] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1414.67 | bwd_inner_microstep: 1414.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2715 [2024-06-10 15:53:28,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.81 | bwd_microstep: 1130.95 | bwd_inner_microstep: 1130.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812 [2024-06-10 15:53:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.20 | bwd_microstep: 1719.42 | bwd_inner_microstep: 1719.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785 [2024-06-10 15:53:33,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.75 | bwd_microstep: 1545.98 | bwd_inner_microstep: 1545.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3561 [2024-06-10 15:53:35,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.99 | bwd_microstep: 1210.11 | bwd_inner_microstep: 1210.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3743 [2024-06-10 15:53:37,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.88 | bwd_microstep: 1602.41 | bwd_inner_microstep: 1602.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064 [2024-06-10 15:53:42,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.34 | optimizer_step: 6.59 [2024-06-10 15:53:42,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.60 | bwd_microstep: 5112.08 | 
bwd_inner_microstep: 1042.96 | bwd_allreduce_microstep: 4069.05 | step_microstep: 38.85 [2024-06-10 15:53:42,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15432.77 | bwd: 45434.40 | bwd_inner: 41364.42 | bwd_allreduce: 4069.29 | step: 40.27 {'loss': 1.2303, 'learning_rate': 2.0187667058003298e-05, 'epoch': 0.51} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447 [2024-06-10 15:53:44,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.28 | bwd_microstep: 1442.12 | bwd_inner_microstep: 1442.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2414 [2024-06-10 15:53:46,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.84 | bwd_microstep: 1000.33 | bwd_inner_microstep: 1000.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 15:53:47,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1249.66 | bwd_inner_microstep: 1249.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-10 15:53:50,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.83 | bwd_microstep: 1640.00 | bwd_inner_microstep: 1639.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3738 [2024-06-10 15:53:52,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 15:53:53,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.74 | bwd_microstep: 1186.54 | 
bwd_inner_microstep: 1186.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 15:53:55,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1275.19 | bwd_inner_microstep: 1275.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717 [2024-06-10 15:53:57,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.16 | bwd_microstep: 1361.22 | bwd_inner_microstep: 1361.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3450 [2024-06-10 15:53:59,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1395.05 | bwd_inner_microstep: 1395.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426 [2024-06-10 15:54:01,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.00 | bwd_microstep: 1537.79 | bwd_inner_microstep: 1537.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666 [2024-06-10 15:54:03,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.18 | bwd_microstep: 1715.60 | bwd_inner_microstep: 1715.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 15:54:05,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.56 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3377 [2024-06-10 15:54:07,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 468.09 | bwd_microstep: 1239.74 | bwd_inner_microstep: 1239.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3686 [2024-06-10 15:54:09,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.71 | bwd_microstep: 1822.21 | bwd_inner_microstep: 1822.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 15:54:11,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.77 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1483.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 15:54:13,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1393.56 | bwd_inner_microstep: 1393.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 15:54:16,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1509.47 | bwd_inner_microstep: 1509.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564 [2024-06-10 15:54:17,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.29 | bwd_microstep: 1204.22 | bwd_inner_microstep: 1204.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 15:54:19,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1377.39 | bwd_inner_microstep: 1377.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 15:54:21,324] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.52 | bwd_microstep: 1254.61 | bwd_inner_microstep: 1254.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2060 [2024-06-10 15:54:22,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.20 | bwd_microstep: 752.34 | bwd_inner_microstep: 752.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 15:54:24,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.11 | bwd_microstep: 1555.03 | bwd_inner_microstep: 1555.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-10 15:54:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1430.58 | bwd_inner_microstep: 1430.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 15:54:28,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1459.92 | bwd_inner_microstep: 1459.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-10 15:54:30,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1530.72 | bwd_inner_microstep: 1530.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-10 15:54:31,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.43 | bwd_microstep: 978.86 | bwd_inner_microstep: 978.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3777 
[2024-06-10 15:54:34,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1498.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053 [2024-06-10 15:54:35,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.50 | bwd_microstep: 913.39 | bwd_inner_microstep: 913.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 15:54:37,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1348.67 | bwd_inner_microstep: 1348.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3539 [2024-06-10 15:54:39,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.22 | bwd_microstep: 1518.81 | bwd_inner_microstep: 1518.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 15:54:41,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1497.51 | bwd_inner_microstep: 1497.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033 [2024-06-10 15:54:43,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.17 | optimizer_step: 6.58 [2024-06-10 15:54:43,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.34 | bwd_microstep: 1666.29 | bwd_inner_microstep: 908.21 | bwd_allreduce_microstep: 758.03 | step_microstep: 37.74 [2024-06-10 15:54:43,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16155.70 | bwd: 44054.26 | bwd_inner: 43295.33 | bwd_allreduce: 758.25 | step: 39.24 
{'loss': 1.2396, 'learning_rate': 2.0150134439563667e-05, 'epoch': 0.51} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456 [2024-06-10 15:54:45,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1442.91 | bwd_inner_microstep: 1442.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 15:54:47,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1374.64 | bwd_inner_microstep: 1374.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 15:54:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1447.34 | bwd_inner_microstep: 1447.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 15:54:51,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.99 | bwd_microstep: 1275.54 | bwd_inner_microstep: 1275.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751 [2024-06-10 15:54:53,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.14 | bwd_microstep: 1534.80 | bwd_inner_microstep: 1534.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3491 [2024-06-10 15:54:54,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.25 | bwd_microstep: 1233.96 | bwd_inner_microstep: 1233.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191 [2024-06-10 15:54:56,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.73 
| bwd_microstep: 950.58 | bwd_inner_microstep: 950.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2640 [2024-06-10 15:54:57,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.34 | bwd_microstep: 1051.23 | bwd_inner_microstep: 1051.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481 [2024-06-10 15:54:59,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.25 | bwd_microstep: 1344.67 | bwd_inner_microstep: 1344.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 15:55:01,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 15:55:03,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.01 | bwd_microstep: 1484.17 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3473 [2024-06-10 15:55:05,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.99 | bwd_microstep: 1574.12 | bwd_inner_microstep: 1574.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3712 [2024-06-10 15:55:07,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.80 | bwd_microstep: 1690.03 | bwd_inner_microstep: 1690.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3460 [2024-06-10 15:55:10,011] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 566.36 | bwd_microstep: 1516.04 | bwd_inner_microstep: 1516.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3480 [2024-06-10 15:55:11,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1433.13 | bwd_inner_microstep: 1433.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569 [2024-06-10 15:55:13,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.18 | bwd_microstep: 1429.89 | bwd_inner_microstep: 1429.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3894 [2024-06-10 15:55:16,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.55 | bwd_microstep: 1685.34 | bwd_inner_microstep: 1685.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 15:55:17,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.61 | bwd_microstep: 799.02 | bwd_inner_microstep: 798.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2945 [2024-06-10 15:55:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.61 | bwd_microstep: 1007.58 | bwd_inner_microstep: 1007.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-10 15:55:21,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.54 | bwd_microstep: 1656.51 | bwd_inner_microstep: 1656.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 
15:55:23,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294 [2024-06-10 15:55:24,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.64 | bwd_microstep: 878.26 | bwd_inner_microstep: 878.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 15:55:26,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1637.71 | bwd_inner_microstep: 1637.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-10 15:55:28,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.85 | bwd_microstep: 1461.60 | bwd_inner_microstep: 1461.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815 [2024-06-10 15:55:30,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.91 | bwd_microstep: 1608.71 | bwd_inner_microstep: 1608.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2297 [2024-06-10 15:55:31,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.10 | bwd_microstep: 848.61 | bwd_inner_microstep: 848.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589 [2024-06-10 15:55:33,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.94 | bwd_microstep: 1308.19 | bwd_inner_microstep: 1308.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3579 [2024-06-10 15:55:35,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.77 | bwd_microstep: 1405.36 | bwd_inner_microstep: 1405.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-10 15:55:37,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1499.33 | bwd_inner_microstep: 1499.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593 [2024-06-10 15:55:39,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.05 | bwd_microstep: 1491.82 | bwd_inner_microstep: 1491.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048 [2024-06-10 15:55:40,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.58 | bwd_microstep: 809.46 | bwd_inner_microstep: 809.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3768 [2024-06-10 15:55:45,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.36 | optimizer_step: 6.61 [2024-06-10 15:55:45,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 4072.52 | bwd_inner_microstep: 1815.54 | bwd_allreduce_microstep: 2256.91 | step_microstep: 38.81 [2024-06-10 15:55:45,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16171.86 | bwd: 45751.18 | bwd_inner: 43493.35 | bwd_allreduce: 2257.15 | step: 40.25 51%|█████ | 881/1726 [15:13:15<14:40:23, 62.51s/it] 51%|█████ | 882/1726 [15:14:16<14:31:40, 61.97s/it] 51%|█████ | 883/1726 [15:15:18<14:29:59, 61.92s/it] 
51%|█████ | 884/1726 [15:16:19<14:25:55, 61.70s/it] 51%|█████▏ | 885/1726 [15:17:20<14:20:02, 61.36s/it] 51%|█████▏ | 886/1726 [15:18:22<14:22:47, 6{'loss': 1.1994, 'learning_rate': 2.0112601292351322e-05, 'epoch': 0.51} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1852 [2024-06-10 15:55:46,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 258.55 | bwd_microstep: 665.95 | bwd_inner_microstep: 665.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 15:55:48,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1281.22 | bwd_inner_microstep: 1281.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1934 [2024-06-10 15:55:49,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.24 | bwd_microstep: 851.23 | bwd_inner_microstep: 851.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 15:55:51,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.65 | bwd_microstep: 1376.25 | bwd_inner_microstep: 1376.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-10 15:55:53,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.11 | bwd_microstep: 1538.81 | bwd_inner_microstep: 1538.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 15:55:55,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.12 | bwd_microstep: 1376.48 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 15:55:57,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1377.21 | bwd_inner_microstep: 1377.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715 [2024-06-10 15:55:59,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.43 | bwd_microstep: 1526.96 | bwd_inner_microstep: 1526.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 15:56:01,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1341.42 | bwd_inner_microstep: 1341.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 15:56:03,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.92 | bwd_microstep: 1293.93 | bwd_inner_microstep: 1293.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 15:56:04,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2140 [2024-06-10 15:56:06,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.99 | bwd_microstep: 864.54 | bwd_inner_microstep: 864.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-10 15:56:08,247] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.36 | bwd_microstep: 1618.86 | bwd_inner_microstep: 1618.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3452 [2024-06-10 15:56:10,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1400.33 | bwd_inner_microstep: 1400.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2100 [2024-06-10 15:56:11,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.89 | bwd_microstep: 789.48 | bwd_inner_microstep: 789.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 15:56:13,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1253.77 | bwd_inner_microstep: 1253.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3649 [2024-06-10 15:56:14,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.70 | bwd_microstep: 1357.78 | bwd_inner_microstep: 1357.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3803 [2024-06-10 15:56:16,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.15 | bwd_microstep: 1484.14 | bwd_inner_microstep: 1484.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 15:56:18,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.66 | bwd_microstep: 1253.59 | bwd_inner_microstep: 1253.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3464 [2024-06-10 15:56:20,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.08 | bwd_microstep: 1374.17 | bwd_inner_microstep: 1374.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 15:56:22,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 1499.73 | bwd_inner_microstep: 1499.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 15:56:24,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1412.27 | bwd_inner_microstep: 1412.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-10 15:56:26,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1557.82 | bwd_inner_microstep: 1557.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3603 [2024-06-10 15:56:28,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.71 | bwd_microstep: 1244.98 | bwd_inner_microstep: 1244.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721 [2024-06-10 15:56:30,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1335.65 | bwd_inner_microstep: 1335.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 15:56:31,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.74 | bwd_microstep: 802.77 | bwd_inner_microstep: 802.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3509 [2024-06-10 15:56:33,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1390.12 | bwd_inner_microstep: 1390.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 15:56:35,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1393.30 | bwd_inner_microstep: 1393.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 15:56:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.40 | bwd_microstep: 1645.11 | bwd_inner_microstep: 1645.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 15:56:39,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1550.41 | bwd_inner_microstep: 1550.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2000 [2024-06-10 15:56:40,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.12 | bwd_microstep: 861.12 | bwd_inner_microstep: 861.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2116 [2024-06-10 15:56:47,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.59 [2024-06-10 15:56:47,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.45 | bwd_microstep: 6622.95 | bwd_inner_microstep: 976.51 | bwd_allreduce_microstep: 5646.37 | step_microstep: 38.73 [2024-06-10 15:56:47,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15335.14 | bwd: 46589.84 | bwd_inner: 40942.51 | 
bwd_allreduce: 5646.63 | step: 40.23 {'loss': 1.3055, 'learning_rate': 2.0075067748557808e-05, 'epoch': 0.51} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3444 [2024-06-10 15:56:49,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.34 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4012 [2024-06-10 15:56:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1513.35 | bwd_inner_microstep: 1513.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4263 [2024-06-10 15:56:54,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.19 | bwd_microstep: 1463.29 | bwd_inner_microstep: 1463.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-10 15:56:56,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.98 | bwd_microstep: 1450.89 | bwd_inner_microstep: 1450.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1884 [2024-06-10 15:56:57,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.97 | bwd_microstep: 723.94 | bwd_inner_microstep: 723.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750 [2024-06-10 15:56:59,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.94 | bwd_microstep: 1435.07 | bwd_inner_microstep: 1435.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 15:57:00,876] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1279.44 | bwd_inner_microstep: 1279.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022 [2024-06-10 15:57:01,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.45 | bwd_microstep: 805.99 | bwd_inner_microstep: 805.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717 [2024-06-10 15:57:03,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1434.47 | bwd_inner_microstep: 1434.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 15:57:05,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1385.87 | bwd_inner_microstep: 1385.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1913 [2024-06-10 15:57:06,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.84 | bwd_microstep: 716.02 | bwd_inner_microstep: 716.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488 [2024-06-10 15:57:08,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.08 | bwd_microstep: 1313.16 | bwd_inner_microstep: 1313.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 15:57:10,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1381.81 | bwd_inner_microstep: 1381.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3526 [2024-06-10 15:57:12,935] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.05 | bwd_microstep: 1687.14 | bwd_inner_microstep: 1687.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3421 [2024-06-10 15:57:14,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.67 | bwd_microstep: 1372.13 | bwd_inner_microstep: 1372.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 15:57:16,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.70 | bwd_microstep: 1417.78 | bwd_inner_microstep: 1417.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2001 [2024-06-10 15:57:17,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.41 | bwd_microstep: 709.44 | bwd_inner_microstep: 709.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458 [2024-06-10 15:57:19,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1307.54 | bwd_inner_microstep: 1307.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694 [2024-06-10 15:57:21,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.72 | bwd_microstep: 1425.73 | bwd_inner_microstep: 1425.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3530 [2024-06-10 15:57:23,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.46 | bwd_microstep: 1453.01 | bwd_inner_microstep: 1452.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3626 [2024-06-10 15:57:25,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.80 | bwd_microstep: 1610.11 | bwd_inner_microstep: 1610.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 15:57:27,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1392.01 | bwd_inner_microstep: 1391.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1990 [2024-06-10 15:57:28,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.95 | bwd_microstep: 708.03 | bwd_inner_microstep: 708.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 15:57:30,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1493.40 | bwd_inner_microstep: 1493.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937 [2024-06-10 15:57:31,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.95 | bwd_microstep: 761.65 | bwd_inner_microstep: 761.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-10 15:57:32,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.00 | bwd_microstep: 699.19 | bwd_inner_microstep: 699.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 15:57:34,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.03 | bwd_microstep: 1347.32 | bwd_inner_microstep: 1347.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3774 [2024-06-10 15:57:36,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.12 | bwd_microstep: 1543.76 | bwd_inner_microstep: 1543.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2234 [2024-06-10 15:57:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.34 | bwd_microstep: 1060.34 | bwd_inner_microstep: 1060.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3425 [2024-06-10 15:57:40,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.44 | bwd_microstep: 1494.25 | bwd_inner_microstep: 1494.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3398 [2024-06-10 15:57:42,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.31 | bwd_microstep: 1490.42 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 15:57:49,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 15:57:49,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.07 | bwd_microstep: 6570.94 | bwd_inner_microstep: 2018.65 | bwd_allreduce_microstep: 4552.24 | step_microstep: 38.17 [2024-06-10 15:57:49,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15353.76 | bwd: 45989.06 | bwd_inner: 41435.92 | bwd_allreduce: 4552.46 | step: 39.76 {'loss': 1.2174, 'learning_rate': 2.0037533940376083e-05, 'epoch': 0.51} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 15:57:51,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
499.94 | bwd_microstep: 1332.49 | bwd_inner_microstep: 1332.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 15:57:53,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.28 | bwd_microstep: 1279.68 | bwd_inner_microstep: 1279.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2334 [2024-06-10 15:57:54,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.61 | bwd_microstep: 982.44 | bwd_inner_microstep: 982.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 15:57:55,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.89 | bwd_microstep: 971.55 | bwd_inner_microstep: 971.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 15:57:57,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1246.81 | bwd_inner_microstep: 1246.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013 [2024-06-10 15:57:58,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.24 | bwd_microstep: 803.51 | bwd_inner_microstep: 803.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1894 [2024-06-10 15:57:59,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.06 | bwd_microstep: 712.45 | bwd_inner_microstep: 712.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 15:58:01,466] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 487.85 | bwd_microstep: 1279.93 | bwd_inner_microstep: 1279.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2455 [2024-06-10 15:58:02,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.41 | bwd_microstep: 977.97 | bwd_inner_microstep: 977.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2994 [2024-06-10 15:58:04,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.45 | bwd_microstep: 1190.00 | bwd_inner_microstep: 1189.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 15:58:06,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.63 | bwd_microstep: 1477.77 | bwd_inner_microstep: 1477.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3447 [2024-06-10 15:58:08,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1578.68 | bwd_inner_microstep: 1578.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 15:58:10,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.50 | bwd_microstep: 1349.82 | bwd_inner_microstep: 1349.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3499 [2024-06-10 15:58:12,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.77 | bwd_microstep: 1679.38 | bwd_inner_microstep: 1679.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 15:58:14,760] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1391.35 | bwd_inner_microstep: 1391.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 15:58:16,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1380.56 | bwd_inner_microstep: 1380.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-10 15:58:17,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.61 | bwd_microstep: 709.66 | bwd_inner_microstep: 709.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 15:58:19,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1412.32 | bwd_inner_microstep: 1412.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627 [2024-06-10 15:58:21,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1444.96 | bwd_inner_microstep: 1444.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-10 15:58:22,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.77 | bwd_microstep: 798.44 | bwd_inner_microstep: 798.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466 [2024-06-10 15:58:24,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.75 | bwd_microstep: 1211.78 | bwd_inner_microstep: 1211.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3441 [2024-06-10 15:58:25,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.48 | bwd_microstep: 1156.84 | bwd_inner_microstep: 1156.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-10 15:58:28,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1508.93 | bwd_inner_microstep: 1508.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1893 [2024-06-10 15:58:29,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.71 | bwd_microstep: 682.82 | bwd_inner_microstep: 682.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918 [2024-06-10 15:58:30,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.75 | bwd_microstep: 781.76 | bwd_inner_microstep: 781.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 15:58:32,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1411.90 | bwd_inner_microstep: 1411.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474 [2024-06-10 15:58:33,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.34 | bwd_microstep: 1185.70 | bwd_inner_microstep: 1185.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3723 [2024-06-10 15:58:35,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1397.60 | bwd_inner_microstep: 1397.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3814 [2024-06-10 15:58:37,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1556.59 | bwd_inner_microstep: 1556.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2077 [2024-06-10 15:58:39,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.84 | bwd_microstep: 916.30 | bwd_inner_microstep: 916.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2909 [2024-06-10 15:58:40,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.43 | bwd_microstep: 1200.94 | bwd_inner_microstep: 1200.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-10 15:58:51,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 15:58:51,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.92 | bwd_microstep: 9954.44 | bwd_inner_microstep: 1746.04 | bwd_allreduce_microstep: 8208.33 | step_microstep: 38.41 [2024-06-10 15:58:51,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14462.10 | bwd: 46965.39 | bwd_inner: 38756.15 | bwd_allreduce: 8208.57 | step: 39.83 {'loss': 1.2236, 'learning_rate': 2e-05, 'epoch': 0.52} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416 [2024-06-10 15:58:53,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.86 | bwd_microstep: 1435.34 | bwd_inner_microstep: 1435.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 15:58:55,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.51 | 
bwd_microstep: 1469.34 | bwd_inner_microstep: 1469.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874 [2024-06-10 15:58:57,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.61 | bwd_microstep: 1577.45 | bwd_inner_microstep: 1577.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 15:58:59,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.61 | bwd_microstep: 1478.95 | bwd_inner_microstep: 1478.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 15:59:01,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.23 | bwd_microstep: 1451.03 | bwd_inner_microstep: 1451.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 15:59:03,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1279.66 | bwd_inner_microstep: 1279.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 15:59:05,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.03 | bwd_microstep: 1383.09 | bwd_inner_microstep: 1383.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 15:59:07,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.34 | bwd_microstep: 1293.18 | bwd_inner_microstep: 1293.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 15:59:08,783] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1282.60 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 15:59:10,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.35 | bwd_microstep: 1423.85 | bwd_inner_microstep: 1423.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-10 15:59:11,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.90 | bwd_microstep: 795.55 | bwd_inner_microstep: 795.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 15:59:13,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1284.42 | bwd_inner_microstep: 1284.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 15:59:15,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.80 | bwd_microstep: 1338.84 | bwd_inner_microstep: 1338.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 15:59:17,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1387.29 | bwd_inner_microstep: 1387.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 15:59:19,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1514.38 | bwd_inner_microstep: 1514.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-10 15:59:20,583] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.27 | bwd_microstep: 793.36 | bwd_inner_microstep: 793.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 15:59:22,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1390.87 | bwd_inner_microstep: 1390.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-10 15:59:24,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1523.52 | bwd_inner_microstep: 1523.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 15:59:26,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.15 | bwd_microstep: 1558.20 | bwd_inner_microstep: 1558.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2283 [2024-06-10 15:59:28,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.48 | bwd_microstep: 927.55 | bwd_inner_microstep: 927.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3932 [2024-06-10 15:59:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.84 | bwd_microstep: 1401.55 | bwd_inner_microstep: 1401.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 15:59:31,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1376.79 | bwd_inner_microstep: 1376.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token 
length: 3822 [2024-06-10 15:59:34,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.30 | bwd_microstep: 1686.26 | bwd_inner_microstep: 1686.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 15:59:36,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1395.30 | bwd_inner_microstep: 1395.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2010 [2024-06-10 15:59:37,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.74 | bwd_microstep: 740.79 | bwd_inner_microstep: 740.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 15:59:39,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1547.41 | bwd_inner_microstep: 1547.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3721 [2024-06-10 15:59:41,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1381.41 | bwd_inner_microstep: 1381.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 15:59:43,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.62 | bwd_microstep: 1472.55 | bwd_inner_microstep: 1472.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3438 [2024-06-10 15:59:45,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, 
images per sample: 9.5, dynamic token length: 3554 [2024-06-10 15:59:47,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.73 | bwd_microstep: 1562.29 | bwd_inner_microstep: 1562.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 15:59:49,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1339.71 | bwd_inner_microstep: 1339.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3411 [2024-06-10 15:59:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-10 15:59:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.93 | bwd_microstep: 1820.68 | bwd_inner_microstep: 1503.34 | bwd_allreduce_microstep: 317.30 | step_microstep: 37.56 [2024-06-10 15:59:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16219.34 | bwd: 43728.80 | bwd_inner: 43410.61 | bwd_allreduce: 317.52 | step: 39.06 {'loss': 1.2677, 'learning_rate': 1.9962466059623928e-05, 'epoch': 0.52} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1978 [2024-06-10 15:59:52,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 823.41 | bwd_inner_microstep: 823.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3899 [2024-06-10 15:59:54,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1390.31 | bwd_inner_microstep: 1390.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-10 15:59:56,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 440.92 | bwd_microstep: 1150.13 | bwd_inner_microstep: 1150.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 15:59:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.15 | bwd_microstep: 1552.58 | bwd_inner_microstep: 1552.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 16:00:00,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.47 | bwd_microstep: 1251.47 | bwd_inner_microstep: 1251.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 16:00:02,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.56 | bwd_microstep: 1387.53 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-10 16:00:04,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.41 | bwd_microstep: 1625.79 | bwd_inner_microstep: 1625.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712 [2024-06-10 16:00:06,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.97 | bwd_microstep: 1425.65 | bwd_inner_microstep: 1425.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 16:00:08,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.93 | bwd_microstep: 1406.07 | bwd_inner_microstep: 1406.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 16:00:09,952] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1280.62 | bwd_inner_microstep: 1280.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 16:00:12,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1492.99 | bwd_inner_microstep: 1492.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 16:00:13,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1378.92 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1960 [2024-06-10 16:00:14,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.76 | bwd_microstep: 732.21 | bwd_inner_microstep: 732.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523 [2024-06-10 16:00:17,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.36 | bwd_microstep: 1583.05 | bwd_inner_microstep: 1583.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527 [2024-06-10 16:00:18,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1322.94 | bwd_inner_microstep: 1322.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 16:00:20,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.55 | bwd_microstep: 1477.21 | bwd_inner_microstep: 1477.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3467 [2024-06-10 16:00:23,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1481.55 | bwd_inner_microstep: 1481.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675 [2024-06-10 16:00:25,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.11 | bwd_microstep: 1514.81 | bwd_inner_microstep: 1514.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440 [2024-06-10 16:00:27,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1411.34 | bwd_inner_microstep: 1411.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449 [2024-06-10 16:00:28,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.46 | bwd_microstep: 1192.62 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 16:00:30,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1555.86 | bwd_inner_microstep: 1555.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2163 [2024-06-10 16:00:32,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.91 | bwd_microstep: 950.57 | bwd_inner_microstep: 950.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551 [2024-06-10 16:00:33,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.83 | bwd_microstep: 1232.05 | bwd_inner_microstep: 1232.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3526 [2024-06-10 16:00:35,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.46 | bwd_microstep: 1292.86 | bwd_inner_microstep: 1292.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534 [2024-06-10 16:00:37,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.12 | bwd_microstep: 1229.32 | bwd_inner_microstep: 1229.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484 [2024-06-10 16:00:39,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1218.21 | bwd_inner_microstep: 1218.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-10 16:00:41,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.33 | bwd_microstep: 1559.78 | bwd_inner_microstep: 1559.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 16:00:43,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1491.30 | bwd_inner_microstep: 1491.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 16:00:45,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.21 | bwd_microstep: 1405.21 | bwd_inner_microstep: 1405.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3723 [2024-06-10 16:00:47,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.82 | bwd_microstep: 1335.20 | bwd_inner_microstep: 1335.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 16:00:48,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.12 | bwd_microstep: 1291.64 | bwd_inner_microstep: 1291.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 16:00:55,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.60 [2024-06-10 16:00:55,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.51 | bwd_microstep: 6294.29 | bwd_inner_microstep: 1663.98 | bwd_allreduce_microstep: 4630.27 | step_microstep: 38.01 [2024-06-10 16:00:55,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16098.25 | bwd: 47737.50 | bwd_inner: 43106.30 | bwd_allreduce: 4630.50 | step: 39.51 51%|█████▏ | 886/1726 [15:18:22<14:22:47, 61.63s/it] 51%|█████▏ | 887/1726 [15:19:24<14:24:23, 61.82s/it] 51%|█████▏ | 888/1726 [15:20:26<14:22:46, 61.77s/it] 52%|█████▏ | 889/1726 [15:21:28<14:21:37, 61.77s/it] 52%|█████▏ | 890/1726 [15:22:28<14:14:23, 61.32s/it] 52%|█████▏ | 891/1726 [15:23:32<14:25:13, 62.17s/it] {'loss': 1.1821, 'learning_rate': 1.992493225144219e-05, 'epoch': 0.52} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3520 [2024-06-10 16:00:57,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.67 | bwd_microstep: 1336.21 | bwd_inner_microstep: 1336.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3921 [2024-06-10 16:00:59,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.39 | 
bwd_microstep: 1484.64 | bwd_inner_microstep: 1484.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-10 16:01:01,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.45 | bwd_microstep: 1546.68 | bwd_inner_microstep: 1546.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291 [2024-06-10 16:01:02,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.00 | bwd_microstep: 874.60 | bwd_inner_microstep: 874.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-10 16:01:04,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.13 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 16:01:06,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478 [2024-06-10 16:01:08,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.65 | bwd_microstep: 1180.53 | bwd_inner_microstep: 1180.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073 [2024-06-10 16:01:09,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.98 | bwd_microstep: 816.54 | bwd_inner_microstep: 816.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 16:01:11,152] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1243.27 | bwd_inner_microstep: 1243.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 16:01:12,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1246.61 | bwd_inner_microstep: 1246.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 16:01:14,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.99 | bwd_inner_microstep: 1479.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416 [2024-06-10 16:01:16,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.11 | bwd_microstep: 1438.75 | bwd_inner_microstep: 1438.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 16:01:18,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.58 | bwd_microstep: 1478.48 | bwd_inner_microstep: 1478.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3437 [2024-06-10 16:01:21,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1506.20 | bwd_inner_microstep: 1506.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472 [2024-06-10 16:01:22,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.49 | bwd_microstep: 1314.63 | bwd_inner_microstep: 1314.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487 [2024-06-10 16:01:24,659] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.26 | bwd_microstep: 1314.29 | bwd_inner_microstep: 1314.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-10 16:01:26,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1496.00 | bwd_inner_microstep: 1495.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2300 [2024-06-10 16:01:28,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.85 | bwd_microstep: 1004.87 | bwd_inner_microstep: 1004.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3813 [2024-06-10 16:01:30,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.34 | bwd_microstep: 1682.44 | bwd_inner_microstep: 1682.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614 [2024-06-10 16:01:32,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.03 | bwd_microstep: 1607.61 | bwd_inner_microstep: 1607.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 16:01:33,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.40 | bwd_microstep: 973.53 | bwd_inner_microstep: 973.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 16:01:35,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.36 | bwd_microstep: 1276.19 | bwd_inner_microstep: 1276.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3824 [2024-06-10 16:01:37,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 1553.78 | bwd_inner_microstep: 1553.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 16:01:39,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.97 | bwd_microstep: 1294.79 | bwd_inner_microstep: 1294.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 16:01:41,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.05 | bwd_microstep: 1458.08 | bwd_inner_microstep: 1458.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 16:01:43,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1345.34 | bwd_inner_microstep: 1345.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 16:01:45,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.32 | bwd_microstep: 1342.87 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3586 [2024-06-10 16:01:47,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.75 | bwd_microstep: 1704.87 | bwd_inner_microstep: 1704.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3605 [2024-06-10 16:01:50,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.34 | bwd_microstep: 1703.53 | bwd_inner_microstep: 1703.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
20, images per sample: 5.0, dynamic token length: 3422 [2024-06-10 16:01:51,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.93 | bwd_microstep: 1208.43 | bwd_inner_microstep: 1208.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3421 [2024-06-10 16:01:53,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.40 | bwd_microstep: 1325.11 | bwd_inner_microstep: 1325.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 16:01:57,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.21 | optimizer_step: 6.63 [2024-06-10 16:01:57,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 2994.56 | bwd_inner_microstep: 1527.25 | bwd_allreduce_microstep: 1467.25 | step_microstep: 38.19 [2024-06-10 16:01:57,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16175.93 | bwd: 44897.51 | bwd_inner: 43429.35 | bwd_allreduce: 1467.48 | step: 39.68 {'loss': 1.2872, 'learning_rate': 1.988739870764869e-05, 'epoch': 0.52} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569 [2024-06-10 16:01:59,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1354.62 | bwd_inner_microstep: 1354.55 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 16:02:00,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.23 | bwd_microstep: 1274.64 | bwd_inner_microstep: 1274.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3871 [2024-06-10 16:02:02,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 582.85 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 16:02:04,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1447.72 | bwd_inner_microstep: 1447.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 16:02:06,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1382.27 | bwd_inner_microstep: 1382.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495 [2024-06-10 16:02:08,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.95 | bwd_microstep: 1187.53 | bwd_inner_microstep: 1187.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1983 [2024-06-10 16:02:09,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.43 | bwd_microstep: 734.11 | bwd_inner_microstep: 734.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 16:02:11,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1384.01 | bwd_inner_microstep: 1383.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 16:02:13,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1252.51 | bwd_inner_microstep: 1252.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493 [2024-06-10 16:02:14,989] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1316.19 | bwd_inner_microstep: 1316.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955 [2024-06-10 16:02:16,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.94 | bwd_microstep: 820.44 | bwd_inner_microstep: 820.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3567 [2024-06-10 16:02:18,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.48 | bwd_microstep: 1558.14 | bwd_inner_microstep: 1558.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-10 16:02:20,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.02 | bwd_microstep: 1579.00 | bwd_inner_microstep: 1578.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774 [2024-06-10 16:02:22,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.83 | bwd_microstep: 1735.38 | bwd_inner_microstep: 1735.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3812 [2024-06-10 16:02:25,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.41 | bwd_microstep: 1711.51 | bwd_inner_microstep: 1711.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 16:02:27,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.58 | bwd_microstep: 1484.50 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 
3515 [2024-06-10 16:02:29,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1442.25 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637 [2024-06-10 16:02:31,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.24 | bwd_microstep: 1532.67 | bwd_inner_microstep: 1532.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3510 [2024-06-10 16:02:33,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1511.60 | bwd_inner_microstep: 1511.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1202 [2024-06-10 16:02:34,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 178.54 | bwd_microstep: 462.84 | bwd_inner_microstep: 462.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3392 [2024-06-10 16:02:35,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1391.63 | bwd_inner_microstep: 1391.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-10 16:02:38,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.02 | bwd_microstep: 1522.07 | bwd_inner_microstep: 1522.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3604 [2024-06-10 16:02:40,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.48 | bwd_microstep: 1584.72 | bwd_inner_microstep: 1584.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 3540 [2024-06-10 16:02:42,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1322.90 | bwd_inner_microstep: 1322.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-10 16:02:43,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.05 | bwd_microstep: 971.00 | bwd_inner_microstep: 970.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-10 16:02:44,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.75 | bwd_microstep: 976.23 | bwd_inner_microstep: 976.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551 [2024-06-10 16:02:46,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.21 | bwd_microstep: 1197.96 | bwd_inner_microstep: 1197.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2108 [2024-06-10 16:02:47,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.72 | bwd_microstep: 854.23 | bwd_inner_microstep: 854.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3539 [2024-06-10 16:02:49,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1323.09 | bwd_inner_microstep: 1323.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 16:02:51,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1499.11 | bwd_inner_microstep: 1499.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 16:02:53,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3579 [2024-06-10 16:02:57,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.38 | optimizer_step: 6.61 [2024-06-10 16:02:57,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.29 | bwd_microstep: 3991.33 | bwd_inner_microstep: 1491.70 | bwd_allreduce_microstep: 2499.57 | step_microstep: 39.33 [2024-06-10 16:02:57,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15713.01 | bwd: 44617.33 | bwd_inner: 42116.80 | bwd_allreduce: 2499.82 | step: 40.83 {'loss': 1.1885, 'learning_rate': 1.984986556043634e-05, 'epoch': 0.52} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 16:02:59,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.30 | bwd_microstep: 1329.60 | bwd_inner_microstep: 1329.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3883 [2024-06-10 16:03:01,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.12 | bwd_microstep: 1350.07 | bwd_inner_microstep: 1350.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 16:03:03,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1271.29 | bwd_inner_microstep: 1271.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783 [2024-06-10 16:03:05,535] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 604.40 | bwd_microstep: 1640.79 | bwd_inner_microstep: 1640.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 16:03:07,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1492.22 | bwd_inner_microstep: 1492.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 16:03:09,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1281.43 | bwd_inner_microstep: 1281.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 16:03:11,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1395.65 | bwd_inner_microstep: 1395.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 778 [2024-06-10 16:03:11,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.23 | bwd_microstep: 306.41 | bwd_inner_microstep: 306.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 16:03:13,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.78 | bwd_microstep: 1386.62 | bwd_inner_microstep: 1386.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 16:03:15,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.03 | bwd_microstep: 1285.65 | bwd_inner_microstep: 1285.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523 [2024-06-10 
16:03:17,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1420.36 | bwd_inner_microstep: 1420.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-10 16:03:18,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.36 | bwd_microstep: 802.59 | bwd_inner_microstep: 802.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 16:03:20,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.84 | bwd_microstep: 1245.97 | bwd_inner_microstep: 1245.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609 [2024-06-10 16:03:22,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1314.96 | bwd_inner_microstep: 1314.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515 [2024-06-10 16:03:23,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1352.94 | bwd_inner_microstep: 1352.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3423 [2024-06-10 16:03:25,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1407.36 | bwd_inner_microstep: 1407.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-10 16:03:28,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.66 | bwd_microstep: 1600.46 | bwd_inner_microstep: 1600.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, 
dynamic token length: 3434 [2024-06-10 16:03:29,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.43 | bwd_microstep: 1408.00 | bwd_inner_microstep: 1407.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 648 [2024-06-10 16:03:30,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.31 | bwd_microstep: 273.92 | bwd_inner_microstep: 273.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844 [2024-06-10 16:03:32,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.82 | bwd_microstep: 1654.40 | bwd_inner_microstep: 1654.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2287 [2024-06-10 16:03:34,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.07 | bwd_microstep: 1068.79 | bwd_inner_microstep: 1068.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839 [2024-06-10 16:03:36,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.41 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628 [2024-06-10 16:03:38,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1412.87 | bwd_inner_microstep: 1412.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-10 16:03:40,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1535.71 | bwd_inner_microstep: 1535.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 40, images per sample: 10.0, dynamic token length: 3781 [2024-06-10 16:03:42,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1653.82 | bwd_inner_microstep: 1653.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-10 16:03:44,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.78 | bwd_microstep: 1535.98 | bwd_inner_microstep: 1535.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 16:03:46,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1383.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 879 [2024-06-10 16:03:47,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.09 | bwd_microstep: 366.98 | bwd_inner_microstep: 366.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608 [2024-06-10 16:03:49,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.05 | bwd_microstep: 1503.92 | bwd_inner_microstep: 1503.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3791 [2024-06-10 16:03:51,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.55 | bwd_microstep: 1681.32 | bwd_inner_microstep: 1681.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 16:03:53,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.71 | bwd_microstep: 1634.72 | bwd_inner_microstep: 1634.69 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2416 [2024-06-10 16:03:57,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 16:03:57,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.12 | bwd_microstep: 2937.51 | bwd_inner_microstep: 1225.02 | bwd_allreduce_microstep: 1712.43 | step_microstep: 37.95 [2024-06-10 16:03:57,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15504.04 | bwd: 43494.81 | bwd_inner: 41781.45 | bwd_allreduce: 1712.66 | step: 39.44 {'loss': 1.2182, 'learning_rate': 1.981233294199671e-05, 'epoch': 0.52} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 16:03:59,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.98 | bwd_microstep: 1370.68 | bwd_inner_microstep: 1370.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1353 [2024-06-10 16:03:59,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.44 | bwd_microstep: 516.01 | bwd_inner_microstep: 515.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422 [2024-06-10 16:04:01,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1443.12 | bwd_inner_microstep: 1443.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 16:04:03,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1474.86 | bwd_inner_microstep: 1474.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 16:04:05,690] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1379.36 | bwd_inner_microstep: 1379.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-10 16:04:07,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.89 | bwd_microstep: 1544.28 | bwd_inner_microstep: 1544.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 16:04:09,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1275.76 | bwd_inner_microstep: 1275.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710 [2024-06-10 16:04:11,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.84 | bwd_microstep: 1428.49 | bwd_inner_microstep: 1428.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3485 [2024-06-10 16:04:13,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1408.58 | bwd_inner_microstep: 1408.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2903 [2024-06-10 16:04:15,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.10 | bwd_microstep: 1088.12 | bwd_inner_microstep: 1088.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1892 [2024-06-10 16:04:15,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.06 | bwd_microstep: 683.21 | bwd_inner_microstep: 683.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3490 [2024-06-10 16:04:18,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.94 | bwd_microstep: 1480.49 | bwd_inner_microstep: 1480.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 16:04:20,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1480.86 | bwd_inner_microstep: 1480.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3535 [2024-06-10 16:04:21,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.36 | bwd_microstep: 1321.16 | bwd_inner_microstep: 1321.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493 [2024-06-10 16:04:24,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1581.51 | bwd_inner_microstep: 1581.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-10 16:04:26,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1577.74 | bwd_inner_microstep: 1577.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 16:04:27,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.90 | bwd_microstep: 701.08 | bwd_inner_microstep: 701.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098 [2024-06-10 16:04:28,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.16 | bwd_microstep: 918.54 | bwd_inner_microstep: 918.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3456 [2024-06-10 16:04:30,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1354.61 | bwd_inner_microstep: 1354.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-10 16:04:32,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1414.82 | bwd_inner_microstep: 1414.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-10 16:04:34,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.93 | bwd_microstep: 1657.44 | bwd_inner_microstep: 1657.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-10 16:04:35,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.60 | bwd_microstep: 974.40 | bwd_inner_microstep: 974.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 16:04:37,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1412.89 | bwd_inner_microstep: 1412.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 16:04:39,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1300.16 | bwd_inner_microstep: 1300.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 16:04:41,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1448.11 | bwd_inner_microstep: 1448.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3817 [2024-06-10 16:04:43,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553 [2024-06-10 16:04:45,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1590.93 | bwd_inner_microstep: 1590.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596 [2024-06-10 16:04:48,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.68 | bwd_microstep: 1595.33 | bwd_inner_microstep: 1595.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-10 16:04:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1535.72 | bwd_inner_microstep: 1535.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 16:04:52,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1397.83 | bwd_inner_microstep: 1397.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042 [2024-06-10 16:04:53,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.38 | bwd_microstep: 904.29 | bwd_inner_microstep: 904.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3562 [2024-06-10 16:04:59,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.26 | optimizer_step: 6.58 [2024-06-10 
16:04:59,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.94 | bwd_microstep: 5515.20 | bwd_inner_microstep: 1775.85 | bwd_allreduce_microstep: 3739.29 | step_microstep: 38.40 [2024-06-10 16:04:59,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15826.08 | bwd: 46280.01 | bwd_inner: 42539.81 | bwd_allreduce: 3739.53 | step: 39.89 {'loss': 1.2076, 'learning_rate': 1.9774800984519485e-05, 'epoch': 0.52} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3597 [2024-06-10 16:05:01,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.56 | bwd_microstep: 1458.97 | bwd_inner_microstep: 1458.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2434 [2024-06-10 16:05:02,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.94 | bwd_microstep: 1008.77 | bwd_inner_microstep: 1008.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 16:05:04,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.22 | bwd_microstep: 1245.74 | bwd_inner_microstep: 1245.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3912 [2024-06-10 16:05:06,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 1488.51 | bwd_inner_microstep: 1488.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-10 16:05:08,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.96 | bwd_microstep: 1150.93 | bwd_inner_microstep: 1150.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862 [2024-06-10 
16:05:10,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.23 | bwd_microstep: 1665.96 | bwd_inner_microstep: 1665.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428 [2024-06-10 16:05:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1348.31 | bwd_inner_microstep: 1348.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2311 [2024-06-10 16:05:13,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.95 | bwd_microstep: 787.11 | bwd_inner_microstep: 787.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 16:05:15,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.26 | bwd_microstep: 1342.76 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-10 16:05:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.14 | bwd_microstep: 797.68 | bwd_inner_microstep: 797.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 16:05:18,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1390.63 | bwd_inner_microstep: 1390.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 16:05:20,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.69 | bwd_microstep: 1391.56 | bwd_inner_microstep: 1391.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 33, images per sample: 8.25, 
dynamic token length: 3417 [2024-06-10 16:05:22,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1421.53 | bwd_inner_microstep: 1421.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2875 [2024-06-10 16:05:24,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.26 | bwd_microstep: 1176.93 | bwd_inner_microstep: 1176.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 16:05:26,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.63 | bwd_microstep: 1511.41 | bwd_inner_microstep: 1511.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2175 [2024-06-10 16:05:27,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.92 | bwd_microstep: 948.22 | bwd_inner_microstep: 948.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 16:05:29,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.47 | bwd_microstep: 1491.24 | bwd_inner_microstep: 1491.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 16:05:31,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.78 | bwd_microstep: 1626.17 | bwd_inner_microstep: 1626.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3692 [2024-06-10 16:05:33,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1358.62 | bwd_inner_microstep: 1358.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-10 16:05:35,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1512.57 | bwd_inner_microstep: 1512.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 16:05:37,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1502.27 | bwd_inner_microstep: 1502.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 16:05:39,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1556.21 | bwd_inner_microstep: 1556.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3548 [2024-06-10 16:05:41,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.54 | bwd_microstep: 1230.14 | bwd_inner_microstep: 1230.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3778 [2024-06-10 16:05:43,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.33 | bwd_microstep: 1411.00 | bwd_inner_microstep: 1410.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2432 [2024-06-10 16:05:44,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.23 | bwd_microstep: 941.09 | bwd_inner_microstep: 941.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 633 [2024-06-10 16:05:45,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 108.64 | bwd_microstep: 263.37 | bwd_inner_microstep: 263.34 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.26 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760 [2024-06-10 16:05:47,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.71 | bwd_microstep: 1549.51 | bwd_inner_microstep: 1549.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 16:05:49,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.77 | bwd_microstep: 1653.51 | bwd_inner_microstep: 1653.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3594 [2024-06-10 16:05:51,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.48 | bwd_microstep: 1553.44 | bwd_inner_microstep: 1553.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427 [2024-06-10 16:05:53,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.81 | bwd_microstep: 1447.99 | bwd_inner_microstep: 1447.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3491 [2024-06-10 16:05:55,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.89 | bwd_microstep: 1333.54 | bwd_inner_microstep: 1333.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 16:05:59,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 16:05:59,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 2991.42 | bwd_inner_microstep: 1753.43 | bwd_allreduce_microstep: 1237.94 | step_microstep: 37.83 [2024-06-10 16:05:59,221] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 15745.93 | bwd: 43557.15 | bwd_inner: 42318.30 | bwd_allreduce: 1238.17 | step: 39.57 {'loss': 1.211, 'learning_rate': 1.973726982019202e-05, 'epoch': 0.52} 52%|█████▏ | 891/1726 [15:23:32<14:25:13, 62.17s/it] 52%|█████▏ | 892/1726 [15:24:33<14:21:00, 61.94s/it] 52%|█████▏ | 893/1726 [15:25:34<14:14:37, 61.56s/it] 52%|█████▏ | 894/1726 [15:26:33<14:04:20, 60.89s/it] 52%|█████▏ | 895/1726 [15:27:36<14:09:48, 61.36s/it] 52%|█████▏ | 896/1726 [15:28:35<14:01:40, 60.84s/it] dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553 [2024-06-10 16:06:01,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.50 | bwd_microstep: 1596.72 | bwd_inner_microstep: 1596.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 16:06:03,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1275.10 | bwd_inner_microstep: 1275.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 16:06:04,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813 [2024-06-10 16:06:06,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.99 | bwd_microstep: 1356.89 | bwd_inner_microstep: 1356.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 16:06:08,758] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.21 | bwd_microstep: 1381.26 | bwd_inner_microstep: 1381.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4081 [2024-06-10 16:06:10,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.81 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1586.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 16:06:12,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.92 | bwd_microstep: 1252.55 | bwd_inner_microstep: 1252.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 16:06:14,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.59 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-10 16:06:15,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.27 | bwd_microstep: 797.57 | bwd_inner_microstep: 797.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578 [2024-06-10 16:06:17,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.03 | bwd_microstep: 1305.30 | bwd_inner_microstep: 1305.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 16:06:19,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.80 | bwd_microstep: 1250.75 | bwd_inner_microstep: 1250.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3698 [2024-06-10 16:06:21,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.35 | bwd_microstep: 1529.88 | bwd_inner_microstep: 1529.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030 [2024-06-10 16:06:22,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.88 | bwd_microstep: 805.70 | bwd_inner_microstep: 805.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2157 [2024-06-10 16:06:23,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.03 | bwd_microstep: 828.35 | bwd_inner_microstep: 828.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1857 [2024-06-10 16:06:24,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.52 | bwd_microstep: 707.09 | bwd_inner_microstep: 707.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3677 [2024-06-10 16:06:26,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.15 | bwd_microstep: 1551.69 | bwd_inner_microstep: 1551.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-10 16:06:28,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1277.09 | bwd_inner_microstep: 1277.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 16:06:30,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1247.10 | bwd_inner_microstep: 1247.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3543 [2024-06-10 16:06:32,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1493.27 | bwd_inner_microstep: 1493.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661 [2024-06-10 16:06:34,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.94 | bwd_microstep: 1521.64 | bwd_inner_microstep: 1521.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2132 [2024-06-10 16:06:35,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.00 | bwd_microstep: 867.00 | bwd_inner_microstep: 866.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3604 [2024-06-10 16:06:37,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.50 | bwd_microstep: 1217.46 | bwd_inner_microstep: 1217.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 16:06:39,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1558.93 | bwd_inner_microstep: 1558.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 16:06:41,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1281.29 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3729 [2024-06-10 16:06:43,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.02 | bwd_microstep: 1638.23 | bwd_inner_microstep: 1638.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 16:06:45,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1559.16 | bwd_inner_microstep: 1559.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3728 [2024-06-10 16:06:47,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.11 | bwd_microstep: 1466.84 | bwd_inner_microstep: 1466.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602 [2024-06-10 16:06:49,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1537.59 | bwd_inner_microstep: 1537.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3799 [2024-06-10 16:06:51,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.47 | bwd_microstep: 1644.99 | bwd_inner_microstep: 1644.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3802 [2024-06-10 16:06:53,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1513.54 | bwd_inner_microstep: 1513.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 16:06:55,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.56 | bwd_microstep: 1284.20 | bwd_inner_microstep: 1284.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 16:07:02,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.19 | optimizer_step: 6.59 [2024-06-10 
16:07:02,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 5931.34 | bwd_inner_microstep: 1803.09 | bwd_allreduce_microstep: 4128.20 | step_microstep: 37.87 [2024-06-10 16:07:02,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15873.67 | bwd: 46833.49 | bwd_inner: 42704.38 | bwd_allreduce: 4128.43 | step: 39.45 {'loss': 1.2167, 'learning_rate': 1.9699739581198888e-05, 'epoch': 0.52} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 16:07:04,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1466.38 | bwd_inner_microstep: 1466.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401 [2024-06-10 16:07:05,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.34 | bwd_microstep: 1207.94 | bwd_inner_microstep: 1207.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 16:07:07,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1377.17 | bwd_inner_microstep: 1377.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860 [2024-06-10 16:07:10,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.18 | bwd_microstep: 1566.49 | bwd_inner_microstep: 1566.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3482 [2024-06-10 16:07:11,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.79 | bwd_microstep: 1215.81 | bwd_inner_microstep: 1215.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 
16:07:13,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1381.41 | bwd_inner_microstep: 1381.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4121 [2024-06-10 16:07:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.25 | bwd_microstep: 1635.77 | bwd_inner_microstep: 1635.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506 [2024-06-10 16:07:17,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.64 | bwd_microstep: 1416.18 | bwd_inner_microstep: 1416.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 16:07:18,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.21 | bwd_microstep: 798.50 | bwd_inner_microstep: 798.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2660 [2024-06-10 16:07:20,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.16 | bwd_microstep: 1024.84 | bwd_inner_microstep: 1024.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3530 [2024-06-10 16:07:22,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.08 | bwd_microstep: 1454.48 | bwd_inner_microstep: 1454.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672 [2024-06-10 16:07:24,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.18 | bwd_microstep: 1615.15 | bwd_inner_microstep: 1615.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3652 [2024-06-10 16:07:26,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.32 | bwd_microstep: 1519.77 | bwd_inner_microstep: 1519.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2105 [2024-06-10 16:07:27,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.12 | bwd_microstep: 855.35 | bwd_inner_microstep: 855.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 16:07:29,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1382.38 | bwd_inner_microstep: 1382.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533 [2024-06-10 16:07:31,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1585.14 | bwd_inner_microstep: 1585.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 16:07:33,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1254.28 | bwd_inner_microstep: 1254.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 16:07:34,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.56 | bwd_microstep: 807.26 | bwd_inner_microstep: 807.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-10 16:07:35,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.88 | bwd_microstep: 810.52 | bwd_inner_microstep: 810.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 18, images per sample: 4.5, dynamic token length: 2446 [2024-06-10 16:07:37,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.83 | bwd_microstep: 948.55 | bwd_inner_microstep: 948.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469 [2024-06-10 16:07:38,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.37 | bwd_microstep: 1216.44 | bwd_inner_microstep: 1216.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 16:07:40,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.19 | bwd_microstep: 1280.07 | bwd_inner_microstep: 1280.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3664 [2024-06-10 16:07:42,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.81 | bwd_microstep: 1627.45 | bwd_inner_microstep: 1627.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2048 [2024-06-10 16:07:44,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.46 | bwd_microstep: 907.98 | bwd_inner_microstep: 907.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497 [2024-06-10 16:07:45,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.61 | bwd_microstep: 1190.14 | bwd_inner_microstep: 1190.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 16:07:47,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.49 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 16:07:49,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1403.71 | bwd_inner_microstep: 1403.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3814 [2024-06-10 16:07:51,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.05 | bwd_microstep: 1722.81 | bwd_inner_microstep: 1722.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617 [2024-06-10 16:07:54,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1507.13 | bwd_inner_microstep: 1507.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 16:07:56,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.40 | bwd_microstep: 1644.78 | bwd_inner_microstep: 1644.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-10 16:07:58,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.65 | bwd_microstep: 1599.96 | bwd_inner_microstep: 1599.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3604 [2024-06-10 16:08:02,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.61 [2024-06-10 16:08:02,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.36 | bwd_microstep: 3605.41 | bwd_inner_microstep: 1930.27 | bwd_allreduce_microstep: 1675.09 | step_microstep: 37.75 [2024-06-10 16:08:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 15872.83 | bwd: 44312.37 | bwd_inner: 42636.38 | bwd_allreduce: 1675.32 | step: 39.19 {'loss': 1.2287, 'learning_rate': 1.966221039972138e-05, 'epoch': 0.52} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 16:08:04,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.25 | bwd_microstep: 1468.91 | bwd_inner_microstep: 1468.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 16:08:06,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.72 | bwd_microstep: 1276.34 | bwd_inner_microstep: 1276.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392 [2024-06-10 16:08:08,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1337.77 | bwd_inner_microstep: 1337.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-10 16:08:10,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.72 | bwd_microstep: 1546.79 | bwd_inner_microstep: 1546.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1880 [2024-06-10 16:08:11,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.90 | bwd_microstep: 742.03 | bwd_inner_microstep: 742.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841 [2024-06-10 16:08:13,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.99 | bwd_microstep: 1462.97 | bwd_inner_microstep: 1462.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 
[2024-06-10 16:08:15,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 1285.71 | bwd_inner_microstep: 1285.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 16:08:17,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.00 | bwd_microstep: 1286.42 | bwd_inner_microstep: 1286.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 16:08:18,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 16:08:20,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1249.38 | bwd_inner_microstep: 1249.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 16:08:21,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.62 | bwd_microstep: 801.44 | bwd_inner_microstep: 801.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434 [2024-06-10 16:08:23,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.12 | bwd_microstep: 1191.00 | bwd_inner_microstep: 1190.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2457 [2024-06-10 16:08:24,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.42 | bwd_microstep: 1014.63 | bwd_inner_microstep: 1014.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per 
sample: 6.75, dynamic token length: 3416 [2024-06-10 16:08:26,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.33 | bwd_microstep: 1330.44 | bwd_inner_microstep: 1330.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3694 [2024-06-10 16:08:28,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.40 | bwd_microstep: 1234.57 | bwd_inner_microstep: 1234.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 16:08:30,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 1287.38 | bwd_inner_microstep: 1287.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3686 [2024-06-10 16:08:31,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1331.11 | bwd_inner_microstep: 1331.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540 [2024-06-10 16:08:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1326.99 | bwd_inner_microstep: 1326.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3681 [2024-06-10 16:08:35,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.47 | bwd_microstep: 1291.82 | bwd_inner_microstep: 1291.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 16:08:37,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.13 | bwd_microstep: 1391.58 | bwd_inner_microstep: 1391.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-10 16:08:39,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.91 | bwd_microstep: 1292.71 | bwd_inner_microstep: 1292.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-10 16:08:40,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.99 | bwd_microstep: 1183.74 | bwd_inner_microstep: 1183.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617 [2024-06-10 16:08:43,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1608.55 | bwd_inner_microstep: 1608.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934 [2024-06-10 16:08:44,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.77 | bwd_microstep: 727.81 | bwd_inner_microstep: 727.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645 [2024-06-10 16:08:46,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.47 | bwd_microstep: 1606.60 | bwd_inner_microstep: 1606.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3568 [2024-06-10 16:08:48,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.53 | bwd_microstep: 1346.88 | bwd_inner_microstep: 1346.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813 [2024-06-10 16:08:50,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1624.67 | bwd_inner_microstep: 1624.65 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815 [2024-06-10 16:08:52,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.66 | bwd_microstep: 1754.91 | bwd_inner_microstep: 1754.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3607 [2024-06-10 16:08:55,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1566.26 | bwd_inner_microstep: 1566.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2264 [2024-06-10 16:08:56,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.23 | bwd_microstep: 875.80 | bwd_inner_microstep: 875.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806 [2024-06-10 16:08:58,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.40 | bwd_microstep: 1751.79 | bwd_inner_microstep: 1751.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601 [2024-06-10 16:09:03,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.17 | optimizer_step: 6.62 [2024-06-10 16:09:03,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.38 | bwd_microstep: 4040.04 | bwd_inner_microstep: 1808.37 | bwd_allreduce_microstep: 2231.62 | step_microstep: 37.87 [2024-06-10 16:09:03,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15771.08 | bwd: 44484.98 | bwd_inner: 42252.46 | bwd_allreduce: 2231.85 | step: 39.33 {'loss': 1.2305, 'learning_rate': 1.962468240793709e-05, 'epoch': 0.52} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2026 
[2024-06-10 16:09:04,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.22 | bwd_microstep: 903.41 | bwd_inner_microstep: 903.30 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3392 [2024-06-10 16:09:06,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.56 | bwd_microstep: 1141.97 | bwd_inner_microstep: 1141.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4508 [2024-06-10 16:09:08,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.55 | bwd_microstep: 1742.33 | bwd_inner_microstep: 1742.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3922 [2024-06-10 16:09:10,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.07 | bwd_microstep: 1485.76 | bwd_inner_microstep: 1485.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482 [2024-06-10 16:09:12,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1409.77 | bwd_inner_microstep: 1409.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4174 [2024-06-10 16:09:15,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.02 | bwd_microstep: 1747.84 | bwd_inner_microstep: 1747.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217 [2024-06-10 16:09:16,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.19 | bwd_microstep: 956.30 | bwd_inner_microstep: 956.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per 
sample: 4.75, dynamic token length: 2666 [2024-06-10 16:09:17,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.96 | bwd_microstep: 1024.12 | bwd_inner_microstep: 1024.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 16:09:19,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-10 16:09:21,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1519.45 | bwd_inner_microstep: 1519.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976 [2024-06-10 16:09:22,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.64 | bwd_microstep: 892.36 | bwd_inner_microstep: 892.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 16:09:24,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1350.01 | bwd_inner_microstep: 1349.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1905 [2024-06-10 16:09:25,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.35 | bwd_microstep: 775.73 | bwd_inner_microstep: 775.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1930 [2024-06-10 16:09:26,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.12 | bwd_microstep: 726.36 | bwd_inner_microstep: 726.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2979 [2024-06-10 16:09:28,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.39 | bwd_microstep: 1138.09 | bwd_inner_microstep: 1138.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3515 [2024-06-10 16:09:30,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.57 | bwd_microstep: 1616.12 | bwd_inner_microstep: 1616.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3648 [2024-06-10 16:09:32,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.63 | bwd_microstep: 1706.43 | bwd_inner_microstep: 1706.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2023 [2024-06-10 16:09:34,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.80 | bwd_microstep: 903.85 | bwd_inner_microstep: 903.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-10 16:09:35,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.50 | bwd_microstep: 1157.98 | bwd_inner_microstep: 1157.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025 [2024-06-10 16:09:36,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.76 | bwd_microstep: 901.20 | bwd_inner_microstep: 901.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 16:09:39,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.16 | bwd_microstep: 1494.01 | bwd_inner_microstep: 1493.98 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 16:09:41,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1461.36 | bwd_inner_microstep: 1461.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2068 [2024-06-10 16:09:42,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.52 | bwd_microstep: 818.11 | bwd_inner_microstep: 818.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1437 [2024-06-10 16:09:42,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.30 | bwd_microstep: 535.29 | bwd_inner_microstep: 535.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568 [2024-06-10 16:09:44,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1362.05 | bwd_inner_microstep: 1362.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3608 [2024-06-10 16:09:46,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1341.17 | bwd_inner_microstep: 1341.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 16:09:48,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1411.93 | bwd_inner_microstep: 1411.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 16:09:50,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1450.10 
| bwd_inner_microstep: 1450.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 16:09:52,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.48 | bwd_microstep: 1287.56 | bwd_inner_microstep: 1287.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594 [2024-06-10 16:09:54,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.18 | bwd_microstep: 1311.62 | bwd_inner_microstep: 1311.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 16:09:56,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.53 | bwd_microstep: 1446.84 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780 [2024-06-10 16:10:05,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.28 | optimizer_step: 6.59 [2024-06-10 16:10:05,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.55 | bwd_microstep: 8471.44 | bwd_inner_microstep: 1862.60 | bwd_allreduce_microstep: 6608.79 | step_microstep: 39.07 [2024-06-10 16:10:05,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14943.80 | bwd: 46736.91 | bwd_inner: 40127.13 | bwd_allreduce: 6609.06 | step: 40.67 {'loss': 1.2033, 'learning_rate': 1.9587155738019412e-05, 'epoch': 0.52} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 16:10:07,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.85 | bwd_microstep: 1360.92 | bwd_inner_microstep: 1360.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, 
dynamic token length: 4577 [2024-06-10 16:10:09,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.23 | bwd_microstep: 1780.46 | bwd_inner_microstep: 1780.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 16:10:11,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1485.16 | bwd_inner_microstep: 1485.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 16:10:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.00 | bwd_microstep: 1146.80 | bwd_inner_microstep: 1146.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 16:10:15,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1379.98 | bwd_inner_microstep: 1379.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 16:10:17,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.69 | bwd_microstep: 1528.32 | bwd_inner_microstep: 1528.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 16:10:19,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1283.27 | bwd_inner_microstep: 1283.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 16:10:20,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1250.60 | bwd_inner_microstep: 1250.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 16:10:22,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1298.28 | bwd_inner_microstep: 1298.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2371 [2024-06-10 16:10:23,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.35 | bwd_microstep: 933.52 | bwd_inner_microstep: 933.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 16:10:25,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1252.71 | bwd_inner_microstep: 1252.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3433 [2024-06-10 16:10:27,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.78 | bwd_microstep: 1536.54 | bwd_inner_microstep: 1536.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661 [2024-06-10 16:10:30,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.03 | bwd_microstep: 1617.03 | bwd_inner_microstep: 1617.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511 [2024-06-10 16:10:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1345.52 | bwd_inner_microstep: 1345.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2047 [2024-06-10 16:10:33,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.88 | bwd_microstep: 904.92 | bwd_inner_microstep: 904.89 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538 [2024-06-10 16:10:34,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.79 | bwd_microstep: 1196.34 | bwd_inner_microstep: 1196.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 16:10:36,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1278.47 | bwd_inner_microstep: 1278.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 16:10:37,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.35 | bwd_microstep: 788.99 | bwd_inner_microstep: 788.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-10 16:10:39,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.58 | bwd_microstep: 1652.34 | bwd_inner_microstep: 1652.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 16:10:41,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.77 | bwd_microstep: 1293.78 | bwd_inner_microstep: 1293.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825 [2024-06-10 16:10:43,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1356.68 | bwd_inner_microstep: 1356.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 16:10:45,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1254.14 | bwd_inner_microstep: 
1254.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2000 [2024-06-10 16:10:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.06 | bwd_microstep: 800.59 | bwd_inner_microstep: 800.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494 [2024-06-10 16:10:48,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1190.84 | bwd_inner_microstep: 1190.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093 [2024-06-10 16:10:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.71 | bwd_microstep: 928.03 | bwd_inner_microstep: 928.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 16:10:51,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.96 | bwd_microstep: 1474.23 | bwd_inner_microstep: 1474.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3550 [2024-06-10 16:10:53,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.96 | bwd_microstep: 1421.90 | bwd_inner_microstep: 1421.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576 [2024-06-10 16:10:55,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.67 | bwd_microstep: 1694.25 | bwd_inner_microstep: 1694.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-10 16:10:57,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | 
bwd_microstep: 1531.67 | bwd_inner_microstep: 1531.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604 [2024-06-10 16:11:00,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.14 | bwd_microstep: 1595.68 | bwd_inner_microstep: 1595.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-10 16:11:01,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.82 | bwd_microstep: 972.98 | bwd_inner_microstep: 972.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-10 16:11:05,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.19 | optimizer_step: 6.62 [2024-06-10 16:11:05,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.81 | bwd_microstep: 3853.31 | bwd_inner_microstep: 1852.67 | bwd_allreduce_microstep: 2000.59 | step_microstep: 38.04 [2024-06-10 16:11:05,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15804.29 | bwd: 44388.26 | bwd_inner: 42386.77 | bwd_allreduce: 2000.82 | step: 39.50 {'loss': 1.2348, 'learning_rate': 1.9549630522137084e-05, 'epoch': 0.52} ▏ | 896/1726 [15:28:35<14:01:40, 60.84s/it] 52%|█████▏ | 897/1726 [15:29:38<14:09:44, 61.50s/it] 52%|█████▏ | 897/1726 [15:29:39<14:09:44, 61.50s/it] 52%|█████▏ | 898/1726 [15:30:39<14:04:38, 61.21s/it] 52%|█████▏ | 899/1726 [15:31:40<14:01:01, 61.02s/it] 52%|█████▏ | 900/1726 [15:32:42<14:04:09, 61.32s/it] 52%|█████▏ | 901/1726 [15:33:42<13:59:50, 61.08s/it] dynamic ViT batch size: 25, images per
sample: 6.25, dynamic token length: 2622 [2024-06-10 16:11:07,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.22 | bwd_microstep: 1103.52 | bwd_inner_microstep: 1103.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888 [2024-06-10 16:11:09,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1581.16 | bwd_inner_microstep: 1581.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3853 [2024-06-10 16:11:11,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1361.39 | bwd_inner_microstep: 1361.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788 [2024-06-10 16:11:13,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.91 | bwd_microstep: 1449.25 | bwd_inner_microstep: 1449.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3761 [2024-06-10 16:11:15,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.34 | bwd_microstep: 1339.49 | bwd_inner_microstep: 1339.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477 [2024-06-10 16:11:17,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.73 | bwd_microstep: 1341.30 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 16:11:19,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.26 | bwd_microstep: 1381.92 | bwd_inner_microstep: 1381.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 16:11:21,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1483.47 | bwd_inner_microstep: 1483.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 16:11:22,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1249.76 | bwd_inner_microstep: 1249.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3727 [2024-06-10 16:11:24,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.50 | bwd_microstep: 1483.41 | bwd_inner_microstep: 1483.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 16:11:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.64 | bwd_microstep: 1370.43 | bwd_inner_microstep: 1370.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 16:11:28,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1240.15 | bwd_inner_microstep: 1240.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 16:11:30,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.92 | bwd_microstep: 1474.46 | bwd_inner_microstep: 1474.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3658 [2024-06-10 16:11:32,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.76 | bwd_microstep: 1461.15 | bwd_inner_microstep: 1461.13 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-10 16:11:34,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.43 | bwd_microstep: 1310.30 | bwd_inner_microstep: 1310.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2156 [2024-06-10 16:11:35,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.57 | bwd_microstep: 946.83 | bwd_inner_microstep: 946.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638 [2024-06-10 16:11:37,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1415.22 | bwd_inner_microstep: 1415.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2123 [2024-06-10 16:11:38,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.09 | bwd_microstep: 828.68 | bwd_inner_microstep: 828.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3685 [2024-06-10 16:11:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1324.91 | bwd_inner_microstep: 1324.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-10 16:11:42,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1506.91 | bwd_inner_microstep: 1506.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441 [2024-06-10 16:11:44,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.54 | bwd_microstep: 1153.72 
| bwd_inner_microstep: 1153.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-10 16:11:46,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.09 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-10 16:11:48,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.63 | bwd_microstep: 1502.90 | bwd_inner_microstep: 1502.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 16:11:50,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1407.05 | bwd_inner_microstep: 1407.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2273 [2024-06-10 16:11:51,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.56 | bwd_microstep: 936.62 | bwd_inner_microstep: 936.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3813 [2024-06-10 16:11:53,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1386.85 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564 [2024-06-10 16:11:55,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.40 | bwd_microstep: 1330.20 | bwd_inner_microstep: 1330.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591 [2024-06-10 16:11:57,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 587.48 | bwd_microstep: 1597.15 | bwd_inner_microstep: 1597.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3395 [2024-06-10 16:11:59,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.02 | bwd_microstep: 1439.39 | bwd_inner_microstep: 1439.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 16:12:01,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.77 | bwd_microstep: 1644.15 | bwd_inner_microstep: 1644.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815 [2024-06-10 16:12:04,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.08 | bwd_microstep: 1751.16 | bwd_inner_microstep: 1751.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 16:12:08,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 16:12:08,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.78 | bwd_microstep: 3500.06 | bwd_inner_microstep: 1710.70 | bwd_allreduce_microstep: 1789.31 | step_microstep: 37.99 [2024-06-10 16:12:08,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16374.37 | bwd: 45740.27 | bwd_inner: 43950.03 | bwd_allreduce: 1789.55 | step: 39.44 {'loss': 1.2557, 'learning_rate': 1.951210689245371e-05, 'epoch': 0.52} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 16:12:10,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.90 | bwd_microstep: 1327.46 | bwd_inner_microstep: 1327.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3998 [2024-06-10 16:12:12,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.23 | bwd_microstep: 1606.27 | bwd_inner_microstep: 1606.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 16:12:14,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.80 | bwd_microstep: 1479.83 | bwd_inner_microstep: 1479.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794 [2024-06-10 16:12:16,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.81 | bwd_microstep: 1545.20 | bwd_inner_microstep: 1545.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2212 [2024-06-10 16:12:17,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.83 | bwd_microstep: 956.25 | bwd_inner_microstep: 956.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-10 16:12:20,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1539.43 | bwd_inner_microstep: 1539.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1870 [2024-06-10 16:12:20,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.37 | bwd_microstep: 708.20 | bwd_inner_microstep: 708.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1881 [2024-06-10 16:12:21,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.88 | bwd_microstep: 710.26 | bwd_inner_microstep: 710.23 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 16:12:23,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1250.98 | bwd_inner_microstep: 1250.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404 [2024-06-10 16:12:25,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.78 | bwd_microstep: 1307.88 | bwd_inner_microstep: 1307.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500 [2024-06-10 16:12:27,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.42 | bwd_microstep: 1580.24 | bwd_inner_microstep: 1580.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3017 [2024-06-10 16:12:29,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.26 | bwd_microstep: 1227.79 | bwd_inner_microstep: 1227.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3982 [2024-06-10 16:12:31,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.60 | bwd_microstep: 1605.05 | bwd_inner_microstep: 1605.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 16:12:33,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.30 | bwd_microstep: 1371.12 | bwd_inner_microstep: 1371.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2118 [2024-06-10 16:12:34,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.38 | bwd_microstep: 
826.98 | bwd_inner_microstep: 826.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3645 [2024-06-10 16:12:36,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1437.21 | bwd_inner_microstep: 1437.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642 [2024-06-10 16:12:38,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1417.42 | bwd_inner_microstep: 1417.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3636 [2024-06-10 16:12:40,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.05 | bwd_microstep: 1248.70 | bwd_inner_microstep: 1248.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-10 16:12:41,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.72 | bwd_microstep: 697.70 | bwd_inner_microstep: 697.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-10 16:12:42,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.95 | bwd_microstep: 800.45 | bwd_inner_microstep: 800.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 16:12:44,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1402.21 | bwd_inner_microstep: 1402.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812 [2024-06-10 16:12:46,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 586.46 | bwd_microstep: 1585.89 | bwd_inner_microstep: 1585.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673 [2024-06-10 16:12:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1524.25 | bwd_inner_microstep: 1524.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-10 16:12:50,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.57 | bwd_microstep: 1422.93 | bwd_inner_microstep: 1422.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 16:12:52,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1278.45 | bwd_inner_microstep: 1278.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 16:12:54,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1457.03 | bwd_inner_microstep: 1457.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3865 [2024-06-10 16:12:56,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.19 | bwd_microstep: 1667.71 | bwd_inner_microstep: 1667.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723 [2024-06-10 16:12:58,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.33 | bwd_microstep: 1431.19 | bwd_inner_microstep: 1431.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775 [2024-06-10 16:13:00,886] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1633.57 | bwd_inner_microstep: 1633.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-10 16:13:02,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.47 | bwd_microstep: 819.82 | bwd_inner_microstep: 819.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276 [2024-06-10 16:13:03,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.80 | bwd_microstep: 1070.52 | bwd_inner_microstep: 1070.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2281 [2024-06-10 16:13:09,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.28 | optimizer_step: 6.59 [2024-06-10 16:13:09,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.76 | bwd_microstep: 5320.04 | bwd_inner_microstep: 1137.93 | bwd_allreduce_microstep: 4182.05 | step_microstep: 39.09 [2024-06-10 16:13:09,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15315.05 | bwd: 45258.04 | bwd_inner: 41075.07 | bwd_allreduce: 4182.28 | step: 40.66 {'loss': 1.2451, 'learning_rate': 1.947458498112732e-05, 'epoch': 0.52} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460 [2024-06-10 16:13:11,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1397.13 | bwd_inner_microstep: 1397.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 16:13:12,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.86 | bwd_microstep: 1240.00 | bwd_inner_microstep: 1239.97 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4311 [2024-06-10 16:13:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.77 | bwd_microstep: 1680.94 | bwd_inner_microstep: 1680.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3866 [2024-06-10 16:13:17,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.50 | bwd_microstep: 1661.94 | bwd_inner_microstep: 1661.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3701 [2024-06-10 16:13:19,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.34 | bwd_microstep: 1624.01 | bwd_inner_microstep: 1623.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 16:13:21,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.19 | bwd_microstep: 1239.41 | bwd_inner_microstep: 1239.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3512 [2024-06-10 16:13:23,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1248.78 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 16:13:25,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 1483.02 | bwd_inner_microstep: 1482.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2461 [2024-06-10 16:13:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.26 | bwd_microstep: 
949.48 | bwd_inner_microstep: 949.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4018 [2024-06-10 16:13:28,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.19 | bwd_microstep: 1617.01 | bwd_inner_microstep: 1616.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678 [2024-06-10 16:13:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1524.76 | bwd_inner_microstep: 1524.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3496 [2024-06-10 16:13:32,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.17 | bwd_microstep: 1330.21 | bwd_inner_microstep: 1330.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845 [2024-06-10 16:13:34,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.45 | bwd_microstep: 1655.89 | bwd_inner_microstep: 1655.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 16:13:36,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1347.00 | bwd_inner_microstep: 1346.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708 [2024-06-10 16:13:39,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.77 | bwd_microstep: 1629.03 | bwd_inner_microstep: 1629.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3460 [2024-06-10 16:13:40,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 468.04 | bwd_microstep: 1211.79 | bwd_inner_microstep: 1211.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 16:13:42,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.98 | bwd_microstep: 1556.43 | bwd_inner_microstep: 1556.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 903 [2024-06-10 16:13:43,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.42 | bwd_microstep: 372.11 | bwd_inner_microstep: 372.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 16:13:45,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.05 | bwd_microstep: 1355.62 | bwd_inner_microstep: 1355.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463 [2024-06-10 16:13:47,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1405.24 | bwd_inner_microstep: 1405.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 16:13:49,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.03 | bwd_microstep: 1397.52 | bwd_inner_microstep: 1397.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681 [2024-06-10 16:13:51,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1428.58 | bwd_inner_microstep: 1428.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3857 [2024-06-10 16:13:53,578] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.19 | bwd_microstep: 1762.56 | bwd_inner_microstep: 1762.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548 [2024-06-10 16:13:55,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1428.59 | bwd_inner_microstep: 1428.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830 [2024-06-10 16:13:57,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.03 | bwd_microstep: 1355.98 | bwd_inner_microstep: 1355.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809 [2024-06-10 16:13:59,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.61 | bwd_microstep: 1750.84 | bwd_inner_microstep: 1750.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2112 [2024-06-10 16:14:01,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.39 | bwd_microstep: 921.39 | bwd_inner_microstep: 921.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815 [2024-06-10 16:14:03,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.01 | bwd_microstep: 1576.74 | bwd_inner_microstep: 1576.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567 [2024-06-10 16:14:05,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.50 | bwd_microstep: 1300.27 | bwd_inner_microstep: 1300.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3488 [2024-06-10 16:14:07,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.99 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 16:14:08,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1284.58 | bwd_inner_microstep: 1284.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3581 [2024-06-10 16:14:10,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.30 | optimizer_step: 6.63 [2024-06-10 16:14:10,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.66 | bwd_microstep: 1470.00 | bwd_inner_microstep: 1462.04 | bwd_allreduce_microstep: 7.90 | step_microstep: 39.23 [2024-06-10 16:14:10,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16693.75 | bwd: 44684.24 | bwd_inner: 44675.43 | bwd_allreduce: 8.13 | step: 40.70 {'loss': 1.2348, 'learning_rate': 1.9437064920309895e-05, 'epoch': 0.52} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486 [2024-06-10 16:14:12,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1310.70 | bwd_inner_microstep: 1310.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4087 [2024-06-10 16:14:14,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.04 | bwd_microstep: 1522.04 | bwd_inner_microstep: 1522.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 16:14:16,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1244.57 | 
bwd_inner_microstep: 1244.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 16:14:18,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.75 | bwd_microstep: 1649.82 | bwd_inner_microstep: 1649.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 16:14:21,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.90 | bwd_microstep: 1652.71 | bwd_inner_microstep: 1652.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415 [2024-06-10 16:14:22,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1312.66 | bwd_inner_microstep: 1312.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954 [2024-06-10 16:14:23,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.98 | bwd_microstep: 731.34 | bwd_inner_microstep: 731.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898 [2024-06-10 16:14:24,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.75 | bwd_microstep: 682.32 | bwd_inner_microstep: 682.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 16:14:26,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.58 | bwd_microstep: 1283.85 | bwd_inner_microstep: 1283.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3621 [2024-06-10 16:14:28,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 568.21 | bwd_microstep: 1538.46 | bwd_inner_microstep: 1538.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3683 [2024-06-10 16:14:30,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1327.14 | bwd_inner_microstep: 1327.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 16:14:32,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1347.82 | bwd_inner_microstep: 1347.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 16:14:34,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3641 [2024-06-10 16:14:36,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.36 | bwd_microstep: 1607.34 | bwd_inner_microstep: 1607.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 16:14:38,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1374.42 | bwd_inner_microstep: 1374.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3678 [2024-06-10 16:14:40,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.29 | bwd_microstep: 1554.49 | bwd_inner_microstep: 1554.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2103 [2024-06-10 16:14:41,878] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.56 | bwd_microstep: 856.37 | bwd_inner_microstep: 856.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 16:14:43,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.55 | bwd_microstep: 1253.84 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999 [2024-06-10 16:14:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.20 | bwd_microstep: 802.45 | bwd_inner_microstep: 802.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3833 [2024-06-10 16:14:46,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.48 | bwd_microstep: 1607.33 | bwd_inner_microstep: 1607.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3677 [2024-06-10 16:14:48,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.62 | bwd_microstep: 1458.95 | bwd_inner_microstep: 1458.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 16:14:50,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.95 | bwd_microstep: 1294.35 | bwd_inner_microstep: 1294.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866 [2024-06-10 16:14:52,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.32 | bwd_microstep: 1567.25 | bwd_inner_microstep: 1567.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3460 [2024-06-10 16:14:54,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 16:14:56,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.55 | bwd_microstep: 1399.46 | bwd_inner_microstep: 1399.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715 [2024-06-10 16:14:58,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.59 | bwd_microstep: 1544.97 | bwd_inner_microstep: 1544.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2233 [2024-06-10 16:15:00,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.05 | bwd_microstep: 925.77 | bwd_inner_microstep: 925.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820 [2024-06-10 16:15:02,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.40 | bwd_microstep: 1751.85 | bwd_inner_microstep: 1751.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 16:15:04,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.64 | bwd_microstep: 1500.70 | bwd_inner_microstep: 1500.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3637 [2024-06-10 16:15:06,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.02 | bwd_microstep: 1579.82 | bwd_inner_microstep: 1579.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3781 [2024-06-10 16:15:08,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.30 | bwd_microstep: 1649.08 | bwd_inner_microstep: 1649.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400 [2024-06-10 16:15:12,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.26 | optimizer_step: 6.62 [2024-06-10 16:15:12,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.64 | bwd_microstep: 3197.56 | bwd_inner_microstep: 1635.46 | bwd_allreduce_microstep: 1562.04 | step_microstep: 38.41 [2024-06-10 16:15:12,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16225.75 | bwd: 45190.28 | bwd_inner: 43627.32 | bwd_allreduce: 1562.28 | step: 39.92 {'loss': 1.2312, 'learning_rate': 1.93995468421469e-05, 'epoch': 0.52} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 16:15:14,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1368.80 | bwd_inner_microstep: 1368.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 16:15:16,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1348.06 | bwd_inner_microstep: 1348.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 16:15:18,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.68 | bwd_microstep: 1472.17 | bwd_inner_microstep: 1472.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025 [2024-06-10 16:15:19,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
304.09 | bwd_microstep: 809.32 | bwd_inner_microstep: 809.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 16:15:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.99 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 16:15:23,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.56 | bwd_microstep: 1381.01 | bwd_inner_microstep: 1380.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592 [2024-06-10 16:15:25,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1306.94 | bwd_inner_microstep: 1306.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 16:15:26,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 16:15:28,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.09 | bwd_microstep: 1388.16 | bwd_inner_microstep: 1388.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 16:15:30,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1344.43 | bwd_inner_microstep: 1344.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3985 [2024-06-10 16:15:32,893] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 599.22 | bwd_microstep: 1603.91 | bwd_inner_microstep: 1603.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928 [2024-06-10 16:15:33,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.28 | bwd_microstep: 789.19 | bwd_inner_microstep: 789.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408 [2024-06-10 16:15:35,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1442.91 | bwd_inner_microstep: 1442.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385 [2024-06-10 16:15:37,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3622 [2024-06-10 16:15:39,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.22 | bwd_microstep: 1469.07 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3860 [2024-06-10 16:15:42,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.46 | bwd_microstep: 1593.04 | bwd_inner_microstep: 1593.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 16:15:44,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.89 | bwd_microstep: 1403.09 | bwd_inner_microstep: 1403.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-10 
16:15:45,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.64 | bwd_microstep: 1322.41 | bwd_inner_microstep: 1322.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 16:15:48,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.05 | bwd_microstep: 1655.87 | bwd_inner_microstep: 1655.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 16:15:50,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.52 | bwd_microstep: 1555.99 | bwd_inner_microstep: 1555.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 16:15:52,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1453.58 | bwd_inner_microstep: 1453.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 16:15:54,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1397.93 | bwd_inner_microstep: 1397.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 16:15:56,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.52 | bwd_microstep: 1555.60 | bwd_inner_microstep: 1555.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 16:15:58,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1359.17 | bwd_inner_microstep: 1359.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 17, images per sample: 4.25, 
dynamic token length: 1927 [2024-06-10 16:15:59,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.15 | bwd_microstep: 806.78 | bwd_inner_microstep: 806.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2056 [2024-06-10 16:16:00,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.23 | bwd_microstep: 860.58 | bwd_inner_microstep: 860.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544 [2024-06-10 16:16:02,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1301.18 | bwd_inner_microstep: 1301.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498 [2024-06-10 16:16:04,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.51 | bwd_microstep: 1190.13 | bwd_inner_microstep: 1190.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 16:16:06,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.65 | bwd_microstep: 1470.97 | bwd_inner_microstep: 1470.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766 [2024-06-10 16:16:08,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.57 | bwd_microstep: 1445.53 | bwd_inner_microstep: 1445.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3543 [2024-06-10 16:16:10,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 37, images per sample: 9.25, dynamic token length: 3557 [2024-06-10 16:16:14,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 16:16:14,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.53 | bwd_microstep: 3269.79 | bwd_inner_microstep: 1741.60 | bwd_allreduce_microstep: 1528.15 | step_microstep: 38.43 [2024-06-10 16:16:14,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16155.20 | bwd: 44812.73 | bwd_inner: 43283.69 | bwd_allreduce: 1528.37 | step: 39.90 {'loss': 1.2117, 'learning_rate': 1.936203087877681e-05, 'epoch': 0.52} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 16:16:15,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1363.19 | bwd_inner_microstep: 1363.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 16:16:17,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1375.97 | bwd_inner_microstep: 1375.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 16:16:19,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.32 | bwd_microstep: 1340.35 | bwd_inner_microstep: 1340.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3417 [2024-06-10 16:16:21,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.91 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 16:16:23,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 471.44 | bwd_microstep: 1246.43 | bwd_inner_microstep: 1246.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3776 [2024-06-10 16:16:25,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.31 | bwd_microstep: 1343.48 | bwd_inner_microstep: 1343.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 16:16:26,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 1380.52 | bwd_inner_microstep: 1380.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 16:16:28,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1245.55 | bwd_inner_microstep: 1245.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482 [2024-06-10 16:16:30,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.53 | bwd_microstep: 1185.99 | bwd_inner_microstep: 1185.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1924 [2024-06-10 16:16:31,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.27 | bwd_microstep: 848.52 | bwd_inner_microstep: 848.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 16:16:33,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1481.89 | bwd_inner_microstep: 1481.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3470 [2024-06-10 16:16:35,678] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1531.62 | bwd_inner_microstep: 1531.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939 [2024-06-10 16:16:37,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 1684.62 | bwd_inner_microstep: 1684.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3483 [2024-06-10 16:16:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1431.23 | bwd_inner_microstep: 1431.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 16:16:41,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1251.09 | bwd_inner_microstep: 1251.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 16:16:43,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.24 | bwd_microstep: 1392.22 | bwd_inner_microstep: 1392.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-10 16:16:45,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.41 | bwd_microstep: 1178.72 | bwd_inner_microstep: 1178.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 16:16:47,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2292 [2024-06-10 16:16:48,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.74 | bwd_microstep: 973.36 | bwd_inner_microstep: 973.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843 [2024-06-10 16:16:50,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1464.34 | bwd_inner_microstep: 1464.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 16:16:52,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1491.89 | bwd_inner_microstep: 1491.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743 [2024-06-10 16:16:54,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.85 | bwd_microstep: 1440.49 | bwd_inner_microstep: 1440.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 16:16:56,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1395.78 | bwd_inner_microstep: 1395.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547 [2024-06-10 16:16:58,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1520.58 | bwd_inner_microstep: 1520.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 16:17:00,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1397.96 | bwd_inner_microstep: 1397.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3677 [2024-06-10 16:17:02,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.95 | bwd_microstep: 1429.07 | bwd_inner_microstep: 1429.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930 [2024-06-10 16:17:03,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.63 | bwd_microstep: 763.15 | bwd_inner_microstep: 763.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2193 [2024-06-10 16:17:04,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.19 | bwd_microstep: 955.61 | bwd_inner_microstep: 955.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3445 [2024-06-10 16:17:06,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.50 | bwd_microstep: 1376.73 | bwd_inner_microstep: 1376.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801 [2024-06-10 16:17:09,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.05 | bwd_microstep: 1748.14 | bwd_inner_microstep: 1748.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 16:17:11,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.81 | bwd_microstep: 1456.31 | bwd_inner_microstep: 1456.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3776 [2024-06-10 16:17:14,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.63 [2024-06-10 16:17:14,629] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 582.02 | bwd_microstep: 2751.10 | bwd_inner_microstep: 1784.49 | bwd_allreduce_microstep: 966.55 | step_microstep: 37.48 [2024-06-10 16:17:14,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16128.62 | bwd: 44150.84 | bwd_inner: 43183.38 | bwd_allreduce: 966.78 | step: 38.99 52%|█████▏ | 902/1726 [15:34:45<14:04:26, 61.49s/it] 52%|█████▏ | 903/1726 [15:35:45<14:01:00, 61.31s/it] 52%|█████▏ | 904/1726 [15:36:47<14:01:40, 61.44s/it] 52%|█████▏ | 905/1726 [15:37:49<14:01:58, 61.53s/it] 52%|█████▏ | 906/1726 [15:38:50<14:00:00, 61.46s/it] 53%|█████▎ | 907/1726 [{'loss': 1.2343, 'learning_rate': 1.932451716233064e-05, 'epoch': 0.53} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 16:17:16,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1310.05 | bwd_inner_microstep: 1310.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 16:17:18,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.37 | bwd_microstep: 1377.77 | bwd_inner_microstep: 1377.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3899 [2024-06-10 16:17:20,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.54 | bwd_microstep: 1482.06 | bwd_inner_microstep: 1482.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2016 [2024-06-10 16:17:21,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 302.86 | bwd_microstep: 800.08 | bwd_inner_microstep: 800.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 16:17:23,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.82 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3505 [2024-06-10 16:17:25,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.53 | bwd_microstep: 1249.62 | bwd_inner_microstep: 1249.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 16:17:26,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.82 | bwd_microstep: 1276.64 | bwd_inner_microstep: 1276.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706 [2024-06-10 16:17:28,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.29 | bwd_microstep: 1421.01 | bwd_inner_microstep: 1420.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 16:17:30,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.41 | bwd_microstep: 1539.72 | bwd_inner_microstep: 1539.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711 [2024-06-10 16:17:33,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1627.59 | bwd_inner_microstep: 1627.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 16:17:35,161] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.28 | bwd_microstep: 1487.71 | bwd_inner_microstep: 1487.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518 [2024-06-10 16:17:36,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.89 | bwd_microstep: 1227.48 | bwd_inner_microstep: 1227.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1930 [2024-06-10 16:17:37,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.24 | bwd_microstep: 761.12 | bwd_inner_microstep: 761.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447 [2024-06-10 16:17:39,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.34 | bwd_microstep: 1446.53 | bwd_inner_microstep: 1446.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666 [2024-06-10 16:17:42,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.40 | bwd_microstep: 1717.05 | bwd_inner_microstep: 1717.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3501 [2024-06-10 16:17:44,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.55 | bwd_microstep: 1612.64 | bwd_inner_microstep: 1612.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 16:17:46,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.41 | bwd_microstep: 1299.81 | bwd_inner_microstep: 1299.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3526 [2024-06-10 16:17:48,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1390.21 | bwd_inner_microstep: 1390.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 16:17:49,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.66 | bwd_microstep: 801.01 | bwd_inner_microstep: 800.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 16:17:51,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-10 16:17:52,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.46 | bwd_microstep: 796.52 | bwd_inner_microstep: 796.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-10 16:17:54,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1485.88 | bwd_inner_microstep: 1485.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920 [2024-06-10 16:17:55,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.88 | bwd_microstep: 687.31 | bwd_inner_microstep: 687.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 16:17:57,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1380.54 | bwd_inner_microstep: 1380.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3641 [2024-06-10 16:17:59,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1412.13 | bwd_inner_microstep: 1412.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3552 [2024-06-10 16:18:01,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.63 | bwd_microstep: 1526.05 | bwd_inner_microstep: 1526.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 16:18:03,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.29 | bwd_microstep: 1257.13 | bwd_inner_microstep: 1257.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3757 [2024-06-10 16:18:05,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.19 | bwd_microstep: 1738.70 | bwd_inner_microstep: 1738.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601 [2024-06-10 16:18:07,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.94 | bwd_microstep: 1597.79 | bwd_inner_microstep: 1597.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 16:18:09,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.95 | bwd_microstep: 1548.51 | bwd_inner_microstep: 1548.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821 [2024-06-10 16:18:11,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1357.26 | bwd_inner_microstep: 1357.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3572 [2024-06-10 16:18:15,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.62 [2024-06-10 16:18:15,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.74 | bwd_microstep: 3263.75 | bwd_inner_microstep: 1878.84 | bwd_allreduce_microstep: 1384.86 | step_microstep: 39.40 [2024-06-10 16:18:15,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16059.88 | bwd: 44582.07 | bwd_inner: 43196.31 | bwd_allreduce: 1385.08 | step: 40.84 {'loss': 1.2875, 'learning_rate': 1.9287005824931514e-05, 'epoch': 0.53} dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878 [2024-06-10 16:18:16,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.00 | bwd_microstep: 738.12 | bwd_inner_microstep: 737.98 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 16:18:18,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.19 | bwd_microstep: 1390.32 | bwd_inner_microstep: 1390.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 16:18:20,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1283.07 | bwd_inner_microstep: 1283.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3868 [2024-06-10 16:18:22,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.72 | bwd_microstep: 1663.85 | bwd_inner_microstep: 1663.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 16:18:24,666] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 549.28 | bwd_microstep: 1482.19 | bwd_inner_microstep: 1482.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726 [2024-06-10 16:18:26,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.04 | bwd_microstep: 1531.87 | bwd_inner_microstep: 1531.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3738 [2024-06-10 16:18:29,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1633.73 | bwd_inner_microstep: 1633.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2047 [2024-06-10 16:18:30,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.28 | bwd_microstep: 719.19 | bwd_inner_microstep: 719.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1890 [2024-06-10 16:18:30,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.28 | bwd_microstep: 683.07 | bwd_inner_microstep: 683.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1925 [2024-06-10 16:18:32,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.55 | bwd_microstep: 817.98 | bwd_inner_microstep: 817.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2397 [2024-06-10 16:18:33,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.55 | bwd_microstep: 958.40 | bwd_inner_microstep: 958.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-10 16:18:35,662] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.35 | bwd_microstep: 1617.43 | bwd_inner_microstep: 1617.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629 [2024-06-10 16:18:38,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.44 | bwd_microstep: 1706.16 | bwd_inner_microstep: 1706.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3639 [2024-06-10 16:18:40,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.57 | bwd_microstep: 1709.94 | bwd_inner_microstep: 1709.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478 [2024-06-10 16:18:41,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.55 | bwd_microstep: 1186.25 | bwd_inner_microstep: 1186.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628 [2024-06-10 16:18:43,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1316.69 | bwd_inner_microstep: 1316.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 16:18:45,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.55 | bwd_microstep: 1486.79 | bwd_inner_microstep: 1486.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 16:18:47,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.63 | bwd_microstep: 1276.61 | bwd_inner_microstep: 1276.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token 
length: 1933 [2024-06-10 16:18:48,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.13 | bwd_microstep: 698.75 | bwd_inner_microstep: 698.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 16:18:50,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.60 | bwd_microstep: 1551.80 | bwd_inner_microstep: 1551.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-10 16:18:51,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.03 | bwd_microstep: 799.13 | bwd_inner_microstep: 799.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3608 [2024-06-10 16:18:53,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1461.60 | bwd_inner_microstep: 1461.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-10 16:18:55,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.10 | bwd_microstep: 1499.16 | bwd_inner_microstep: 1499.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 16:18:58,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.63 | bwd_microstep: 1497.35 | bwd_inner_microstep: 1497.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506 [2024-06-10 16:19:00,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1549.35 | bwd_inner_microstep: 1549.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, 
images per sample: 11.5, dynamic token length: 3812 [2024-06-10 16:19:02,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.85 | bwd_microstep: 1752.25 | bwd_inner_microstep: 1752.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3589 [2024-06-10 16:19:04,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.36 | bwd_microstep: 1671.29 | bwd_inner_microstep: 1671.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1936 [2024-06-10 16:19:05,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.10 | bwd_microstep: 758.95 | bwd_inner_microstep: 758.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3860 [2024-06-10 16:19:08,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.39 | bwd_microstep: 1594.47 | bwd_inner_microstep: 1594.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580 [2024-06-10 16:19:10,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.33 | bwd_microstep: 1455.29 | bwd_inner_microstep: 1455.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 16:19:12,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.83 | bwd_microstep: 1544.39 | bwd_inner_microstep: 1544.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3416 [2024-06-10 16:19:18,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-10 16:19:18,437] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.87 | bwd_microstep: 5656.98 | bwd_inner_microstep: 1562.56 | bwd_allreduce_microstep: 4094.37 | step_microstep: 38.42 [2024-06-10 16:19:18,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15797.60 | bwd: 46692.45 | bwd_inner: 42597.08 | bwd_allreduce: 4094.64 | step: 39.88 {'loss': 1.24, 'learning_rate': 1.9249496998694168e-05, 'epoch': 0.53} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 16:19:20,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1366.89 | bwd_inner_microstep: 1366.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402 [2024-06-10 16:19:22,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.19 | bwd_microstep: 1447.50 | bwd_inner_microstep: 1447.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3479 [2024-06-10 16:19:24,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.43 | bwd_microstep: 1442.49 | bwd_inner_microstep: 1442.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407 [2024-06-10 16:19:26,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.29 | bwd_microstep: 1276.00 | bwd_inner_microstep: 1275.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 16:19:27,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.18 | bwd_microstep: 1248.03 | bwd_inner_microstep: 1248.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 16:19:28,796] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.70 | bwd_microstep: 698.25 | bwd_inner_microstep: 698.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780 [2024-06-10 16:19:30,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.46 | bwd_microstep: 1451.29 | bwd_inner_microstep: 1451.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 16:19:32,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.27 | bwd_microstep: 1245.09 | bwd_inner_microstep: 1245.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 16:19:34,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.94 | bwd_microstep: 1384.39 | bwd_inner_microstep: 1384.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 16:19:36,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.52 | bwd_inner_microstep: 1254.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 16:19:38,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.47 | bwd_microstep: 1486.02 | bwd_inner_microstep: 1485.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3573 [2024-06-10 16:19:40,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.19 | bwd_microstep: 1432.46 | bwd_inner_microstep: 1432.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3498 [2024-06-10 16:19:42,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1483.26 | bwd_inner_microstep: 1483.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 16:19:44,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.27 | bwd_microstep: 1335.62 | bwd_inner_microstep: 1335.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108 [2024-06-10 16:19:45,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.02 | bwd_microstep: 817.78 | bwd_inner_microstep: 817.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3518 [2024-06-10 16:19:47,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.50 | bwd_microstep: 1334.45 | bwd_inner_microstep: 1334.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 16:19:48,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.01 | bwd_microstep: 1254.95 | bwd_inner_microstep: 1254.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 16:19:50,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1395.79 | bwd_inner_microstep: 1395.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707 [2024-06-10 16:19:52,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.81 | bwd_microstep: 1530.13 | bwd_inner_microstep: 1530.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 3462 [2024-06-10 16:19:54,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.13 | bwd_microstep: 1180.68 | bwd_inner_microstep: 1180.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621 [2024-06-10 16:19:56,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 1312.29 | bwd_inner_microstep: 1312.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 16:19:58,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1398.42 | bwd_inner_microstep: 1398.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 16:20:00,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 1397.14 | bwd_inner_microstep: 1397.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 16:20:02,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.40 | bwd_microstep: 1385.29 | bwd_inner_microstep: 1385.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2179 [2024-06-10 16:20:03,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.99 | bwd_microstep: 917.68 | bwd_inner_microstep: 917.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3717 [2024-06-10 16:20:05,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.80 | bwd_microstep: 1343.82 | bwd_inner_microstep: 1343.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-10 16:20:07,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.82 | bwd_microstep: 1602.33 | bwd_inner_microstep: 1602.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 16:20:09,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.45 | bwd_microstep: 1444.45 | bwd_inner_microstep: 1444.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-10 16:20:11,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1407.57 | bwd_inner_microstep: 1407.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 16:20:13,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.69 | bwd_microstep: 1645.85 | bwd_inner_microstep: 1645.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588 [2024-06-10 16:20:15,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.87 | bwd_microstep: 1702.45 | bwd_inner_microstep: 1702.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 16:20:20,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 16:20:20,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.66 | bwd_microstep: 4283.37 | bwd_inner_microstep: 1817.99 | bwd_allreduce_microstep: 2465.33 | step_microstep: 37.94 [2024-06-10 16:20:20,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
16122.46 | bwd: 45906.26 | bwd_inner: 43440.02 | bwd_allreduce: 2465.56 | step: 39.36 {'loss': 1.2881, 'learning_rate': 1.9211990815724496e-05, 'epoch': 0.53} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3560 [2024-06-10 16:20:22,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.79 | bwd_microstep: 1513.73 | bwd_inner_microstep: 1513.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3396 [2024-06-10 16:20:24,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.36 | bwd_microstep: 1144.48 | bwd_inner_microstep: 1144.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4343 [2024-06-10 16:20:26,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.08 | bwd_microstep: 1698.65 | bwd_inner_microstep: 1698.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 16:20:28,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.14 | bwd_microstep: 1478.20 | bwd_inner_microstep: 1478.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 16:20:30,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.26 | bwd_microstep: 1186.27 | bwd_inner_microstep: 1186.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 16:20:31,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.23 | bwd_microstep: 790.91 | bwd_inner_microstep: 790.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 
16:20:33,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.16 | bwd_microstep: 1291.84 | bwd_inner_microstep: 1291.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 16:20:35,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.98 | bwd_microstep: 1529.53 | bwd_inner_microstep: 1529.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709 [2024-06-10 16:20:37,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.95 | bwd_microstep: 1525.54 | bwd_inner_microstep: 1525.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3627 [2024-06-10 16:20:39,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.97 | bwd_microstep: 1246.93 | bwd_inner_microstep: 1246.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3669 [2024-06-10 16:20:41,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.97 | bwd_microstep: 1482.52 | bwd_inner_microstep: 1482.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-10 16:20:43,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.15 | bwd_microstep: 1520.88 | bwd_inner_microstep: 1520.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3665 [2024-06-10 16:20:45,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1451.10 | bwd_inner_microstep: 1451.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, 
dynamic token length: 3960 [2024-06-10 16:20:47,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.36 | bwd_microstep: 1793.30 | bwd_inner_microstep: 1793.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678 [2024-06-10 16:20:50,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.45 | bwd_microstep: 1615.68 | bwd_inner_microstep: 1615.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3717 [2024-06-10 16:20:52,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.38 | bwd_microstep: 1477.27 | bwd_inner_microstep: 1477.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655 [2024-06-10 16:20:54,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.13 | bwd_microstep: 1507.09 | bwd_inner_microstep: 1507.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461 [2024-06-10 16:20:56,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1401.21 | bwd_inner_microstep: 1401.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386 [2024-06-10 16:20:58,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1337.18 | bwd_inner_microstep: 1337.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130 [2024-06-10 16:20:59,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.50 | bwd_microstep: 832.13 | bwd_inner_microstep: 832.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-10 16:21:00,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.46 | bwd_microstep: 1188.51 | bwd_inner_microstep: 1188.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727 [2024-06-10 16:21:02,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1433.45 | bwd_inner_microstep: 1433.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3689 [2024-06-10 16:21:04,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1431.58 | bwd_inner_microstep: 1431.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020 [2024-06-10 16:21:05,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.78 | bwd_microstep: 806.75 | bwd_inner_microstep: 806.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-10 16:21:07,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.44 | bwd_microstep: 1301.57 | bwd_inner_microstep: 1301.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293 [2024-06-10 16:21:09,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.25 | bwd_microstep: 975.58 | bwd_inner_microstep: 975.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3573 [2024-06-10 16:21:10,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.03 | bwd_microstep: 1334.02 | bwd_inner_microstep: 1333.99 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-10 16:21:13,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.25 | bwd_microstep: 1600.44 | bwd_inner_microstep: 1600.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-10 16:21:15,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1419.22 | bwd_inner_microstep: 1419.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 16:21:17,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1559.02 | bwd_inner_microstep: 1558.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2448 [2024-06-10 16:21:18,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.09 | bwd_microstep: 853.77 | bwd_inner_microstep: 853.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-10 16:21:22,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.33 | optimizer_step: 6.58 [2024-06-10 16:21:22,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.63 | bwd_microstep: 4072.96 | bwd_inner_microstep: 1106.98 | bwd_allreduce_microstep: 2965.91 | step_microstep: 38.64 [2024-06-10 16:21:22,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16011.80 | bwd: 45801.30 | bwd_inner: 42834.48 | bwd_allreduce: 2966.15 | step: 40.15 {'loss': 1.2445, 'learning_rate': 1.9174487408119067e-05, 'epoch': 0.53} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 16:21:24,663] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.30 | bwd_microstep: 1237.35 | bwd_inner_microstep: 1237.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4299 [2024-06-10 16:21:26,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.60 | bwd_microstep: 1675.27 | bwd_inner_microstep: 1675.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 16:21:29,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.86 | bwd_microstep: 1478.60 | bwd_inner_microstep: 1478.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477 [2024-06-10 16:21:30,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.62 | bwd_microstep: 1216.24 | bwd_inner_microstep: 1216.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 16:21:32,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.85 | bwd_microstep: 1350.80 | bwd_inner_microstep: 1350.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 16:21:34,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1246.00 | bwd_inner_microstep: 1245.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480 [2024-06-10 16:21:35,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.32 | bwd_microstep: 1184.08 | bwd_inner_microstep: 1184.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3446 [2024-06-10 16:21:37,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.19 | bwd_microstep: 1251.86 | bwd_inner_microstep: 1251.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 16:21:39,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.27 | bwd_microstep: 1659.81 | bwd_inner_microstep: 1659.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 16:21:41,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.03 | bwd_microstep: 1342.86 | bwd_inner_microstep: 1342.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3450 [2024-06-10 16:21:43,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.55 | bwd_microstep: 1334.27 | bwd_inner_microstep: 1334.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1846 [2024-06-10 16:21:44,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.12 | bwd_microstep: 670.81 | bwd_inner_microstep: 670.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 16:21:46,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3505 [2024-06-10 16:21:48,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1349.88 | bwd_inner_microstep: 1349.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 2092 [2024-06-10 16:21:49,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.29 | bwd_microstep: 919.60 | bwd_inner_microstep: 919.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644 [2024-06-10 16:21:51,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1434.69 | bwd_inner_microstep: 1434.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 16:21:53,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1287.52 | bwd_inner_microstep: 1287.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627 [2024-06-10 16:21:55,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1313.58 | bwd_inner_microstep: 1313.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 16:21:56,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.70 | bwd_microstep: 1383.38 | bwd_inner_microstep: 1383.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 16:21:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1252.23 | bwd_inner_microstep: 1252.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 16:22:00,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.33 | bwd_microstep: 972.31 | bwd_inner_microstep: 972.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 16:22:01,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1394.54 | bwd_inner_microstep: 1394.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275 [2024-06-10 16:22:03,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.51 | bwd_microstep: 972.77 | bwd_inner_microstep: 972.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 16:22:05,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.34 | bwd_microstep: 1564.66 | bwd_inner_microstep: 1564.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-10 16:22:07,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.74 | bwd_microstep: 1298.94 | bwd_inner_microstep: 1298.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3635 [2024-06-10 16:22:09,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.01 | bwd_microstep: 1346.14 | bwd_inner_microstep: 1346.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4067 [2024-06-10 16:22:11,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.69 | bwd_microstep: 1527.65 | bwd_inner_microstep: 1527.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 16:22:13,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.98 | bwd_microstep: 1349.44 | bwd_inner_microstep: 1349.42 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-10 16:22:14,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.48 | bwd_microstep: 1298.50 | bwd_inner_microstep: 1298.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3587 [2024-06-10 16:22:16,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.61 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3467 [2024-06-10 16:22:18,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.26 | bwd_microstep: 1330.57 | bwd_inner_microstep: 1330.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3582 [2024-06-10 16:22:24,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-10 16:22:24,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.65 | bwd_microstep: 5227.69 | bwd_inner_microstep: 1648.58 | bwd_allreduce_microstep: 3579.05 | step_microstep: 38.29 [2024-06-10 16:22:24,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15694.69 | bwd: 45478.68 | bwd_inner: 41898.72 | bwd_allreduce: 3579.28 | step: 39.80
53%|█████▎ | 907/1726 [15:39:51<13:55:28, 61.21s/it] 53%|█████▎ | 908/1726 [15:40:52<13:53:30, 61.14s/it] 53%|█████▎ | 909/1726 [15:41:55<13:59:24, 61.65s/it] 53%|█████▎ | 910/1726 [15:42:57<14:01:17, 61.86s/it] 53%|█████▎ | 911/1726 [15:43:59<14:01:25, 61.95s/it] 53%|█████▎ | 912/1726 [15:45:01<13:58:34, 61.81s/it] {'loss': 1.2131, 'learning_rate': 1.9136986907964694e-05, 'epoch': 0.53} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 16:22:26,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.73 | bwd_microstep: 1235.20 | bwd_inner_microstep: 1235.02 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2409 [2024-06-10 16:22:27,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.06 | bwd_microstep: 998.16 | bwd_inner_microstep: 998.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463 [2024-06-10 16:22:29,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1403.70 | bwd_inner_microstep: 1403.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3391 [2024-06-10 16:22:31,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.42 | bwd_microstep: 1144.39 | bwd_inner_microstep: 1144.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 16:22:32,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1254.96 | bwd_inner_microstep: 1254.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-10 16:22:34,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.92 | bwd_microstep: 1543.79 | bwd_inner_microstep: 1543.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token
length: 3403 [2024-06-10 16:22:36,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.83 | bwd_microstep: 1179.20 | bwd_inner_microstep: 1179.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-10 16:22:38,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.62 | bwd_microstep: 1150.45 | bwd_inner_microstep: 1150.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431 [2024-06-10 16:22:39,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.30 | bwd_microstep: 1317.46 | bwd_inner_microstep: 1317.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 16:22:41,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1378.14 | bwd_inner_microstep: 1378.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3416 [2024-06-10 16:22:43,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.86 | bwd_microstep: 1211.41 | bwd_inner_microstep: 1211.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-10 16:22:44,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.35 | bwd_microstep: 889.24 | bwd_inner_microstep: 889.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 16:22:45,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, 
images per sample: 12.5, dynamic token length: 3886 [2024-06-10 16:22:48,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.44 | bwd_microstep: 1849.68 | bwd_inner_microstep: 1849.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 16:22:50,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 16:22:52,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.92 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 16:22:54,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.83 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2094 [2024-06-10 16:22:55,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.92 | bwd_microstep: 822.13 | bwd_inner_microstep: 822.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544 [2024-06-10 16:22:57,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1297.33 | bwd_inner_microstep: 1297.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3669 [2024-06-10 16:22:59,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1376.59 | bwd_inner_microstep: 1376.56 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816 [2024-06-10 16:23:01,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.82 | bwd_microstep: 1485.86 | bwd_inner_microstep: 1485.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 591 [2024-06-10 16:23:01,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.31 | bwd_microstep: 256.74 | bwd_inner_microstep: 256.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 16:23:03,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.00 | bwd_microstep: 1491.53 | bwd_inner_microstep: 1491.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-10 16:23:05,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1529.37 | bwd_inner_microstep: 1529.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947 [2024-06-10 16:23:06,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.18 | bwd_microstep: 729.20 | bwd_inner_microstep: 729.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 16:23:08,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.91 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 16:23:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.72 | bwd_microstep: 1469.39 | bwd_inner_microstep: 1469.36 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3555 [2024-06-10 16:23:13,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.96 | bwd_microstep: 1591.49 | bwd_inner_microstep: 1591.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3485 [2024-06-10 16:23:14,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.64 | bwd_microstep: 1335.69 | bwd_inner_microstep: 1335.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3772 [2024-06-10 16:23:16,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.24 | bwd_microstep: 1436.40 | bwd_inner_microstep: 1436.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954 [2024-06-10 16:23:18,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.39 | bwd_microstep: 854.69 | bwd_inner_microstep: 854.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 16:23:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.28 | optimizer_step: 6.60 [2024-06-10 16:23:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 6279.98 | bwd_inner_microstep: 1756.79 | bwd_allreduce_microstep: 4523.13 | step_microstep: 39.07 [2024-06-10 16:23:24,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15129.87 | bwd: 45075.39 | bwd_inner: 40551.22 | bwd_allreduce: 4523.44 | step: 40.77 {'loss': 1.2771, 'learning_rate': 1.9099489447337946e-05, 'epoch': 0.53} dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1415 
[2024-06-10 16:23:25,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 205.33 | bwd_microstep: 524.68 | bwd_inner_microstep: 524.60 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2435 [2024-06-10 16:23:27,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.02 | bwd_microstep: 1007.56 | bwd_inner_microstep: 1007.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 16:23:29,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1376.33 | bwd_inner_microstep: 1376.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269 [2024-06-10 16:23:30,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.26 | bwd_microstep: 967.42 | bwd_inner_microstep: 967.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3506 [2024-06-10 16:23:32,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.56 | bwd_microstep: 1188.43 | bwd_inner_microstep: 1188.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-10 16:23:33,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1412.33 | bwd_inner_microstep: 1412.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760 [2024-06-10 16:23:35,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 1434.41 | bwd_inner_microstep: 1434.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per 
sample: 5.0, dynamic token length: 3429 [2024-06-10 16:23:37,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.05 | bwd_microstep: 1215.08 | bwd_inner_microstep: 1215.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 16:23:39,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1286.33 | bwd_inner_microstep: 1286.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 16:23:41,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.74 | bwd_microstep: 1248.22 | bwd_inner_microstep: 1248.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3759 [2024-06-10 16:23:43,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.83 | bwd_microstep: 1471.05 | bwd_inner_microstep: 1471.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-10 16:23:45,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1421.86 | bwd_inner_microstep: 1421.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1985 [2024-06-10 16:23:46,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.22 | bwd_microstep: 860.77 | bwd_inner_microstep: 860.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452 [2024-06-10 16:23:47,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1163.51 | bwd_inner_microstep: 1163.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-10 16:23:49,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1393.54 | bwd_inner_microstep: 1393.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2057 [2024-06-10 16:23:51,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.79 | bwd_microstep: 849.23 | bwd_inner_microstep: 849.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2368 [2024-06-10 16:23:52,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.91 | bwd_microstep: 894.69 | bwd_inner_microstep: 894.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 16:23:54,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1341.81 | bwd_inner_microstep: 1341.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 16:23:56,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.75 | bwd_microstep: 1487.35 | bwd_inner_microstep: 1487.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-10 16:23:58,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1404.52 | bwd_inner_microstep: 1404.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-10 16:24:00,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.41 | bwd_microstep: 1484.37 | bwd_inner_microstep: 1484.34 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 16:24:02,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.49 | bwd_microstep: 1387.85 | bwd_inner_microstep: 1387.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 16:24:03,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1374.71 | bwd_inner_microstep: 1374.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 16:24:05,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 1296.33 | bwd_inner_microstep: 1296.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3020 [2024-06-10 16:24:07,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.83 | bwd_microstep: 1235.36 | bwd_inner_microstep: 1235.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3573 [2024-06-10 16:24:09,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.65 | bwd_microstep: 1633.90 | bwd_inner_microstep: 1633.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020 [2024-06-10 16:24:10,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.14 | bwd_microstep: 803.85 | bwd_inner_microstep: 803.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565 [2024-06-10 16:24:12,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.56 | bwd_microstep: 
1330.26 | bwd_inner_microstep: 1330.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 16:24:14,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1544.54 | bwd_inner_microstep: 1544.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2958 [2024-06-10 16:24:16,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.55 | bwd_microstep: 1199.97 | bwd_inner_microstep: 1199.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 16:24:18,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1476.90 | bwd_inner_microstep: 1476.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795 [2024-06-10 16:24:25,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 16:24:25,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 6850.52 | bwd_inner_microstep: 1750.96 | bwd_allreduce_microstep: 5099.51 | step_microstep: 37.99 [2024-06-10 16:24:25,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15132.25 | bwd: 45567.72 | bwd_inner: 40467.24 | bwd_allreduce: 5099.78 | step: 39.48 {'loss': 1.1726, 'learning_rate': 1.9061995158304682e-05, 'epoch': 0.53} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3433 [2024-06-10 16:24:27,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.01 | bwd_microstep: 1299.94 | bwd_inner_microstep: 1299.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3454 [2024-06-10 16:24:29,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1350.57 | bwd_inner_microstep: 1350.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-10 16:24:31,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.09 | bwd_microstep: 1644.98 | bwd_inner_microstep: 1644.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 16:24:33,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.64 | bwd_microstep: 1242.93 | bwd_inner_microstep: 1242.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 16:24:35,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1478.76 | bwd_inner_microstep: 1478.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702 [2024-06-10 16:24:37,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.84 | bwd_microstep: 1524.51 | bwd_inner_microstep: 1524.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 16:24:39,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1248.76 | bwd_inner_microstep: 1248.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 16:24:41,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1481.76 | bwd_inner_microstep: 1481.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 16:24:43,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1345.55 | bwd_inner_microstep: 1345.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3675 [2024-06-10 16:24:45,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.60 | bwd_microstep: 1823.87 | bwd_inner_microstep: 1823.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 16:24:47,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1251.30 | bwd_inner_microstep: 1251.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489 [2024-06-10 16:24:49,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3677 [2024-06-10 16:24:52,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.24 | bwd_microstep: 1823.50 | bwd_inner_microstep: 1823.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3451 [2024-06-10 16:24:54,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.51 | bwd_microstep: 1333.48 | bwd_inner_microstep: 1333.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3609 [2024-06-10 16:24:55,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.60 | bwd_microstep: 1273.38 | bwd_inner_microstep: 1273.36 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642 [2024-06-10 16:24:57,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1420.59 | bwd_inner_microstep: 1420.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830 [2024-06-10 16:24:59,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.03 | bwd_microstep: 1358.10 | bwd_inner_microstep: 1358.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-10 16:25:01,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.29 | bwd_microstep: 1183.81 | bwd_inner_microstep: 1183.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612 [2024-06-10 16:25:03,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.10 | bwd_microstep: 1606.27 | bwd_inner_microstep: 1606.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 16:25:05,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1554.66 | bwd_inner_microstep: 1554.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3842 [2024-06-10 16:25:07,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.48 | bwd_microstep: 1266.44 | bwd_inner_microstep: 1266.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826 [2024-06-10 16:25:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.71 | bwd_microstep: 
1582.47 | bwd_inner_microstep: 1582.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 16:25:11,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1405.22 | bwd_inner_microstep: 1405.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3929 [2024-06-10 16:25:13,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 1497.22 | bwd_inner_microstep: 1497.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 16:25:14,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.68 | bwd_microstep: 804.04 | bwd_inner_microstep: 804.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431 [2024-06-10 16:25:16,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.34 | bwd_microstep: 1185.87 | bwd_inner_microstep: 1185.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 16:25:18,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.35 | bwd_microstep: 1659.40 | bwd_inner_microstep: 1659.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2396 [2024-06-10 16:25:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.74 | bwd_microstep: 1081.97 | bwd_inner_microstep: 1081.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777 [2024-06-10 16:25:22,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 604.14 | bwd_microstep: 1639.91 | bwd_inner_microstep: 1639.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-10 16:25:24,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.79 | bwd_microstep: 1308.34 | bwd_inner_microstep: 1308.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3742 [2024-06-10 16:25:26,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.71 | bwd_microstep: 1736.36 | bwd_inner_microstep: 1736.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3602 [2024-06-10 16:25:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.19 | optimizer_step: 6.65 [2024-06-10 16:25:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1501.11 | bwd_inner_microstep: 1493.36 | bwd_allreduce_microstep: 7.70 | step_microstep: 37.64 [2024-06-10 16:25:28,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16936.66 | bwd: 45492.71 | bwd_inner: 45484.10 | bwd_allreduce: 7.92 | step: 39.10 {'loss': 1.181, 'learning_rate': 1.9024504172919606e-05, 'epoch': 0.53} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 16:25:30,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.38 | bwd_microstep: 1481.10 | bwd_inner_microstep: 1481.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 16:25:32,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1376.70 | bwd_inner_microstep: 1376.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 38, images per sample: 9.5, dynamic token length: 3497 [2024-06-10 16:25:34,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.48 | bwd_microstep: 1553.60 | bwd_inner_microstep: 1553.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 16:25:36,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.37 | bwd_microstep: 1448.91 | bwd_inner_microstep: 1448.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 16:25:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.91 | bwd_microstep: 1644.97 | bwd_inner_microstep: 1644.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 16:25:41,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.57 | bwd_microstep: 1383.82 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 16:25:42,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.97 | bwd_microstep: 1150.37 | bwd_inner_microstep: 1150.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 16:25:43,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.55 | bwd_microstep: 790.60 | bwd_inner_microstep: 790.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3500 [2024-06-10 16:25:45,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1350.55 | bwd_inner_microstep: 1350.53 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948 [2024-06-10 16:25:46,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.86 | bwd_microstep: 824.99 | bwd_inner_microstep: 824.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3681 [2024-06-10 16:25:48,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.73 | bwd_microstep: 1446.09 | bwd_inner_microstep: 1446.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 16:25:50,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1480.06 | bwd_inner_microstep: 1480.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504 [2024-06-10 16:25:52,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.96 | bwd_microstep: 1442.68 | bwd_inner_microstep: 1442.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974 [2024-06-10 16:25:53,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.26 | bwd_microstep: 888.73 | bwd_inner_microstep: 888.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3650 [2024-06-10 16:25:56,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.55 | bwd_microstep: 1615.73 | bwd_inner_microstep: 1615.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3638 [2024-06-10 16:25:58,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.31 | bwd_microstep: 1572.41 | bwd_inner_microstep: 
1572.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 16:26:00,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1386.39 | bwd_inner_microstep: 1386.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3445 [2024-06-10 16:26:01,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.66 | bwd_microstep: 1185.57 | bwd_inner_microstep: 1185.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 16:26:04,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.72 | bwd_microstep: 1544.83 | bwd_inner_microstep: 1544.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 16:26:06,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.48 | bwd_microstep: 1455.06 | bwd_inner_microstep: 1455.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 16:26:08,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.72 | bwd_microstep: 1458.81 | bwd_inner_microstep: 1458.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3628 [2024-06-10 16:26:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.49 | bwd_microstep: 1217.33 | bwd_inner_microstep: 1217.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 16:26:11,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.09 | 
bwd_microstep: 1186.60 | bwd_inner_microstep: 1186.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 16:26:13,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.97 | bwd_microstep: 1277.65 | bwd_inner_microstep: 1277.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 16:26:14,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1256.50 | bwd_inner_microstep: 1256.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 16:26:16,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.90 | bwd_microstep: 1455.58 | bwd_inner_microstep: 1455.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553 [2024-06-10 16:26:18,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.51 | bwd_microstep: 1328.45 | bwd_inner_microstep: 1328.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066 [2024-06-10 16:26:20,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.37 | bwd_microstep: 915.62 | bwd_inner_microstep: 915.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 16:26:22,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.96 | bwd_microstep: 1546.92 | bwd_inner_microstep: 1546.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3790 [2024-06-10 16:26:24,355] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 585.85 | bwd_microstep: 1583.47 | bwd_inner_microstep: 1583.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-10 16:26:26,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1590.36 | bwd_inner_microstep: 1590.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3765 [2024-06-10 16:26:31,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 16:26:31,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 4691.27 | bwd_inner_microstep: 1957.91 | bwd_allreduce_microstep: 2733.30 | step_microstep: 38.01 [2024-06-10 16:26:31,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16224.19 | bwd: 46531.70 | bwd_inner: 43797.49 | bwd_allreduce: 2733.53 | step: 39.48 {'loss': 1.2333, 'learning_rate': 1.8987016623225748e-05, 'epoch': 0.53} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 16:26:33,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.51 | bwd_microstep: 1335.01 | bwd_inner_microstep: 1334.85 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 870 [2024-06-10 16:26:34,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.11 | bwd_microstep: 363.83 | bwd_inner_microstep: 363.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573 [2024-06-10 16:26:36,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1498.94 | bwd_inner_microstep: 1498.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3842 [2024-06-10 16:26:38,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1359.73 | bwd_inner_microstep: 1359.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 16:26:40,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.72 | bwd_microstep: 1479.37 | bwd_inner_microstep: 1479.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 16:26:42,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.36 | bwd_microstep: 1649.63 | bwd_inner_microstep: 1649.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 16:26:44,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.56 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 16:26:46,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.72 | bwd_microstep: 1244.27 | bwd_inner_microstep: 1244.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 16:26:48,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.15 | bwd_microstep: 1629.53 | bwd_inner_microstep: 1629.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3986 [2024-06-10 16:26:50,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.52 | bwd_microstep: 1633.89 | bwd_inner_microstep: 1633.86 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 16:26:52,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1346.87 | bwd_inner_microstep: 1346.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 16:26:54,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.27 | bwd_microstep: 1285.56 | bwd_inner_microstep: 1285.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3495 [2024-06-10 16:26:56,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.63 | bwd_microstep: 1435.24 | bwd_inner_microstep: 1435.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507 [2024-06-10 16:26:58,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.39 | bwd_microstep: 1441.32 | bwd_inner_microstep: 1441.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676 [2024-06-10 16:27:00,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.37 | bwd_microstep: 1719.66 | bwd_inner_microstep: 1719.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 16:27:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1344.28 | bwd_inner_microstep: 1344.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 16:27:04,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 
1384.46 | bwd_inner_microstep: 1384.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638 [2024-06-10 16:27:06,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.21 | bwd_microstep: 1709.59 | bwd_inner_microstep: 1709.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2171 [2024-06-10 16:27:07,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.77 | bwd_microstep: 917.91 | bwd_inner_microstep: 917.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624 [2024-06-10 16:27:09,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.75 | bwd_microstep: 1315.01 | bwd_inner_microstep: 1314.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-10 16:27:11,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1355.87 | bwd_inner_microstep: 1355.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 16:27:13,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.14 | bwd_microstep: 1287.39 | bwd_inner_microstep: 1287.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 16:27:14,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.65 | bwd_microstep: 974.18 | bwd_inner_microstep: 974.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2454 [2024-06-10 16:27:16,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 362.74 | bwd_microstep: 950.41 | bwd_inner_microstep: 950.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536 [2024-06-10 16:27:17,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.57 | bwd_microstep: 1293.92 | bwd_inner_microstep: 1293.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561 [2024-06-10 16:27:19,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.49 | bwd_microstep: 1330.28 | bwd_inner_microstep: 1330.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 16:27:21,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.43 | bwd_microstep: 1388.57 | bwd_inner_microstep: 1388.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728 [2024-06-10 16:27:23,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.73 | bwd_microstep: 1534.81 | bwd_inner_microstep: 1534.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-10 16:27:26,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.46 | bwd_microstep: 1644.18 | bwd_inner_microstep: 1644.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3467 [2024-06-10 16:27:28,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 | bwd_microstep: 1568.11 | bwd_inner_microstep: 1568.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 16:27:30,187] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.28 | bwd_microstep: 1450.48 | bwd_inner_microstep: 1450.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3805 [2024-06-10 16:27:36,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.03 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 16:27:36,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.78 | bwd_microstep: 5622.19 | bwd_inner_microstep: 1975.22 | bwd_allreduce_microstep: 3646.92 | step_microstep: 38.39 [2024-06-10 16:27:36,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16355.26 | bwd: 47880.74 | bwd_inner: 44232.80 | bwd_allreduce: 3647.21 | step: 39.94 {'loss': 1.234, 'learning_rate': 1.894953264125408e-05, 'epoch': 0.53} 53%|█████▎ | 912/1726 [15:45:01<13:58:34, 61.81s/it] 53%|█████▎ | 913/1726 [15:46:01<13:52:21, 61.43s/it] 53%|█████▎ | 914/1726 [15:47:02<13:49:40, 61.31s/it] 53%|█████▎ | 915/1726 [15:48:05<13:54:33, 61.74s/it] 53%|█████▎ | 916/1726 [15:49:08<13:58:58, 62.15s/it] 53%|█████▎ | 917/1726 [15:50:13<14:07:46, 62.88s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 16:27:38,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.95 | bwd_microstep: 1477.91 | bwd_inner_microstep: 1477.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 16:27:40,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1246.79 | bwd_inner_microstep: 1246.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT
batch size: 28, images per sample: 7.0, dynamic token length: 3943 [2024-06-10 16:27:42,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1496.53 | bwd_inner_microstep: 1496.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 16:27:44,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.58 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906 [2024-06-10 16:27:45,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.03 | bwd_microstep: 776.84 | bwd_inner_microstep: 776.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 16:27:47,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1531.88 | bwd_inner_microstep: 1531.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3724 [2024-06-10 16:27:49,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.98 | bwd_microstep: 1733.29 | bwd_inner_microstep: 1733.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2529 [2024-06-10 16:27:51,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.43 | bwd_microstep: 1029.22 | bwd_inner_microstep: 1029.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478 [2024-06-10 16:27:52,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.60 | bwd_microstep: 1243.48 | bwd_inner_microstep: 1243.45 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2357 [2024-06-10 16:27:54,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.72 | bwd_microstep: 892.43 | bwd_inner_microstep: 892.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 16:27:56,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.55 | bwd_microstep: 1393.33 | bwd_inner_microstep: 1393.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410 [2024-06-10 16:27:57,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.94 | bwd_microstep: 1275.58 | bwd_inner_microstep: 1275.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516 [2024-06-10 16:27:59,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1581.62 | bwd_inner_microstep: 1581.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510 [2024-06-10 16:28:01,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512 [2024-06-10 16:28:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 1575.97 | bwd_inner_microstep: 1575.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3839 [2024-06-10 16:28:06,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.48 | bwd_microstep: 1605.06 | 
bwd_inner_microstep: 1605.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 16:28:08,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.39 | bwd_microstep: 1391.34 | bwd_inner_microstep: 1391.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 16:28:10,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.15 | bwd_microstep: 1656.16 | bwd_inner_microstep: 1656.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 16:28:12,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.52 | bwd_microstep: 1274.99 | bwd_inner_microstep: 1274.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797 [2024-06-10 16:28:14,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.92 | bwd_microstep: 1447.97 | bwd_inner_microstep: 1447.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531 [2024-06-10 16:28:16,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1324.07 | bwd_inner_microstep: 1324.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3813 [2024-06-10 16:28:18,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.98 | bwd_microstep: 1691.33 | bwd_inner_microstep: 1691.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669 [2024-06-10 16:28:20,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 534.30 | bwd_microstep: 1427.78 | bwd_inner_microstep: 1427.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3641 [2024-06-10 16:28:22,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1538.44 | bwd_inner_microstep: 1538.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 16:28:24,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1374.92 | bwd_inner_microstep: 1374.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3513 [2024-06-10 16:28:26,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1439.55 | bwd_inner_microstep: 1439.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599 [2024-06-10 16:28:28,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.17 | bwd_microstep: 1704.42 | bwd_inner_microstep: 1704.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590 [2024-06-10 16:28:30,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.63 | bwd_microstep: 1597.59 | bwd_inner_microstep: 1597.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3554 [2024-06-10 16:28:33,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.73 | bwd_microstep: 1591.18 | bwd_inner_microstep: 1591.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3785 [2024-06-10 16:28:35,247] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.32 | bwd_microstep: 1613.88 | bwd_inner_microstep: 1613.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3384 [2024-06-10 16:28:36,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.57 | bwd_microstep: 1176.08 | bwd_inner_microstep: 1176.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584 [2024-06-10 16:28:39,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-10 16:28:39,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.41 | bwd_microstep: 1631.88 | bwd_inner_microstep: 1624.17 | bwd_allreduce_microstep: 7.66 | step_microstep: 37.68 [2024-06-10 16:28:39,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16913.08 | bwd: 45468.20 | bwd_inner: 45459.65 | bwd_allreduce: 7.88 | step: 39.12 {'loss': 1.2289, 'learning_rate': 1.8912052359022995e-05, 'epoch': 0.53} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 16:28:41,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.36 | bwd_microstep: 1447.57 | bwd_inner_microstep: 1447.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 16:28:43,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.21 | bwd_microstep: 1381.90 | bwd_inner_microstep: 1381.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846 [2024-06-10 16:28:45,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.01 | bwd_microstep: 1462.97 | bwd_inner_microstep: 1462.95 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 16:28:47,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1492.19 | bwd_inner_microstep: 1492.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 16:28:49,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4061 [2024-06-10 16:28:51,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.62 | bwd_microstep: 1718.45 | bwd_inner_microstep: 1718.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 16:28:53,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.69 | bwd_microstep: 1280.39 | bwd_inner_microstep: 1280.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 16:28:55,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.82 | bwd_microstep: 1380.84 | bwd_inner_microstep: 1380.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3698 [2024-06-10 16:28:57,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.56 | bwd_microstep: 1389.00 | bwd_inner_microstep: 1388.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1903 [2024-06-10 16:28:58,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.57 | bwd_microstep: 
745.66 | bwd_inner_microstep: 745.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3538 [2024-06-10 16:29:00,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.82 | bwd_microstep: 1438.65 | bwd_inner_microstep: 1438.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1966 [2024-06-10 16:29:01,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.12 | bwd_microstep: 887.34 | bwd_inner_microstep: 887.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468 [2024-06-10 16:29:03,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.92 | bwd_microstep: 1572.08 | bwd_inner_microstep: 1572.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3421 [2024-06-10 16:29:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1491.58 | bwd_inner_microstep: 1491.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3634 [2024-06-10 16:29:07,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.14 | bwd_microstep: 1810.60 | bwd_inner_microstep: 1810.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3423 [2024-06-10 16:29:09,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1277.62 | bwd_inner_microstep: 1277.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-10 16:29:11,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 557.69 | bwd_microstep: 1505.92 | bwd_inner_microstep: 1505.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3523 [2024-06-10 16:29:13,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1320.72 | bwd_inner_microstep: 1320.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 16:29:15,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1376.62 | bwd_inner_microstep: 1376.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 16:29:17,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.80 | bwd_microstep: 1661.91 | bwd_inner_microstep: 1661.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 16:29:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.62 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 16:29:21,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1252.73 | bwd_inner_microstep: 1252.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 16:29:23,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.44 | bwd_microstep: 1559.02 | bwd_inner_microstep: 1558.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-10 16:29:25,467] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1414.85 | bwd_inner_microstep: 1414.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2010 [2024-06-10 16:29:26,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.15 | bwd_microstep: 773.00 | bwd_inner_microstep: 772.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435 [2024-06-10 16:29:28,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.53 | bwd_microstep: 1282.08 | bwd_inner_microstep: 1282.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 16:29:30,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1500.86 | bwd_inner_microstep: 1500.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3549 [2024-06-10 16:29:32,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1587.93 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 16:29:34,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.17 | bwd_microstep: 1539.25 | bwd_inner_microstep: 1539.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 16:29:36,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1551.84 | bwd_inner_microstep: 1551.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3810 [2024-06-10 16:29:38,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.75 | bwd_microstep: 1555.22 | bwd_inner_microstep: 1555.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3579 [2024-06-10 16:29:41,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.14 | optimizer_step: 6.61 [2024-06-10 16:29:41,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 2002.04 | bwd_inner_microstep: 1611.36 | bwd_allreduce_microstep: 390.63 | step_microstep: 37.67 [2024-06-10 16:29:41,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16751.61 | bwd: 45351.91 | bwd_inner: 44960.38 | bwd_allreduce: 390.86 | step: 39.12 {'loss': 1.2081, 'learning_rate': 1.887457590853784e-05, 'epoch': 0.53} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1947 [2024-06-10 16:29:42,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.56 | bwd_microstep: 886.66 | bwd_inner_microstep: 886.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 16:29:44,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.02 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 16:29:46,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.25 | bwd_microstep: 1270.50 | bwd_inner_microstep: 1270.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470 [2024-06-10 16:29:48,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1337.66 | 
bwd_inner_microstep: 1337.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 16:29:50,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 16:29:51,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.89 | bwd_microstep: 1340.77 | bwd_inner_microstep: 1340.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722 [2024-06-10 16:29:54,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.49 | bwd_microstep: 1528.99 | bwd_inner_microstep: 1528.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939 [2024-06-10 16:29:55,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.74 | bwd_microstep: 821.96 | bwd_inner_microstep: 821.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1930 [2024-06-10 16:29:56,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.82 | bwd_microstep: 881.76 | bwd_inner_microstep: 881.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 16:29:58,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1480.87 | bwd_inner_microstep: 1480.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1935 [2024-06-10 16:29:59,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
329.35 | bwd_microstep: 885.16 | bwd_inner_microstep: 885.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2150 [2024-06-10 16:30:01,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.00 | bwd_microstep: 1041.93 | bwd_inner_microstep: 1041.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3484 [2024-06-10 16:30:03,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.14 | bwd_microstep: 1674.72 | bwd_inner_microstep: 1674.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3690 [2024-06-10 16:30:05,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.10 | bwd_microstep: 1824.31 | bwd_inner_microstep: 1824.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3508 [2024-06-10 16:30:08,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.01 | bwd_microstep: 1533.90 | bwd_inner_microstep: 1533.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 16:30:10,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1513.49 | bwd_inner_microstep: 1513.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 646 [2024-06-10 16:30:10,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.22 | bwd_microstep: 274.65 | bwd_inner_microstep: 274.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628 [2024-06-10 16:30:12,641] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 568.27 | bwd_microstep: 1535.13 | bwd_inner_microstep: 1535.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 16:30:14,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1511.40 | bwd_inner_microstep: 1511.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 16:30:16,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1651.54 | bwd_inner_microstep: 1651.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 16:30:19,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1507.87 | bwd_inner_microstep: 1507.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467 [2024-06-10 16:30:21,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1404.58 | bwd_inner_microstep: 1404.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 16:30:23,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1554.91 | bwd_inner_microstep: 1554.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 16:30:25,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1393.53 | bwd_inner_microstep: 1393.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3792 [2024-06-10 
16:30:27,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1553.83 | bwd_inner_microstep: 1553.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3807 [2024-06-10 16:30:29,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.02 | bwd_microstep: 1578.81 | bwd_inner_microstep: 1578.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2322 [2024-06-10 16:30:30,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.09 | bwd_microstep: 986.91 | bwd_inner_microstep: 986.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3544 [2024-06-10 16:30:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.21 | bwd_microstep: 1518.58 | bwd_inner_microstep: 1518.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 16:30:34,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1514.03 | bwd_inner_microstep: 1514.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431 [2024-06-10 16:30:36,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.00 | bwd_microstep: 1309.16 | bwd_inner_microstep: 1309.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3790 [2024-06-10 16:30:38,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1444.93 | bwd_inner_microstep: 1444.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, 
dynamic token length: 3773 [2024-06-10 16:30:41,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-10 16:30:41,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.63 | bwd_microstep: 2572.89 | bwd_inner_microstep: 1666.03 | bwd_allreduce_microstep: 906.82 | step_microstep: 37.67 [2024-06-10 16:30:41,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15973.11 | bwd: 44006.65 | bwd_inner: 43098.90 | bwd_allreduce: 907.05 | step: 39.16 {'loss': 1.3089, 'learning_rate': 1.8837103421790486e-05, 'epoch': 0.53} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5813 [2024-06-10 16:30:46,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 1809.83 | bwd_microstep: 2627.21 | bwd_inner_microstep: 2627.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4090 [2024-06-10 16:30:48,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.70 | bwd_microstep: 1624.11 | bwd_inner_microstep: 1624.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 16:30:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.36 | bwd_microstep: 1484.42 | bwd_inner_microstep: 1484.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2258 [2024-06-10 16:30:51,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.99 | bwd_microstep: 900.60 | bwd_inner_microstep: 900.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 16:30:53,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.70 | 
bwd_microstep: 1145.39 | bwd_inner_microstep: 1145.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397 [2024-06-10 16:30:55,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.81 | bwd_microstep: 1147.22 | bwd_inner_microstep: 1147.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 16:30:56,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 16:30:58,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.14 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 16:31:00,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.04 | bwd_microstep: 1376.95 | bwd_inner_microstep: 1376.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2658 [2024-06-10 16:31:02,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.90 | bwd_microstep: 956.07 | bwd_inner_microstep: 956.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892 [2024-06-10 16:31:04,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.66 | bwd_microstep: 1579.70 | bwd_inner_microstep: 1579.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486 [2024-06-10 16:31:06,005] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1314.71 | bwd_inner_microstep: 1314.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3672 [2024-06-10 16:31:07,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.74 | bwd_microstep: 1370.92 | bwd_inner_microstep: 1370.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466 [2024-06-10 16:31:09,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.37 | bwd_microstep: 1310.40 | bwd_inner_microstep: 1310.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 16:31:11,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.07 | bwd_microstep: 1381.83 | bwd_inner_microstep: 1381.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 16:31:13,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1385.61 | bwd_inner_microstep: 1385.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3651 [2024-06-10 16:31:15,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.35 | bwd_microstep: 1586.64 | bwd_inner_microstep: 1586.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658 [2024-06-10 16:31:17,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.46 | bwd_microstep: 1619.94 | bwd_inner_microstep: 1619.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2017 [2024-06-10 16:31:19,187] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.30 | bwd_microstep: 898.33 | bwd_inner_microstep: 898.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702 [2024-06-10 16:31:21,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.79 | bwd_microstep: 1629.77 | bwd_inner_microstep: 1629.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3443 [2024-06-10 16:31:23,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.21 | bwd_microstep: 1313.04 | bwd_inner_microstep: 1313.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-10 16:31:25,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1612.96 | bwd_inner_microstep: 1612.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 894 [2024-06-10 16:31:25,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.84 | bwd_microstep: 369.58 | bwd_inner_microstep: 369.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3545 [2024-06-10 16:31:28,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.84 | bwd_microstep: 1525.45 | bwd_inner_microstep: 1525.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698 [2024-06-10 16:31:30,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.61 | bwd_microstep: 1531.36 | bwd_inner_microstep: 1531.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3552 [2024-06-10 16:31:32,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.63 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2061 [2024-06-10 16:31:33,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.70 | bwd_microstep: 753.51 | bwd_inner_microstep: 753.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 16:31:35,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.10 | bwd_microstep: 1476.30 | bwd_inner_microstep: 1476.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-10 16:31:37,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.92 | bwd_microstep: 1637.67 | bwd_inner_microstep: 1637.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3736 [2024-06-10 16:31:39,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.29 | bwd_microstep: 1564.63 | bwd_inner_microstep: 1564.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3569 [2024-06-10 16:31:42,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 1643.86 | bwd_inner_microstep: 1643.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 16:31:44,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.19 | optimizer_step: 6.65 [2024-06-10 16:31:44,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 551.66 | bwd_microstep: 1510.59 | bwd_inner_microstep: 1502.89 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.52 [2024-06-10 16:31:44,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17431.95 | bwd: 44438.21 | bwd_inner: 44429.67 | bwd_allreduce: 7.86 | step: 39.02 {'loss': 1.1821, 'learning_rate': 1.8799635030758837e-05, 'epoch': 0.53} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 16:31:45,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1333.65 | bwd_inner_microstep: 1333.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 16:31:47,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1349.33 | bwd_inner_microstep: 1349.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 16:31:50,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.51 | bwd_microstep: 1653.64 | bwd_inner_microstep: 1653.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 16:31:52,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.03 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 16:31:53,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.36 | bwd_microstep: 1402.03 | bwd_inner_microstep: 1402.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 16:31:55,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 521.82 | bwd_microstep: 1381.73 | bwd_inner_microstep: 1381.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 715 [2024-06-10 16:31:56,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.51 | bwd_microstep: 290.55 | bwd_inner_microstep: 290.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705 [2024-06-10 16:31:58,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.99 | bwd_microstep: 1430.19 | bwd_inner_microstep: 1430.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 16:32:00,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.05 | bwd_inner_microstep: 1385.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1948 [2024-06-10 16:32:01,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.68 | bwd_microstep: 759.37 | bwd_inner_microstep: 759.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480 [2024-06-10 16:32:03,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.24 | bwd_microstep: 1313.91 | bwd_inner_microstep: 1313.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 16:32:05,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1508.76 | bwd_inner_microstep: 1508.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444 [2024-06-10 16:32:07,122] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1448.44 | bwd_inner_microstep: 1448.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651 [2024-06-10 16:32:09,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.19 | bwd_microstep: 1614.98 | bwd_inner_microstep: 1614.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 16:32:11,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1346.38 | bwd_inner_microstep: 1346.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3458 [2024-06-10 16:32:13,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.05 | bwd_microstep: 1566.16 | bwd_inner_microstep: 1566.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2285 [2024-06-10 16:32:14,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.69 | bwd_microstep: 906.17 | bwd_inner_microstep: 906.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3826 [2024-06-10 16:32:16,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 1416.13 | bwd_inner_microstep: 1416.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842 [2024-06-10 16:32:18,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.56 | bwd_microstep: 1461.17 | bwd_inner_microstep: 1461.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
2048 [2024-06-10 16:32:19,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.59 | bwd_microstep: 842.42 | bwd_inner_microstep: 842.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1922 [2024-06-10 16:32:20,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.62 | bwd_microstep: 834.33 | bwd_inner_microstep: 834.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695 [2024-06-10 16:32:22,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1457.74 | bwd_inner_microstep: 1457.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 16:32:24,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.78 | bwd_microstep: 1467.24 | bwd_inner_microstep: 1467.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928 [2024-06-10 16:32:25,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.56 | bwd_microstep: 726.38 | bwd_inner_microstep: 726.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-10 16:32:27,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.27 | bwd_microstep: 878.38 | bwd_inner_microstep: 878.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 16:32:29,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.96 | bwd_microstep: 1453.03 | bwd_inner_microstep: 1453.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3517 [2024-06-10 16:32:31,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1386.65 | bwd_inner_microstep: 1386.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 16:32:32,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.57 | bwd_microstep: 1156.33 | bwd_inner_microstep: 1156.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 16:32:34,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1283.02 | bwd_inner_microstep: 1282.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 16:32:36,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.78 | bwd_microstep: 1456.89 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2219 [2024-06-10 16:32:37,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.20 | bwd_microstep: 864.27 | bwd_inner_microstep: 864.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 16:32:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.25 | optimizer_step: 6.59 [2024-06-10 16:32:45,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.81 | bwd_microstep: 6853.49 | bwd_inner_microstep: 1576.29 | bwd_allreduce_microstep: 5277.14 | step_microstep: 38.08 [2024-06-10 16:32:45,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15081.20 | bwd: 45607.86 | bwd_inner: 40329.81 | 
bwd_allreduce: 5277.37 | step: 39.57 {'loss': 1.264, 'learning_rate': 1.8762170867406366e-05, 'epoch': 0.53} 53%|█████▎ | 917/1726 [15:50:13<14:07:46, 62.88s/it] 53%|█████▎ | 918/1726 [15:51:15<14:06:06, 62.83s/it] 53%|█████▎ | 919/1726 [15:52:18<14:03:30, 62.71s/it] 53%|█████▎ | 920/1726 [15:53:18<13:52:48, 62.00s/it] 53%|█████▎ | 921/1726 [15:54:20<13:52:36, 62.06s/it] 53%|█████▎ | 922/1726 [15:55:21<13:47:22, 61.74s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931 [2024-06-10 16:32:46,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.92 | bwd_microstep: 876.09 | bwd_inner_microstep: 876.02 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4007 [2024-06-10 16:32:48,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.77 | bwd_microstep: 1604.93 | bwd_inner_microstep: 1604.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2373 [2024-06-10 16:32:49,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.46 | bwd_microstep: 994.65 | bwd_inner_microstep: 994.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2969 [2024-06-10 16:32:51,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.26 | bwd_microstep: 1196.57 | bwd_inner_microstep: 1196.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794 [2024-06-10 16:32:53,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
573.90 | bwd_microstep: 1546.90 | bwd_inner_microstep: 1546.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3694 [2024-06-10 16:32:55,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.85 | bwd_microstep: 1289.53 | bwd_inner_microstep: 1289.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 16:32:57,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.75 | bwd_microstep: 1279.89 | bwd_inner_microstep: 1279.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 16:32:59,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.28 | bwd_microstep: 1384.40 | bwd_inner_microstep: 1384.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 16:33:01,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1383.06 | bwd_inner_microstep: 1383.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 16:33:02,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1255.23 | bwd_inner_microstep: 1255.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 16:33:04,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.83 | bwd_microstep: 1384.72 | bwd_inner_microstep: 1384.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 16:33:06,790] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.49 | bwd_microstep: 1484.91 | bwd_inner_microstep: 1484.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 16:33:08,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.31 | bwd_microstep: 1373.93 | bwd_inner_microstep: 1373.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 16:33:10,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.41 | bwd_microstep: 1516.41 | bwd_inner_microstep: 1516.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683 [2024-06-10 16:33:12,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.44 | bwd_microstep: 1614.92 | bwd_inner_microstep: 1614.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3643 [2024-06-10 16:33:14,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.87 | bwd_microstep: 1363.96 | bwd_inner_microstep: 1363.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3473 [2024-06-10 16:33:16,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.69 | bwd_microstep: 1242.55 | bwd_inner_microstep: 1242.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 16:33:18,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.26 | bwd_microstep: 1294.61 | bwd_inner_microstep: 1294.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 2050 [2024-06-10 16:33:19,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.15 | bwd_microstep: 815.91 | bwd_inner_microstep: 815.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-10 16:33:21,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1510.25 | bwd_inner_microstep: 1510.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3547 [2024-06-10 16:33:23,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1424.63 | bwd_inner_microstep: 1424.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-10 16:33:24,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.26 | bwd_microstep: 805.47 | bwd_inner_microstep: 805.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 16:33:26,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.56 | bwd_microstep: 1460.33 | bwd_inner_microstep: 1460.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461 [2024-06-10 16:33:28,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.84 | bwd_microstep: 1183.44 | bwd_inner_microstep: 1183.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484 [2024-06-10 16:33:30,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.53 | bwd_microstep: 1343.30 | bwd_inner_microstep: 1343.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, 
images per sample: 8.0, dynamic token length: 3518 [2024-06-10 16:33:32,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.17 | bwd_microstep: 1440.79 | bwd_inner_microstep: 1440.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3598 [2024-06-10 16:33:34,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.93 | bwd_microstep: 1705.39 | bwd_inner_microstep: 1705.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 16:33:36,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1279.59 | bwd_inner_microstep: 1279.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2249 [2024-06-10 16:33:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.76 | bwd_microstep: 1062.26 | bwd_inner_microstep: 1062.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3650 [2024-06-10 16:33:39,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.19 | bwd_microstep: 1348.77 | bwd_inner_microstep: 1348.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776 [2024-06-10 16:33:41,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.91 | bwd_microstep: 1441.60 | bwd_inner_microstep: 1441.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3776 [2024-06-10 16:33:46,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 16:33:46,028] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.18 | bwd_microstep: 3660.98 | bwd_inner_microstep: 2089.03 | bwd_allreduce_microstep: 1571.89 | step_microstep: 38.11 [2024-06-10 16:33:46,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16000.84 | bwd: 44570.01 | bwd_inner: 42997.17 | bwd_allreduce: 1572.14 | step: 39.60 {'loss': 1.1735, 'learning_rate': 1.8724711063681665e-05, 'epoch': 0.53} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 16:33:47,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1277.00 | bwd_inner_microstep: 1276.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 16:33:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.71 | bwd_microstep: 1486.76 | bwd_inner_microstep: 1486.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3869 [2024-06-10 16:33:51,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.06 | bwd_microstep: 1461.44 | bwd_inner_microstep: 1461.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-10 16:33:52,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.97 | bwd_microstep: 787.21 | bwd_inner_microstep: 787.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 16:33:54,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1382.35 | bwd_inner_microstep: 1382.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 16:33:56,674] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1282.62 | bwd_inner_microstep: 1282.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 16:33:58,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.07 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 16:34:00,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1281.03 | bwd_inner_microstep: 1281.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 16:34:02,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 1390.27 | bwd_inner_microstep: 1390.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619 [2024-06-10 16:34:03,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 1312.19 | bwd_inner_microstep: 1312.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3712 [2024-06-10 16:34:06,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.77 | bwd_microstep: 1618.63 | bwd_inner_microstep: 1618.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-10 16:34:08,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.56 | bwd_microstep: 1523.56 | bwd_inner_microstep: 1523.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3504 [2024-06-10 16:34:10,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1414.02 | bwd_inner_microstep: 1413.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 16:34:12,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.77 | bwd_microstep: 1478.01 | bwd_inner_microstep: 1477.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2127 [2024-06-10 16:34:13,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.62 | bwd_microstep: 925.78 | bwd_inner_microstep: 925.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 16:34:15,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1282.24 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639 [2024-06-10 16:34:17,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.99 | bwd_microstep: 1611.62 | bwd_inner_microstep: 1611.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 16:34:19,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1406.61 | bwd_inner_microstep: 1406.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3456 [2024-06-10 16:34:21,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1303.35 | bwd_inner_microstep: 1303.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 3422 [2024-06-10 16:34:22,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.84 | bwd_microstep: 1157.60 | bwd_inner_microstep: 1157.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 16:34:24,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1388.31 | bwd_inner_microstep: 1388.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 16:34:26,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1505.94 | bwd_inner_microstep: 1505.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 924 [2024-06-10 16:34:27,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.43 | bwd_microstep: 376.44 | bwd_inner_microstep: 376.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618 [2024-06-10 16:34:29,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.84 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562 [2024-06-10 16:34:31,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.44 | bwd_microstep: 1433.96 | bwd_inner_microstep: 1433.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 16:34:33,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1497.17 | bwd_inner_microstep: 1497.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465 [2024-06-10 16:34:35,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.72 | bwd_microstep: 1343.19 | bwd_inner_microstep: 1343.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-10 16:34:37,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.06 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3760 [2024-06-10 16:34:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.20 | bwd_microstep: 1587.50 | bwd_inner_microstep: 1587.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 16:34:41,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1372.93 | bwd_inner_microstep: 1372.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2134 [2024-06-10 16:34:42,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.53 | bwd_microstep: 861.16 | bwd_inner_microstep: 861.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808 [2024-06-10 16:34:46,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.63 [2024-06-10 16:34:46,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.42 | bwd_microstep: 3259.94 | bwd_inner_microstep: 1790.99 | bwd_allreduce_microstep: 1468.89 | step_microstep: 38.95 [2024-06-10 16:34:46,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15900.82 
| bwd: 44061.81 | bwd_inner: 42592.01 | bwd_allreduce: 1469.12 | step: 40.47 {'loss': 1.1719, 'learning_rate': 1.8687255751517975e-05, 'epoch': 0.54} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 16:34:48,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.13 | bwd_microstep: 1333.07 | bwd_inner_microstep: 1332.93 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3873 [2024-06-10 16:34:50,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.76 | bwd_microstep: 1540.24 | bwd_inner_microstep: 1540.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 16:34:52,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-10 16:34:54,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.82 | bwd_microstep: 1548.83 | bwd_inner_microstep: 1548.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 16:34:55,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 787.12 | bwd_inner_microstep: 787.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 16:34:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1382.01 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 16:34:59,134] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1281.64 | bwd_inner_microstep: 1281.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-10 16:35:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.46 | bwd_microstep: 793.70 | bwd_inner_microstep: 793.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3498 [2024-06-10 16:35:01,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.51 | bwd_microstep: 1222.05 | bwd_inner_microstep: 1222.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966 [2024-06-10 16:35:02,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 704.56 | bwd_inner_microstep: 704.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-10 16:35:04,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.46 | bwd_microstep: 1284.23 | bwd_inner_microstep: 1284.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 16:35:06,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1442.24 | bwd_inner_microstep: 1442.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431 [2024-06-10 16:35:08,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1409.92 | bwd_inner_microstep: 1409.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3410 [2024-06-10 16:35:10,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.15 | bwd_microstep: 1342.09 | bwd_inner_microstep: 1342.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3486 [2024-06-10 16:35:12,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.41 | bwd_microstep: 1505.99 | bwd_inner_microstep: 1505.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2400 [2024-06-10 16:35:13,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.90 | bwd_microstep: 1034.26 | bwd_inner_microstep: 1034.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461 [2024-06-10 16:35:15,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1310.97 | bwd_inner_microstep: 1310.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638 [2024-06-10 16:35:17,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.07 | bwd_microstep: 1426.72 | bwd_inner_microstep: 1426.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819 [2024-06-10 16:35:19,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.24 | bwd_microstep: 1386.94 | bwd_inner_microstep: 1386.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-10 16:35:21,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1414.95 | bwd_inner_microstep: 1414.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3839 [2024-06-10 16:35:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.22 | bwd_microstep: 1453.51 | bwd_inner_microstep: 1453.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2116 [2024-06-10 16:35:24,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.86 | bwd_microstep: 861.95 | bwd_inner_microstep: 861.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426 [2024-06-10 16:35:26,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.08 | bwd_microstep: 1542.55 | bwd_inner_microstep: 1542.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 16:35:29,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.75 | bwd_microstep: 1644.80 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-10 16:35:31,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.30 | bwd_microstep: 1409.52 | bwd_inner_microstep: 1409.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3565 [2024-06-10 16:35:33,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.83 | bwd_microstep: 1359.55 | bwd_inner_microstep: 1359.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 16:35:35,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.69 | bwd_microstep: 1454.68 | bwd_inner_microstep: 1454.65 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591 [2024-06-10 16:35:37,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.57 | bwd_microstep: 1463.18 | bwd_inner_microstep: 1463.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 16:35:39,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.07 | bwd_microstep: 1540.46 | bwd_inner_microstep: 1540.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 16:35:41,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1397.37 | bwd_inner_microstep: 1397.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045 [2024-06-10 16:35:42,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 903.95 | bwd_inner_microstep: 903.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 16:35:45,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 16:35:45,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.11 | bwd_microstep: 2143.89 | bwd_inner_microstep: 1677.53 | bwd_allreduce_microstep: 466.30 | step_microstep: 37.91 [2024-06-10 16:35:45,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15770.44 | bwd: 42702.44 | bwd_inner: 42235.13 | bwd_allreduce: 466.57 | step: 39.39 {'loss': 1.2107, 'learning_rate': 1.8649805062832697e-05, 'epoch': 0.54} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 16:35:47,195] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1479.76 | bwd_inner_microstep: 1479.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935 [2024-06-10 16:35:48,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.56 | bwd_microstep: 725.13 | bwd_inner_microstep: 725.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 16:35:50,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.47 | bwd_microstep: 1556.09 | bwd_inner_microstep: 1556.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3778 [2024-06-10 16:35:52,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.09 | bwd_microstep: 1579.54 | bwd_inner_microstep: 1579.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014 [2024-06-10 16:35:53,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.70 | bwd_microstep: 802.98 | bwd_inner_microstep: 802.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1892 [2024-06-10 16:35:54,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.38 | bwd_microstep: 710.68 | bwd_inner_microstep: 710.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-10 16:35:55,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 696.75 | bwd_inner_microstep: 696.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489 
[2024-06-10 16:35:57,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.29 | bwd_microstep: 1432.39 | bwd_inner_microstep: 1432.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718 [2024-06-10 16:35:59,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1534.27 | bwd_inner_microstep: 1534.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192 [2024-06-10 16:36:01,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.18 | bwd_microstep: 953.72 | bwd_inner_microstep: 953.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986 [2024-06-10 16:36:02,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.93 | bwd_microstep: 798.83 | bwd_inner_microstep: 798.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678 [2024-06-10 16:36:04,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.16 | bwd_microstep: 1447.52 | bwd_inner_microstep: 1447.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3528 [2024-06-10 16:36:06,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.89 | bwd_microstep: 1583.29 | bwd_inner_microstep: 1583.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 16:36:08,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1242.78 | bwd_inner_microstep: 1242.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3565 [2024-06-10 16:36:10,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1496.74 | bwd_inner_microstep: 1496.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2128 [2024-06-10 16:36:11,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.59 | bwd_microstep: 797.50 | bwd_inner_microstep: 797.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 [2024-06-10 16:36:13,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1496.51 | bwd_inner_microstep: 1496.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-10 16:36:14,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.53 | bwd_microstep: 802.20 | bwd_inner_microstep: 802.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3623 [2024-06-10 16:36:16,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.11 | bwd_microstep: 1310.95 | bwd_inner_microstep: 1310.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724 [2024-06-10 16:36:18,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1367.76 | bwd_inner_microstep: 1367.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 16:36:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3870 [2024-06-10 16:36:21,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.29 | bwd_microstep: 1402.40 | bwd_inner_microstep: 1402.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 16:36:24,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1555.26 | bwd_inner_microstep: 1555.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 16:36:26,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1397.29 | bwd_inner_microstep: 1397.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 747 [2024-06-10 16:36:26,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.98 | bwd_microstep: 300.58 | bwd_inner_microstep: 300.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 16:36:28,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.74 | bwd_microstep: 1501.92 | bwd_inner_microstep: 1501.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 16:36:30,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1356.35 | bwd_inner_microstep: 1356.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613 [2024-06-10 16:36:32,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.20 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3571 [2024-06-10 16:36:34,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.81 | bwd_microstep: 1557.21 | bwd_inner_microstep: 1557.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766 [2024-06-10 16:36:36,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.07 | bwd_microstep: 1542.33 | bwd_inner_microstep: 1542.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2588 [2024-06-10 16:36:37,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.37 | bwd_microstep: 980.11 | bwd_inner_microstep: 980.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-10 16:36:42,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-10 16:36:42,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.66 | bwd_microstep: 4128.33 | bwd_inner_microstep: 1687.02 | bwd_allreduce_microstep: 2441.26 | step_microstep: 37.94 [2024-06-10 16:36:42,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14836.09 | bwd: 42189.37 | bwd_inner: 39747.21 | bwd_allreduce: 2441.49 | step: 39.35 {'loss': 1.1955, 'learning_rate': 1.861235912952697e-05, 'epoch': 0.54} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1891 [2024-06-10 16:36:43,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.27 | bwd_microstep: 861.49 | bwd_inner_microstep: 861.40 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 949 [2024-06-10 
16:36:44,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.56 | bwd_microstep: 381.07 | bwd_inner_microstep: 381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599 [2024-06-10 16:36:46,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1405.14 | bwd_inner_microstep: 1405.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3854 [2024-06-10 16:36:48,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3742 [2024-06-10 16:36:49,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.41 | bwd_microstep: 1299.19 | bwd_inner_microstep: 1299.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2436 [2024-06-10 16:36:51,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.10 | bwd_microstep: 851.25 | bwd_inner_microstep: 851.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 16:36:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3610 [2024-06-10 16:36:54,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1360.60 | bwd_inner_microstep: 1360.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 1953 [2024-06-10 16:36:55,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.45 | bwd_microstep: 796.99 | bwd_inner_microstep: 796.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 16:36:57,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.46 | bwd_microstep: 1288.13 | bwd_inner_microstep: 1288.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692 [2024-06-10 16:36:59,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.89 | bwd_microstep: 1458.15 | bwd_inner_microstep: 1458.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 16:37:01,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.81 | bwd_microstep: 1291.02 | bwd_inner_microstep: 1291.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518 [2024-06-10 16:37:03,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1582.13 | bwd_inner_microstep: 1582.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2492 [2024-06-10 16:37:05,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.64 | bwd_microstep: 1054.03 | bwd_inner_microstep: 1054.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2142 [2024-06-10 16:37:06,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.73 | bwd_microstep: 926.97 | bwd_inner_microstep: 926.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 16:37:08,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.72 | bwd_microstep: 1481.89 | bwd_inner_microstep: 1481.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3555 [2024-06-10 16:37:10,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.19 | bwd_microstep: 1234.47 | bwd_inner_microstep: 1234.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3631 [2024-06-10 16:37:12,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.46 | bwd_microstep: 1441.74 | bwd_inner_microstep: 1441.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631 [2024-06-10 16:37:14,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.19 | bwd_microstep: 1418.92 | bwd_inner_microstep: 1418.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294 [2024-06-10 16:37:15,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.15 | bwd_microstep: 1075.52 | bwd_inner_microstep: 1075.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3707 [2024-06-10 16:37:17,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1388.13 | bwd_inner_microstep: 1388.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514 [2024-06-10 16:37:19,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1416.93 | bwd_inner_microstep: 1416.90 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 16:37:21,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1397.34 | bwd_inner_microstep: 1397.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 16:37:23,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1554.87 | bwd_inner_microstep: 1554.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 16:37:25,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1385.02 | bwd_inner_microstep: 1384.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678 [2024-06-10 16:37:27,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.19 | bwd_microstep: 1626.80 | bwd_inner_microstep: 1626.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3571 [2024-06-10 16:37:29,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.07 | bwd_microstep: 1443.37 | bwd_inner_microstep: 1443.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3762 [2024-06-10 16:37:31,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.68 | bwd_microstep: 1590.21 | bwd_inner_microstep: 1590.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826 [2024-06-10 16:37:33,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1367.53 | bwd_inner_microstep: 
1367.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3579 [2024-06-10 16:37:36,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.95 | bwd_microstep: 1612.57 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3432 [2024-06-10 16:37:37,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.72 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3807 [2024-06-10 16:37:44,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.36 | optimizer_step: 6.58 [2024-06-10 16:37:44,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.55 | bwd_microstep: 5497.73 | bwd_inner_microstep: 1446.33 | bwd_allreduce_microstep: 4051.33 | step_microstep: 38.81 [2024-06-10 16:37:44,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15519.88 | bwd: 45681.28 | bwd_inner: 41628.95 | bwd_allreduce: 4051.61 | step: 40.31 {'loss': 1.2165, 'learning_rate': 1.8574918083485173e-05, 'epoch': 0.54} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418 [2024-06-10 16:37:45,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.80 | bwd_microstep: 1139.55 | bwd_inner_microstep: 1139.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 4015 [2024-06-10 16:37:47,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.46 | bwd_microstep: 1314.42 | bwd_inner_microstep: 1314.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3927 [2024-06-10 16:37:49,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.20 | bwd_microstep: 1591.04 | bwd_inner_microstep: 1591.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902 [2024-06-10 16:37:51,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.67 | bwd_microstep: 1686.67 | bwd_inner_microstep: 1686.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794 [2024-06-10 16:37:53,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.79 | bwd_microstep: 1452.34 | bwd_inner_microstep: 1452.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3870 [2024-06-10 16:37:56,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.06 | bwd_microstep: 1664.44 | bwd_inner_microstep: 1664.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 16:37:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.51 | bwd_microstep: 1539.26 | bwd_inner_microstep: 1539.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-10 16:38:00,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.05 | bwd_microstep: 1548.51 | bwd_inner_microstep: 1548.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 16:38:02,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.36 | bwd_microstep: 1391.32 | bwd_inner_microstep: 1391.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3399 [2024-06-10 16:38:04,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.09 | bwd_microstep: 1244.49 | bwd_inner_microstep: 1244.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 16:38:05,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1252.54 | bwd_inner_microstep: 1252.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970 [2024-06-10 16:38:07,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.65 | bwd_microstep: 826.94 | bwd_inner_microstep: 826.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969 [2024-06-10 16:38:08,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.78 | bwd_microstep: 801.75 | bwd_inner_microstep: 801.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 16:38:09,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.60 | bwd_inner_microstep: 1254.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495 [2024-06-10 16:38:11,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 16:38:13,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.60 | bwd_microstep: 1488.17 | bwd_inner_microstep: 1488.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3669 [2024-06-10 16:38:16,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.70 | bwd_microstep: 1550.92 | bwd_inner_microstep: 1550.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058 [2024-06-10 16:38:17,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.88 | bwd_microstep: 818.70 | bwd_inner_microstep: 818.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 16:38:19,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1311.12 | bwd_inner_microstep: 1311.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-10 16:38:21,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1494.64 | bwd_inner_microstep: 1494.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 16:38:23,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1478.89 | bwd_inner_microstep: 1478.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 16:38:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1346.61 | bwd_inner_microstep: 1346.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 16:38:27,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1498.20 | bwd_inner_microstep: 1498.17 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3479 [2024-06-10 16:38:29,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1575.30 | bwd_inner_microstep: 1575.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 16:38:31,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1402.77 | bwd_inner_microstep: 1402.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009 [2024-06-10 16:38:32,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.26 | bwd_microstep: 896.38 | bwd_inner_microstep: 896.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2076 [2024-06-10 16:38:33,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.83 | bwd_microstep: 1014.58 | bwd_inner_microstep: 1014.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278 [2024-06-10 16:38:35,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.04 | bwd_microstep: 1070.68 | bwd_inner_microstep: 1070.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 16:38:37,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1398.25 | bwd_inner_microstep: 1398.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587 [2024-06-10 16:38:39,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 
1306.81 | bwd_inner_microstep: 1306.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2057 [2024-06-10 16:38:40,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.16 | bwd_microstep: 915.43 | bwd_inner_microstep: 915.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3586 [2024-06-10 16:38:47,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.22 | optimizer_step: 6.62 [2024-06-10 16:38:47,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.33 | bwd_microstep: 6672.00 | bwd_inner_microstep: 1537.66 | bwd_allreduce_microstep: 5134.29 | step_microstep: 37.87 [2024-06-10 16:38:47,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15741.90 | bwd: 47393.78 | bwd_inner: 42258.59 | bwd_allreduce: 5134.52 | step: 39.36 53%|█████▎ | 923/1726 [15:56:22<13:42:58, 61.49s/it] 54%|█████▎ | 924/1726 [15:57:23<13:37:11, 61.14s/it] 54%|█████▎ | 925/1726 [15:58:21<13:26:50, 60.44s/it] 54%|█████▎ | 926/1726 [15:59:19<13:13:29, 59.51s/it] 54%|█████▎ | 927/1726 [16:00:20<13:20:35, 60.12s/it] {'loss': 1.2519, 'learning_rate': 1.853748205657448e-05, 'epoch': 0.54} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3477 [2024-06-10 16:38:49,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1565.39 | bwd_inner_microstep: 1565.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token
length: 3393 [2024-06-10 16:38:51,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1205.51 | bwd_inner_microstep: 1205.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893 [2024-06-10 16:38:53,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.96 | bwd_microstep: 1679.55 | bwd_inner_microstep: 1679.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2649 [2024-06-10 16:38:55,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.03 | bwd_microstep: 1114.57 | bwd_inner_microstep: 1114.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2311 [2024-06-10 16:38:56,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.08 | bwd_microstep: 979.65 | bwd_inner_microstep: 979.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 16:38:58,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1244.33 | bwd_inner_microstep: 1244.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 16:38:59,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.31 | bwd_microstep: 798.18 | bwd_inner_microstep: 798.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 16:39:00,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.18 | bwd_microstep: 792.51 | bwd_inner_microstep: 792.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, 
images per sample: 6.0, dynamic token length: 3476 [2024-06-10 16:39:02,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.38 | bwd_microstep: 1312.24 | bwd_inner_microstep: 1312.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 16:39:04,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1248.29 | bwd_inner_microstep: 1248.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3682 [2024-06-10 16:39:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.40 | bwd_microstep: 1475.42 | bwd_inner_microstep: 1475.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3412 [2024-06-10 16:39:08,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.03 | bwd_microstep: 1472.87 | bwd_inner_microstep: 1472.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448 [2024-06-10 16:39:10,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.48 | bwd_microstep: 1451.90 | bwd_inner_microstep: 1451.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3665 [2024-06-10 16:39:12,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.40 | bwd_microstep: 1672.27 | bwd_inner_microstep: 1672.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-10 16:39:14,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.27 | bwd_microstep: 1618.61 | bwd_inner_microstep: 1618.58 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 16:39:15,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.51 | bwd_microstep: 796.70 | bwd_inner_microstep: 796.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3632 [2024-06-10 16:39:17,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1314.15 | bwd_inner_microstep: 1314.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587 [2024-06-10 16:39:19,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.99 | bwd_microstep: 1307.83 | bwd_inner_microstep: 1307.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 16:39:21,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1254.90 | bwd_inner_microstep: 1254.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 16:39:22,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.10 | bwd_microstep: 1385.01 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473 [2024-06-10 16:39:24,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1330.19 | bwd_inner_microstep: 1330.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 16:39:26,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1411.44 | bwd_inner_microstep: 
1411.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903 [2024-06-10 16:39:27,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.67 | bwd_microstep: 685.89 | bwd_inner_microstep: 685.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 16:39:29,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.68 | bwd_microstep: 1562.41 | bwd_inner_microstep: 1562.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2082 [2024-06-10 16:39:31,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.32 | bwd_microstep: 920.44 | bwd_inner_microstep: 920.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2300 [2024-06-10 16:39:32,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.83 | bwd_microstep: 882.25 | bwd_inner_microstep: 882.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1904 [2024-06-10 16:39:33,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.27 | bwd_microstep: 716.30 | bwd_inner_microstep: 716.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2073 [2024-06-10 16:39:34,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.68 | bwd_microstep: 1012.66 | bwd_inner_microstep: 1012.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820 [2024-06-10 16:39:36,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 
1394.06 | bwd_inner_microstep: 1394.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275 [2024-06-10 16:39:38,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.75 | bwd_microstep: 971.72 | bwd_inner_microstep: 971.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3589 [2024-06-10 16:39:40,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.19 | bwd_microstep: 1806.51 | bwd_inner_microstep: 1806.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3821 [2024-06-10 16:39:49,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-10 16:39:49,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.43 | bwd_microstep: 8695.23 | bwd_inner_microstep: 1716.51 | bwd_allreduce_microstep: 6978.68 | step_microstep: 38.07 [2024-06-10 16:39:49,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14898.95 | bwd: 47079.00 | bwd_inner: 40099.41 | bwd_allreduce: 6978.90 | step: 39.55 {'loss': 1.2251, 'learning_rate': 1.8500051180644388e-05, 'epoch': 0.54} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-10 16:39:51,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.63 | bwd_microstep: 1280.97 | bwd_inner_microstep: 1280.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4049 [2024-06-10 16:39:53,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.30 | bwd_microstep: 1511.11 | bwd_inner_microstep: 1511.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3900 [2024-06-10 16:39:55,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.29 | bwd_microstep: 1479.19 | bwd_inner_microstep: 1479.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009 [2024-06-10 16:39:56,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.09 | bwd_microstep: 799.64 | bwd_inner_microstep: 799.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 16:39:58,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.01 | bwd_microstep: 1537.72 | bwd_inner_microstep: 1537.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2238 [2024-06-10 16:40:00,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.58 | bwd_microstep: 861.87 | bwd_inner_microstep: 861.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 16:40:01,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.98 | bwd_microstep: 788.62 | bwd_inner_microstep: 788.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767 [2024-06-10 16:40:03,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.68 | bwd_microstep: 1536.56 | bwd_inner_microstep: 1536.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-10 16:40:04,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.79 | bwd_microstep: 796.40 | bwd_inner_microstep: 796.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3691 [2024-06-10 16:40:06,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.93 | bwd_microstep: 1325.77 | bwd_inner_microstep: 1325.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523 [2024-06-10 16:40:08,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1421.76 | bwd_inner_microstep: 1421.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2479 [2024-06-10 16:40:09,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.54 | bwd_microstep: 1047.44 | bwd_inner_microstep: 1047.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085 [2024-06-10 16:40:10,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.47 | bwd_microstep: 853.28 | bwd_inner_microstep: 853.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400 [2024-06-10 16:40:12,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.60 | bwd_microstep: 1372.86 | bwd_inner_microstep: 1372.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3952 [2024-06-10 16:40:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.16 | bwd_microstep: 1895.27 | bwd_inner_microstep: 1895.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3898 [2024-06-10 16:40:17,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1493.32 | bwd_inner_microstep: 1493.30 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 16:40:19,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.79 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 16:40:21,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1511.70 | bwd_inner_microstep: 1511.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3534 [2024-06-10 16:40:23,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.31 | bwd_microstep: 1196.35 | bwd_inner_microstep: 1196.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 16:40:25,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1515.72 | bwd_inner_microstep: 1515.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3836 [2024-06-10 16:40:27,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.06 | bwd_microstep: 1725.57 | bwd_inner_microstep: 1725.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 16:40:28,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.11 | bwd_microstep: 788.01 | bwd_inner_microstep: 787.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 16:40:30,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1348.32 | bwd_inner_microstep: 
1348.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-10 16:40:32,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.89 | bwd_microstep: 1159.18 | bwd_inner_microstep: 1159.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774 [2024-06-10 16:40:34,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.27 | bwd_microstep: 1439.68 | bwd_inner_microstep: 1439.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3539 [2024-06-10 16:40:35,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.59 | bwd_microstep: 1231.48 | bwd_inner_microstep: 1231.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-10 16:40:37,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1403.54 | bwd_inner_microstep: 1403.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441 [2024-06-10 16:40:39,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.41 | bwd_microstep: 1443.78 | bwd_inner_microstep: 1443.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3800 [2024-06-10 16:40:41,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.69 | bwd_microstep: 1614.73 | bwd_inner_microstep: 1614.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3478 [2024-06-10 16:40:43,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | 
bwd_microstep: 1403.15 | bwd_inner_microstep: 1403.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230 [2024-06-10 16:40:45,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.77 | bwd_microstep: 961.57 | bwd_inner_microstep: 961.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3576 [2024-06-10 16:40:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.07 | optimizer_step: 6.62 [2024-06-10 16:40:50,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.04 | bwd_microstep: 4998.07 | bwd_inner_microstep: 1840.80 | bwd_allreduce_microstep: 3157.22 | step_microstep: 37.71 [2024-06-10 16:40:50,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15630.84 | bwd: 45128.14 | bwd_inner: 41970.02 | bwd_allreduce: 3157.45 | step: 39.17 {'loss': 1.2351, 'learning_rate': 1.846262558752623e-05, 'epoch': 0.54} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3472 [2024-06-10 16:40:53,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.70 | bwd_microstep: 1564.37 | bwd_inner_microstep: 1564.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2430 [2024-06-10 16:40:54,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.80 | bwd_microstep: 908.76 | bwd_inner_microstep: 908.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3854 [2024-06-10 16:40:56,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1361.55 | bwd_inner_microstep: 1361.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3895 [2024-06-10 16:40:58,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.97 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271 [2024-06-10 16:40:59,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.63 | bwd_microstep: 874.68 | bwd_inner_microstep: 874.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3479 [2024-06-10 16:41:01,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 1405.41 | bwd_inner_microstep: 1405.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-10 16:41:03,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.39 | bwd_microstep: 1153.54 | bwd_inner_microstep: 1153.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-10 16:41:05,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.49 | bwd_microstep: 1524.51 | bwd_inner_microstep: 1524.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3608 [2024-06-10 16:41:06,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.55 | bwd_microstep: 1214.40 | bwd_inner_microstep: 1214.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 16:41:08,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.00 | bwd_microstep: 1390.18 | bwd_inner_microstep: 1390.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 16:41:09,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.02 | bwd_microstep: 795.11 | bwd_inner_microstep: 795.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3458 [2024-06-10 16:41:11,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.20 | bwd_microstep: 1339.76 | bwd_inner_microstep: 1339.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925 [2024-06-10 16:41:12,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.71 | bwd_microstep: 726.85 | bwd_inner_microstep: 726.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3690 [2024-06-10 16:41:14,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.44 | bwd_microstep: 1586.55 | bwd_inner_microstep: 1586.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 16:41:16,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.12 | bwd_microstep: 1383.61 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3462 [2024-06-10 16:41:18,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1421.21 | bwd_inner_microstep: 1421.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2643 [2024-06-10 16:41:20,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.04 | bwd_microstep: 1113.08 | bwd_inner_microstep: 1113.05 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 16:41:22,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1387.42 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471 [2024-06-10 16:41:24,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.92 | bwd_microstep: 1311.03 | bwd_inner_microstep: 1311.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 16:41:26,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1488.36 | bwd_inner_microstep: 1488.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531 [2024-06-10 16:41:27,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1325.15 | bwd_inner_microstep: 1325.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 16:41:29,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.84 | bwd_microstep: 1491.63 | bwd_inner_microstep: 1491.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 16:41:31,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1391.60 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 16:41:33,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 
1257.27 | bwd_inner_microstep: 1257.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 16:41:35,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.83 | bwd_microstep: 1494.18 | bwd_inner_microstep: 1494.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 16:41:37,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.60 | bwd_microstep: 1555.14 | bwd_inner_microstep: 1555.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 16:41:39,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.89 | bwd_microstep: 1293.61 | bwd_inner_microstep: 1293.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582 [2024-06-10 16:41:41,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1399.38 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 16:41:43,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1389.84 | bwd_inner_microstep: 1389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 16:41:45,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1400.47 | bwd_inner_microstep: 1400.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3807 [2024-06-10 16:41:47,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 565.85 | bwd_microstep: 1517.85 | bwd_inner_microstep: 1517.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2029 [2024-06-10 16:41:52,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-10 16:41:52,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.42 | bwd_microstep: 4192.71 | bwd_inner_microstep: 1032.19 | bwd_allreduce_microstep: 3160.46 | step_microstep: 37.97 [2024-06-10 16:41:52,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15739.00 | bwd: 45138.16 | bwd_inner: 41976.79 | bwd_allreduce: 3160.69 | step: 39.46 {'loss': 1.2234, 'learning_rate': 1.8425205409032767e-05, 'epoch': 0.54} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923 [2024-06-10 16:41:53,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.40 | bwd_microstep: 817.90 | bwd_inner_microstep: 817.80 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 16:41:55,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.41 | bwd_microstep: 1477.46 | bwd_inner_microstep: 1477.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-10 16:41:57,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.03 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3582 [2024-06-10 16:41:58,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.28 | bwd_microstep: 1329.21 | bwd_inner_microstep: 1329.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 4165 [2024-06-10 16:42:01,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.87 | bwd_microstep: 1646.89 | bwd_inner_microstep: 1646.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3773 [2024-06-10 16:42:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1465.92 | bwd_inner_microstep: 1465.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 16:42:04,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1278.82 | bwd_inner_microstep: 1278.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477 [2024-06-10 16:42:06,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.20 | bwd_microstep: 1213.77 | bwd_inner_microstep: 1213.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3708 [2024-06-10 16:42:08,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.03 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-10 16:42:10,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1626.09 | bwd_inner_microstep: 1626.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3854 [2024-06-10 16:42:13,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.59 | bwd_microstep: 1767.02 | bwd_inner_microstep: 1767.00 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3674 [2024-06-10 16:42:15,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.79 | bwd_microstep: 1355.85 | bwd_inner_microstep: 1355.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 16:42:16,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1350.21 | bwd_inner_microstep: 1350.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3657 [2024-06-10 16:42:19,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.27 | bwd_microstep: 1682.31 | bwd_inner_microstep: 1682.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3719 [2024-06-10 16:42:21,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.51 | bwd_microstep: 1470.36 | bwd_inner_microstep: 1470.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666 [2024-06-10 16:42:23,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.19 | bwd_microstep: 1325.59 | bwd_inner_microstep: 1325.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 16:42:25,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1396.95 | bwd_inner_microstep: 1396.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-10 16:42:26,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.16 | bwd_microstep: 
1181.95 | bwd_inner_microstep: 1181.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3953 [2024-06-10 16:42:28,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.52 | bwd_microstep: 1626.97 | bwd_inner_microstep: 1626.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 892 [2024-06-10 16:42:29,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.04 | bwd_microstep: 369.84 | bwd_inner_microstep: 369.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2115 [2024-06-10 16:42:30,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.43 | bwd_microstep: 927.38 | bwd_inner_microstep: 927.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 16:42:32,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1529.67 | bwd_inner_microstep: 1529.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459 [2024-06-10 16:42:34,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.87 | bwd_microstep: 1441.32 | bwd_inner_microstep: 1441.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-10 16:42:36,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.87 | bwd_microstep: 806.16 | bwd_inner_microstep: 806.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2185 [2024-06-10 16:42:37,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 336.44 | bwd_microstep: 890.03 | bwd_inner_microstep: 890.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 16:42:39,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.74 | bwd_microstep: 1559.69 | bwd_inner_microstep: 1559.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940 [2024-06-10 16:42:40,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.88 | bwd_microstep: 761.30 | bwd_inner_microstep: 761.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 16:42:42,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 1509.20 | bwd_inner_microstep: 1509.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 16:42:43,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.31 | bwd_microstep: 878.65 | bwd_inner_microstep: 878.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 16:42:46,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.84 | bwd_microstep: 1658.43 | bwd_inner_microstep: 1658.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796 [2024-06-10 16:42:48,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1447.30 | bwd_inner_microstep: 1447.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575 [2024-06-10 16:42:54,099] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.12 | optimizer_step: 6.58 [2024-06-10 16:42:54,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.15 | bwd_microstep: 5477.33 | bwd_inner_microstep: 1678.14 | bwd_allreduce_microstep: 3799.14 | step_microstep: 37.88 [2024-06-10 16:42:54,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15733.06 | bwd: 45945.09 | bwd_inner: 42144.97 | bwd_allreduce: 3799.41 | step: 39.40 {'loss': 1.2131, 'learning_rate': 1.838779077695766e-05, 'epoch': 0.54} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385 [2024-06-10 16:42:55,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.49 | bwd_microstep: 1330.36 | bwd_inner_microstep: 1330.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 16:42:57,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.09 | bwd_microstep: 1475.54 | bwd_inner_microstep: 1475.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 16:42:59,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.59 | bwd_microstep: 1338.32 | bwd_inner_microstep: 1338.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781 [2024-06-10 16:43:02,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.57 | bwd_microstep: 1641.01 | bwd_inner_microstep: 1640.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 16:43:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.04 | bwd_microstep: 1388.11 | bwd_inner_microstep: 1388.08 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 16:43:05,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.62 | bwd_microstep: 1282.93 | bwd_inner_microstep: 1282.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 16:43:06,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.34 | bwd_microstep: 791.49 | bwd_inner_microstep: 791.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437 [2024-06-10 16:43:08,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.37 | bwd_microstep: 1158.49 | bwd_inner_microstep: 1158.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498 [2024-06-10 16:43:10,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1415.99 | bwd_inner_microstep: 1415.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3715 [2024-06-10 16:43:12,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.99 | bwd_microstep: 1699.56 | bwd_inner_microstep: 1699.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1993 [2024-06-10 16:43:13,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.14 | bwd_microstep: 828.84 | bwd_inner_microstep: 828.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2186 [2024-06-10 16:43:15,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.67 | bwd_microstep: 
1048.18 | bwd_inner_microstep: 1048.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497 [2024-06-10 16:43:17,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.87 | bwd_microstep: 1548.76 | bwd_inner_microstep: 1548.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-10 16:43:19,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.37 | bwd_microstep: 1335.48 | bwd_inner_microstep: 1335.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 16:43:21,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1390.32 | bwd_inner_microstep: 1390.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490 [2024-06-10 16:43:23,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.64 | bwd_microstep: 1314.18 | bwd_inner_microstep: 1314.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 943 [2024-06-10 16:43:23,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 162.00 | bwd_microstep: 412.55 | bwd_inner_microstep: 412.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3610 [2024-06-10 16:43:25,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.55 | bwd_microstep: 1536.72 | bwd_inner_microstep: 1536.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834 [2024-06-10 16:43:28,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 642.74 | bwd_microstep: 1754.98 | bwd_inner_microstep: 1754.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3610 [2024-06-10 16:43:30,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.68 | bwd_microstep: 1807.84 | bwd_inner_microstep: 1807.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 16:43:32,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.40 | bwd_microstep: 1397.23 | bwd_inner_microstep: 1397.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 16:43:34,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1283.15 | bwd_inner_microstep: 1283.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3835 [2024-06-10 16:43:36,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.82 | bwd_microstep: 1618.45 | bwd_inner_microstep: 1618.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 16:43:38,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.00 | bwd_microstep: 1650.98 | bwd_inner_microstep: 1650.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432 [2024-06-10 16:43:40,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1449.81 | bwd_inner_microstep: 1449.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-10 16:43:42,721] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1340.24 | bwd_inner_microstep: 1340.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914 [2024-06-10 16:43:43,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.44 | bwd_microstep: 686.27 | bwd_inner_microstep: 686.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 16:43:45,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.25 | bwd_microstep: 1488.04 | bwd_inner_microstep: 1488.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2189 [2024-06-10 16:43:46,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.85 | bwd_microstep: 765.26 | bwd_inner_microstep: 765.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2196 [2024-06-10 16:43:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.18 | bwd_microstep: 958.12 | bwd_inner_microstep: 958.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-10 16:43:50,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1406.04 | bwd_inner_microstep: 1406.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 16:43:55,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.36 | optimizer_step: 6.59 [2024-06-10 16:43:55,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.45 | bwd_microstep: 5237.04 | 
bwd_inner_microstep: 1460.13 | bwd_allreduce_microstep: 3776.84 | step_microstep: 38.84 [2024-06-10 16:43:55,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.44 | bwd: 45780.30 | bwd_inner: 42002.54 | bwd_allreduce: 3777.08 | step: 40.28 54%|█████▍ | 928/1726 [16:01:24<13:32:57, 61.12s/it] 54%|█████▍ | 929/1726 [16:02:26<13:36:38, 61.48s/it] 54%|█████▍ | 930/1726 [16:03:27<13:34:04, 61.36s/it] 54%|█████▍ | 931/1726 [16:04:28<13:32:23, 61.31s/it] 54%|█████▍ | 932/1726 [16:05:30<13:34:07, 61.52s/it] 54%|█████▍ | 933/1726 [16:06:32<13:34:00, 61.59 {'loss': 1.2285, 'learning_rate': 1.8350381823075062e-05, 'epoch': 0.54} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 16:43:57,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1387.33 | bwd_inner_microstep: 1387.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3930 [2024-06-10 16:43:59,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.37 | bwd_microstep: 1487.90 | bwd_inner_microstep: 1487.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3894 [2024-06-10 16:44:02,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.23 | bwd_microstep: 1583.03 | bwd_inner_microstep: 1583.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4303 [2024-06-10 16:44:04,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.70 | bwd_microstep: 1778.65 | 
bwd_inner_microstep: 1778.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 16:44:06,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1285.58 | bwd_inner_microstep: 1285.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3536 [2024-06-10 16:44:07,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.76 | bwd_microstep: 1197.19 | bwd_inner_microstep: 1197.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 16:44:09,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1396.06 | bwd_inner_microstep: 1396.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773 [2024-06-10 16:44:11,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.53 | bwd_microstep: 1447.75 | bwd_inner_microstep: 1447.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 16:44:13,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1390.53 | bwd_inner_microstep: 1390.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-10 16:44:15,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1345.99 | bwd_inner_microstep: 1345.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3493 [2024-06-10 16:44:17,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 577.50 | bwd_microstep: 1575.84 | bwd_inner_microstep: 1575.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2381 [2024-06-10 16:44:19,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.44 | bwd_microstep: 929.76 | bwd_inner_microstep: 929.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707 [2024-06-10 16:44:21,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1432.14 | bwd_inner_microstep: 1432.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3659 [2024-06-10 16:44:23,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.30 | bwd_microstep: 1822.69 | bwd_inner_microstep: 1822.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601 [2024-06-10 16:44:25,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1503.00 | bwd_inner_microstep: 1502.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 16:44:27,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.02 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 16:44:29,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1490.54 | bwd_inner_microstep: 1490.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-10 16:44:31,793] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1491.34 | bwd_inner_microstep: 1491.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3643 [2024-06-10 16:44:33,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.96 | bwd_microstep: 1345.79 | bwd_inner_microstep: 1345.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852 [2024-06-10 16:44:35,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.92 | bwd_microstep: 1665.35 | bwd_inner_microstep: 1665.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 16:44:37,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.17 | bwd_microstep: 1384.32 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 16:44:39,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.80 | bwd_microstep: 1352.73 | bwd_inner_microstep: 1352.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1900 [2024-06-10 16:44:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.31 | bwd_microstep: 685.49 | bwd_inner_microstep: 685.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674 [2024-06-10 16:44:42,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.94 | bwd_microstep: 1423.76 | bwd_inner_microstep: 1423.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 
3479 [2024-06-10 16:44:44,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.57 | bwd_microstep: 1245.77 | bwd_inner_microstep: 1245.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609 [2024-06-10 16:44:46,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.95 | bwd_microstep: 1309.14 | bwd_inner_microstep: 1309.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-10 16:44:47,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.37 | bwd_microstep: 805.69 | bwd_inner_microstep: 805.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-10 16:44:49,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.98 | bwd_microstep: 1294.20 | bwd_inner_microstep: 1294.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3832 [2024-06-10 16:44:51,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.43 | bwd_microstep: 1616.33 | bwd_inner_microstep: 1616.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 16:44:53,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.08 | bwd_microstep: 1553.14 | bwd_inner_microstep: 1553.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 16:44:55,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1348.55 | bwd_inner_microstep: 1348.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3769 [2024-06-10 16:44:57,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-10 16:44:57,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.87 | bwd_microstep: 1583.37 | bwd_inner_microstep: 1575.67 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.84 [2024-06-10 16:44:57,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16689.32 | bwd: 44648.09 | bwd_inner: 44639.55 | bwd_allreduce: 7.87 | step: 39.34 {'loss': 1.2199, 'learning_rate': 1.831297867913911e-05, 'epoch': 0.54} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3556 [2024-06-10 16:44:59,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1591.44 | bwd_inner_microstep: 1591.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 16:45:01,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.59 | bwd_microstep: 1276.24 | bwd_inner_microstep: 1276.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448 [2024-06-10 16:45:03,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1415.65 | bwd_inner_microstep: 1415.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2317 [2024-06-10 16:45:04,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.75 | bwd_microstep: 931.66 | bwd_inner_microstep: 931.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 16:45:06,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | 
bwd_microstep: 1246.58 | bwd_inner_microstep: 1246.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799 [2024-06-10 16:45:08,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.18 | bwd_microstep: 1549.49 | bwd_inner_microstep: 1549.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809 [2024-06-10 16:45:10,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.85 | bwd_microstep: 1353.63 | bwd_inner_microstep: 1353.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3565 [2024-06-10 16:45:12,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1461.39 | bwd_inner_microstep: 1461.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1902 [2024-06-10 16:45:13,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.71 | bwd_microstep: 779.21 | bwd_inner_microstep: 779.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720 [2024-06-10 16:45:15,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.63 | bwd_microstep: 1534.58 | bwd_inner_microstep: 1534.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407 [2024-06-10 16:45:17,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.82 | bwd_microstep: 1209.58 | bwd_inner_microstep: 1209.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3425 [2024-06-10 16:45:19,177] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1308.53 | bwd_inner_microstep: 1308.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 16:45:21,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1375.05 | bwd_inner_microstep: 1375.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 16:45:22,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1380.18 | bwd_inner_microstep: 1380.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511 [2024-06-10 16:45:24,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1444.52 | bwd_inner_microstep: 1444.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 16:45:27,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1480.98 | bwd_inner_microstep: 1480.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517 [2024-06-10 16:45:29,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.62 | bwd_microstep: 1587.34 | bwd_inner_microstep: 1587.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384 [2024-06-10 16:45:30,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.14 | bwd_microstep: 1240.61 | bwd_inner_microstep: 1240.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650 [2024-06-10 16:45:32,883] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.29 | bwd_microstep: 1425.84 | bwd_inner_microstep: 1425.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 16:45:35,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.77 | bwd_microstep: 1658.35 | bwd_inner_microstep: 1658.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619 [2024-06-10 16:45:37,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.92 | bwd_microstep: 1599.85 | bwd_inner_microstep: 1599.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423 [2024-06-10 16:45:39,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615 [2024-06-10 16:45:39,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.89 | bwd_microstep: 260.58 | bwd_inner_microstep: 260.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-10 16:45:41,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1409.39 | bwd_inner_microstep: 1409.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 16:45:43,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1299.59 | bwd_inner_microstep: 1299.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3542 [2024-06-10 16:45:45,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.39 | bwd_microstep: 1297.56 | bwd_inner_microstep: 1297.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-10 16:45:47,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.97 | bwd_microstep: 1658.63 | bwd_inner_microstep: 1658.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 16:45:49,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.47 | bwd_microstep: 1449.72 | bwd_inner_microstep: 1449.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2062 [2024-06-10 16:45:50,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.53 | bwd_microstep: 1008.62 | bwd_inner_microstep: 1008.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3438 [2024-06-10 16:45:52,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.67 | bwd_microstep: 1379.06 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3769 [2024-06-10 16:45:55,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1706.02 | bwd_inner_microstep: 1705.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599 [2024-06-10 16:45:58,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 16:45:58,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 526.59 | bwd_microstep: 3152.64 | bwd_inner_microstep: 1830.39 | bwd_allreduce_microstep: 1322.21 | step_microstep: 37.99 [2024-06-10 16:45:58,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16125.90 | bwd: 44921.84 | bwd_inner: 43598.72 | bwd_allreduce: 1322.43 | step: 39.61 {'loss': 1.2554, 'learning_rate': 1.8275581476883472e-05, 'epoch': 0.54} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3552 [2024-06-10 16:46:00,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.55 | bwd_microstep: 1451.20 | bwd_inner_microstep: 1451.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669 [2024-06-10 16:46:03,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.11 | bwd_microstep: 1619.42 | bwd_inner_microstep: 1619.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3864 [2024-06-10 16:46:05,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.96 | bwd_microstep: 1459.24 | bwd_inner_microstep: 1459.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 16:46:07,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1377.16 | bwd_inner_microstep: 1377.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 16:46:09,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.13 | bwd_microstep: 1552.12 | bwd_inner_microstep: 1552.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746 [2024-06-10 16:46:11,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 570.35 | bwd_microstep: 1532.87 | bwd_inner_microstep: 1532.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1952 [2024-06-10 16:46:12,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.30 | bwd_microstep: 702.54 | bwd_inner_microstep: 702.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 16:46:14,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1386.15 | bwd_inner_microstep: 1386.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 16:46:15,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.18 | bwd_microstep: 790.59 | bwd_inner_microstep: 790.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2169 [2024-06-10 16:46:16,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.75 | bwd_microstep: 760.05 | bwd_inner_microstep: 760.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-10 16:46:18,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.67 | bwd_microstep: 1479.95 | bwd_inner_microstep: 1479.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 16:46:20,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1378.40 | bwd_inner_microstep: 1378.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 16:46:22,603] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1647.19 | bwd_inner_microstep: 1647.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3661 [2024-06-10 16:46:24,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.48 | bwd_microstep: 1661.28 | bwd_inner_microstep: 1661.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 16:46:26,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.99 | bwd_microstep: 1479.52 | bwd_inner_microstep: 1479.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505 [2024-06-10 16:46:29,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1574.02 | bwd_inner_microstep: 1573.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-10 16:46:31,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.40 | bwd_microstep: 1599.01 | bwd_inner_microstep: 1598.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830 [2024-06-10 16:46:33,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.74 | bwd_microstep: 1488.40 | bwd_inner_microstep: 1488.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 16:46:35,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1394.61 | bwd_inner_microstep: 1394.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token 
length: 3444 [2024-06-10 16:46:37,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.81 | bwd_microstep: 1282.61 | bwd_inner_microstep: 1282.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029 [2024-06-10 16:46:38,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.10 | bwd_microstep: 808.64 | bwd_inner_microstep: 808.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 16:46:40,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.46 | bwd_microstep: 1498.54 | bwd_inner_microstep: 1498.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 16:46:42,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.81 | bwd_microstep: 1455.67 | bwd_inner_microstep: 1455.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3530 [2024-06-10 16:46:44,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1425.08 | bwd_inner_microstep: 1425.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627 [2024-06-10 16:46:46,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1310.91 | bwd_inner_microstep: 1310.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3783 [2024-06-10 16:46:47,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1395.82 | bwd_inner_microstep: 1395.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 2147 [2024-06-10 16:46:49,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.95 | bwd_microstep: 948.65 | bwd_inner_microstep: 948.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3511 [2024-06-10 16:46:51,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1251.85 | bwd_inner_microstep: 1251.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276 [2024-06-10 16:46:52,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.89 | bwd_microstep: 875.93 | bwd_inner_microstep: 875.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3810 [2024-06-10 16:46:54,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1528.59 | bwd_inner_microstep: 1528.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457 [2024-06-10 16:46:56,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1566.77 | bwd_inner_microstep: 1566.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801 [2024-06-10 16:47:00,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 16:47:00,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.80 | bwd_microstep: 3180.24 | bwd_inner_microstep: 1984.00 | bwd_allreduce_microstep: 1196.19 | step_microstep: 37.66 [2024-06-10 16:47:00,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16230.27 | bwd: 44863.01 | bwd_inner: 
43665.93 | bwd_allreduce: 1196.42 | step: 39.17 {'loss': 1.1911, 'learning_rate': 1.823819034802091e-05, 'epoch': 0.54} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 16:47:02,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.30 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462 [2024-06-10 16:47:04,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.01 | bwd_microstep: 1209.44 | bwd_inner_microstep: 1209.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460 [2024-06-10 16:47:06,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1406.38 | bwd_inner_microstep: 1406.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-10 16:47:08,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.71 | bwd_microstep: 1552.94 | bwd_inner_microstep: 1552.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 16:47:09,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1246.08 | bwd_inner_microstep: 1246.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2702 [2024-06-10 16:47:11,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.56 | bwd_microstep: 938.45 | bwd_inner_microstep: 938.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 16:47:12,936] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 16:47:14,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 16:47:16,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1281.60 | bwd_inner_microstep: 1281.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3724 [2024-06-10 16:47:18,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.59 | bwd_microstep: 1475.93 | bwd_inner_microstep: 1475.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 16:47:20,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1247.80 | bwd_inner_microstep: 1247.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 16:47:22,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.16 | bwd_microstep: 1339.31 | bwd_inner_microstep: 1339.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 16:47:24,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.39 | bwd_microstep: 1480.43 | bwd_inner_microstep: 1480.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 
3686 [2024-06-10 16:47:26,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.89 | bwd_microstep: 1551.49 | bwd_inner_microstep: 1551.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 16:47:28,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.77 | bwd_microstep: 1390.75 | bwd_inner_microstep: 1390.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3690 [2024-06-10 16:47:30,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.15 | bwd_microstep: 1424.11 | bwd_inner_microstep: 1424.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2138 [2024-06-10 16:47:31,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.30 | bwd_microstep: 928.74 | bwd_inner_microstep: 928.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637 [2024-06-10 16:47:33,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.52 | bwd_microstep: 1409.60 | bwd_inner_microstep: 1409.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 16:47:35,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1485.69 | bwd_inner_microstep: 1485.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 16:47:37,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1388.21 | bwd_inner_microstep: 1388.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3508 [2024-06-10 16:47:39,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.82 | bwd_microstep: 1321.17 | bwd_inner_microstep: 1321.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 16:47:41,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1558.46 | bwd_inner_microstep: 1558.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825 [2024-06-10 16:47:43,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.53 | bwd_microstep: 1357.94 | bwd_inner_microstep: 1357.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2227 [2024-06-10 16:47:44,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.51 | bwd_microstep: 800.56 | bwd_inner_microstep: 800.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2144 [2024-06-10 16:47:45,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.36 | bwd_microstep: 834.09 | bwd_inner_microstep: 834.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 16:47:47,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1466.42 | bwd_inner_microstep: 1466.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 16:47:49,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.99 | bwd_microstep: 1487.38 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 16:47:51,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.53 | bwd_microstep: 1251.81 | bwd_inner_microstep: 1251.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3633 [2024-06-10 16:47:53,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.07 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3545 [2024-06-10 16:47:55,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.49 | bwd_microstep: 1454.27 | bwd_inner_microstep: 1454.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 16:47:57,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1499.31 | bwd_inner_microstep: 1499.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626 [2024-06-10 16:48:02,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-10 16:48:02,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 4332.41 | bwd_inner_microstep: 1813.98 | bwd_allreduce_microstep: 2518.37 | step_microstep: 37.75 [2024-06-10 16:48:02,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16053.91 | bwd: 45543.48 | bwd_inner: 43024.21 | bwd_allreduce: 2518.61 | step: 39.26 {'loss': 1.2361, 'learning_rate': 1.820080542424278e-05, 'epoch': 0.54} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466 [2024-06-10 16:48:04,274] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 537.00 | bwd_microstep: 1427.91 | bwd_inner_microstep: 1427.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3987 [2024-06-10 16:48:06,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.40 | bwd_microstep: 1602.62 | bwd_inner_microstep: 1602.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 16:48:08,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1377.13 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3883 [2024-06-10 16:48:10,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.35 | bwd_microstep: 1682.19 | bwd_inner_microstep: 1682.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 16:48:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1376.96 | bwd_inner_microstep: 1376.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 16:48:14,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1379.13 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2376 [2024-06-10 16:48:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.77 | bwd_microstep: 1027.31 | bwd_inner_microstep: 1027.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 
16:48:17,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.88 | bwd_microstep: 1285.08 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 16:48:19,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.82 | bwd_microstep: 1383.41 | bwd_inner_microstep: 1383.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951 [2024-06-10 16:48:20,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.69 | bwd_microstep: 697.76 | bwd_inner_microstep: 697.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 16:48:22,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1252.79 | bwd_inner_microstep: 1252.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 16:48:23,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.04 | bwd_microstep: 1151.48 | bwd_inner_microstep: 1151.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 16:48:26,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1501.56 | bwd_inner_microstep: 1501.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001 [2024-06-10 16:48:27,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.24 | bwd_microstep: 897.19 | bwd_inner_microstep: 897.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, 
dynamic token length: 3442 [2024-06-10 16:48:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.22 | bwd_microstep: 1495.22 | bwd_inner_microstep: 1495.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3642 [2024-06-10 16:48:31,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.61 | bwd_microstep: 1643.43 | bwd_inner_microstep: 1643.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3635 [2024-06-10 16:48:33,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.64 | bwd_microstep: 1662.06 | bwd_inner_microstep: 1662.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-10 16:48:36,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.31 | bwd_microstep: 1599.39 | bwd_inner_microstep: 1599.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2419 [2024-06-10 16:48:37,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.06 | bwd_microstep: 963.87 | bwd_inner_microstep: 963.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999 [2024-06-10 16:48:38,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.06 | bwd_microstep: 739.05 | bwd_inner_microstep: 739.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2184 [2024-06-10 16:48:39,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.84 | bwd_microstep: 955.40 | bwd_inner_microstep: 955.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-10 16:48:41,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.78 | bwd_inner_microstep: 1508.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585 [2024-06-10 16:48:43,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.12 | bwd_microstep: 1337.16 | bwd_inner_microstep: 1337.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2134 [2024-06-10 16:48:44,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.85 | bwd_microstep: 863.30 | bwd_inner_microstep: 863.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 16:48:46,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.57 | bwd_microstep: 1377.35 | bwd_inner_microstep: 1377.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 16:48:48,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1278.35 | bwd_inner_microstep: 1278.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3805 [2024-06-10 16:48:50,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.17 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970 [2024-06-10 16:48:51,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.21 | bwd_microstep: 829.33 | bwd_inner_microstep: 829.30 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3727 [2024-06-10 16:48:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.86 | bwd_microstep: 1487.10 | bwd_inner_microstep: 1487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778 [2024-06-10 16:48:55,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.00 | bwd_microstep: 1640.24 | bwd_inner_microstep: 1640.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3607 [2024-06-10 16:48:57,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.00 | bwd_microstep: 1340.14 | bwd_inner_microstep: 1340.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 16:49:04,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 16:49:04,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.19 | bwd_microstep: 5707.87 | bwd_inner_microstep: 1662.06 | bwd_allreduce_microstep: 4045.75 | step_microstep: 38.21 [2024-06-10 16:49:04,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15585.07 | bwd: 45856.07 | bwd_inner: 41809.40 | bwd_allreduce: 4045.99 | step: 39.68 54%|█████▍ | 933/1726 [16:06:32<13:34:00, 61.59s/it] 54%|█████▍ | 934/1726 [16:07:34<13:33:19, 61.62s/it] 54%|█████▍ | 935/1726 [16:08:35<13:31:25, 61.55s/it] 54%|█████▍ | 936/1726 [16:09:37<13:29:57, 61.52s/it] 54%|█████▍ | 937/1726 [16:10:39<13:30:34, 61.64s/it]
54%|█████▍ | 938/1726 [16:11:40<13:30:03, 61.68s/it] {'loss': 1.2365, 'learning_rate': 1.8163426837218604e-05, 'epoch': 0.54} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 16:49:06,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.47 | bwd_microstep: 1476.50 | bwd_inner_microstep: 1476.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907 [2024-06-10 16:49:08,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.00 | bwd_microstep: 1586.50 | bwd_inner_microstep: 1586.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 16:49:10,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.42 | bwd_microstep: 1548.66 | bwd_inner_microstep: 1548.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778 [2024-06-10 16:49:12,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.28 | bwd_microstep: 1347.60 | bwd_inner_microstep: 1347.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 16:49:14,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1379.80 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 16:49:15,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.44 | bwd_microstep: 679.47 | bwd_inner_microstep: 679.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 16:49:17,062] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1386.44 | bwd_inner_microstep: 1386.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2115 [2024-06-10 16:49:18,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.82 | bwd_microstep: 858.19 | bwd_inner_microstep: 858.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886 [2024-06-10 16:49:19,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.49 | bwd_microstep: 680.88 | bwd_inner_microstep: 680.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082 [2024-06-10 16:49:20,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.01 | bwd_microstep: 822.79 | bwd_inner_microstep: 822.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3761 [2024-06-10 16:49:22,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.44 | bwd_microstep: 1470.88 | bwd_inner_microstep: 1470.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3687 [2024-06-10 16:49:24,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.66 | bwd_microstep: 1658.66 | bwd_inner_microstep: 1658.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1944 [2024-06-10 16:49:25,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.42 | bwd_microstep: 759.43 | bwd_inner_microstep: 759.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1969 
[2024-06-10 16:49:26,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.09 | bwd_microstep: 852.96 | bwd_inner_microstep: 852.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418 [2024-06-10 16:49:28,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.46 | bwd_microstep: 1277.30 | bwd_inner_microstep: 1277.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397 [2024-06-10 16:49:30,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.77 | bwd_microstep: 1400.80 | bwd_inner_microstep: 1400.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480 [2024-06-10 16:49:32,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.08 | bwd_microstep: 1312.31 | bwd_inner_microstep: 1312.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3515 [2024-06-10 16:49:34,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.76 | bwd_microstep: 1528.17 | bwd_inner_microstep: 1528.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033 [2024-06-10 16:49:35,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.72 | bwd_microstep: 810.74 | bwd_inner_microstep: 810.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2093 [2024-06-10 16:49:36,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.71 | bwd_microstep: 867.38 | bwd_inner_microstep: 867.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3513 [2024-06-10 16:49:38,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1386.68 | bwd_inner_microstep: 1386.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717 [2024-06-10 16:49:40,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.50 | bwd_microstep: 1438.92 | bwd_inner_microstep: 1438.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708 [2024-06-10 16:49:42,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.20 | bwd_microstep: 1435.52 | bwd_inner_microstep: 1435.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-10 16:49:44,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1405.40 | bwd_inner_microstep: 1405.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596 [2024-06-10 16:49:46,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1507.70 | bwd_inner_microstep: 1507.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-10 16:49:48,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.24 | bwd_microstep: 1457.94 | bwd_inner_microstep: 1457.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3535 [2024-06-10 16:49:50,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.13 | bwd_microstep: 1521.96 | bwd_inner_microstep: 1521.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 16:49:52,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.22 | bwd_microstep: 1352.62 | bwd_inner_microstep: 1352.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798 [2024-06-10 16:49:54,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.66 | bwd_microstep: 1457.54 | bwd_inner_microstep: 1457.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2031 [2024-06-10 16:49:55,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.99 | bwd_microstep: 901.76 | bwd_inner_microstep: 901.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2223 [2024-06-10 16:49:57,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.65 | bwd_microstep: 894.64 | bwd_inner_microstep: 894.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 16:50:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.23 | optimizer_step: 6.57 [2024-06-10 16:50:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 8095.18 | bwd_inner_microstep: 1681.79 | bwd_allreduce_microstep: 6413.33 | step_microstep: 38.22 [2024-06-10 16:50:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14959.40 | bwd: 46561.35 | bwd_inner: 40147.08 | bwd_allreduce: 6413.56 | step: 39.75 {'loss': 1.1699, 'learning_rate': 1.8126054718595553e-05, 'epoch': 0.54} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 16:50:07,837] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1381.80 | bwd_inner_microstep: 1381.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 16:50:09,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1239.79 | bwd_inner_microstep: 1239.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-10 16:50:11,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.43 | bwd_microstep: 1340.60 | bwd_inner_microstep: 1340.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 16:50:13,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.17 | bwd_microstep: 1642.53 | bwd_inner_microstep: 1642.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 16:50:15,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.95 | bwd_microstep: 1554.08 | bwd_inner_microstep: 1554.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4103 [2024-06-10 16:50:18,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.56 | bwd_microstep: 1732.73 | bwd_inner_microstep: 1732.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1895 [2024-06-10 16:50:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.28 | bwd_microstep: 744.65 | bwd_inner_microstep: 744.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 16:50:20,949] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1244.44 | bwd_inner_microstep: 1244.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-10 16:50:22,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1249.28 | bwd_inner_microstep: 1249.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3720 [2024-06-10 16:50:24,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.87 | bwd_microstep: 1461.68 | bwd_inner_microstep: 1461.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3451 [2024-06-10 16:50:26,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1302.35 | bwd_inner_microstep: 1302.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 16:50:27,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.72 | bwd_microstep: 793.88 | bwd_inner_microstep: 793.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2900 [2024-06-10 16:50:29,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.25 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659 [2024-06-10 16:50:31,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1508.59 | bwd_inner_microstep: 1508.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3449 [2024-06-10 16:50:33,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.10 | bwd_microstep: 1445.51 | bwd_inner_microstep: 1445.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3556 [2024-06-10 16:50:34,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.49 | bwd_microstep: 1199.92 | bwd_inner_microstep: 1199.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3649 [2024-06-10 16:50:36,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1318.89 | bwd_inner_microstep: 1318.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103 [2024-06-10 16:50:38,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.95 | bwd_microstep: 919.56 | bwd_inner_microstep: 919.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 16:50:39,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1294.27 | bwd_inner_microstep: 1294.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 16:50:40,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 791.45 | bwd_inner_microstep: 791.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 16:50:42,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 2015 [2024-06-10 16:50:44,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.64 | bwd_microstep: 897.24 | bwd_inner_microstep: 897.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3479 [2024-06-10 16:50:46,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1424.89 | bwd_inner_microstep: 1424.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3565 [2024-06-10 16:50:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.61 | bwd_microstep: 1345.91 | bwd_inner_microstep: 1345.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2668 [2024-06-10 16:50:49,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.14 | bwd_microstep: 958.30 | bwd_inner_microstep: 958.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 16:50:51,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.81 | bwd_microstep: 1349.00 | bwd_inner_microstep: 1348.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-10 16:50:52,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1299.27 | bwd_inner_microstep: 1299.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 16:50:54,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.00 | bwd_microstep: 1458.37 | bwd_inner_microstep: 1458.35 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 16:50:57,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.59 | bwd_microstep: 1517.23 | bwd_inner_microstep: 1517.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3057 [2024-06-10 16:50:58,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.22 | bwd_microstep: 1236.25 | bwd_inner_microstep: 1236.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 16:51:00,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.50 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2231 [2024-06-10 16:51:07,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.61 [2024-06-10 16:51:07,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.87 | bwd_microstep: 5748.35 | bwd_inner_microstep: 1090.53 | bwd_allreduce_microstep: 4657.77 | step_microstep: 38.07 [2024-06-10 16:51:07,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15278.06 | bwd: 45509.06 | bwd_inner: 40850.39 | bwd_allreduce: 4658.00 | step: 39.56 {'loss': 1.2266, 'learning_rate': 1.808868919999804e-05, 'epoch': 0.54} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3393 [2024-06-10 16:51:08,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.32 | bwd_microstep: 1235.69 | bwd_inner_microstep: 1235.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3977 [2024-06-10 16:51:11,079] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.39 | bwd_microstep: 1698.07 | bwd_inner_microstep: 1698.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 16:51:12,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.76 | bwd_microstep: 1244.44 | bwd_inner_microstep: 1244.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 16:51:14,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1386.18 | bwd_inner_microstep: 1386.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779 [2024-06-10 16:51:16,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.69 | bwd_microstep: 1447.88 | bwd_inner_microstep: 1447.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 16:51:18,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1398.47 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567 [2024-06-10 16:51:20,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.87 | bwd_microstep: 1296.72 | bwd_inner_microstep: 1296.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 16:51:22,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3541 [2024-06-10 16:51:23,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1290.31 | bwd_inner_microstep: 1290.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015 [2024-06-10 16:51:25,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.93 | bwd_microstep: 895.28 | bwd_inner_microstep: 895.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 16:51:27,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.99 | bwd_microstep: 1480.95 | bwd_inner_microstep: 1480.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1971 [2024-06-10 16:51:28,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.75 | bwd_microstep: 825.11 | bwd_inner_microstep: 825.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2781 [2024-06-10 16:51:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.99 | bwd_microstep: 1145.35 | bwd_inner_microstep: 1145.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3070 [2024-06-10 16:51:31,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.11 | bwd_microstep: 1236.08 | bwd_inner_microstep: 1236.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 16:51:33,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.21 | bwd_microstep: 1442.34 | bwd_inner_microstep: 1442.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per 
sample: 2.5, dynamic token length: 1904 [2024-06-10 16:51:34,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.29 | bwd_microstep: 685.29 | bwd_inner_microstep: 685.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644 [2024-06-10 16:51:36,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.60 | bwd_microstep: 1614.95 | bwd_inner_microstep: 1614.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2570 [2024-06-10 16:51:38,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.67 | bwd_microstep: 1000.69 | bwd_inner_microstep: 1000.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 16:51:40,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.01 | bwd_microstep: 1402.51 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675 [2024-06-10 16:51:42,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.64 | bwd_microstep: 1430.65 | bwd_inner_microstep: 1430.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 16:51:44,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1380.36 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791 [2024-06-10 16:51:46,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.15 | bwd_microstep: 1554.82 | bwd_inner_microstep: 1554.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808 [2024-06-10 16:51:48,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.97 | bwd_microstep: 1385.97 | bwd_inner_microstep: 1385.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818 [2024-06-10 16:51:49,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.05 | bwd_microstep: 1357.99 | bwd_inner_microstep: 1357.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 16:51:51,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 1416.06 | bwd_inner_microstep: 1416.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 16:51:53,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.15 | bwd_microstep: 1282.23 | bwd_inner_microstep: 1282.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 16:51:55,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2255 [2024-06-10 16:51:57,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.19 | bwd_microstep: 882.67 | bwd_inner_microstep: 882.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-10 16:51:59,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.65 | bwd_microstep: 1422.48 | bwd_inner_microstep: 1422.45 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936 [2024-06-10 16:52:00,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.05 | bwd_microstep: 726.58 | bwd_inner_microstep: 726.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 16:52:02,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.80 | bwd_microstep: 1450.74 | bwd_inner_microstep: 1450.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-10 16:52:06,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.18 | optimizer_step: 6.59 [2024-06-10 16:52:06,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.33 | bwd_microstep: 3656.19 | bwd_inner_microstep: 2158.19 | bwd_allreduce_microstep: 1497.95 | step_microstep: 37.72 [2024-06-10 16:52:06,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15574.54 | bwd: 43580.16 | bwd_inner: 42081.31 | bwd_allreduce: 1498.18 | step: 39.16 {'loss': 1.2691, 'learning_rate': 1.8051330413027227e-05, 'epoch': 0.55} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-10 16:52:08,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.45 | bwd_microstep: 1268.72 | bwd_inner_microstep: 1268.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2639 [2024-06-10 16:52:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.98 | bwd_microstep: 1018.43 | bwd_inner_microstep: 1018.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851 
[2024-06-10 16:52:11,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.51 | bwd_microstep: 1662.03 | bwd_inner_microstep: 1662.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3487 [2024-06-10 16:52:13,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1231.72 | bwd_inner_microstep: 1231.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042 [2024-06-10 16:52:14,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.71 | bwd_microstep: 812.05 | bwd_inner_microstep: 812.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 16:52:16,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-10 16:52:17,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.41 | bwd_microstep: 795.84 | bwd_inner_microstep: 795.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3416 [2024-06-10 16:52:19,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.91 | bwd_microstep: 1299.76 | bwd_inner_microstep: 1299.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3700 [2024-06-10 16:52:21,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.16 | bwd_microstep: 1659.23 | bwd_inner_microstep: 1659.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per 
sample: 2.5, dynamic token length: 1903 [2024-06-10 16:52:22,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.12 | bwd_microstep: 684.34 | bwd_inner_microstep: 684.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3410 [2024-06-10 16:52:24,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.83 | bwd_microstep: 1309.78 | bwd_inner_microstep: 1309.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2082 [2024-06-10 16:52:25,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.48 | bwd_microstep: 761.91 | bwd_inner_microstep: 761.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 16:52:27,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1497.64 | bwd_inner_microstep: 1497.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 16:52:29,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1297.29 | bwd_inner_microstep: 1297.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1997 [2024-06-10 16:52:30,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.71 | bwd_microstep: 896.61 | bwd_inner_microstep: 896.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 16:52:32,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.10 | bwd_microstep: 1447.17 | bwd_inner_microstep: 1447.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389 [2024-06-10 16:52:34,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.03 | bwd_microstep: 1242.48 | bwd_inner_microstep: 1242.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3388 [2024-06-10 16:52:36,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.55 | bwd_microstep: 1244.70 | bwd_inner_microstep: 1244.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-10 16:52:38,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.43 | bwd_microstep: 1500.43 | bwd_inner_microstep: 1500.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 16:52:40,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.31 | bwd_microstep: 1555.64 | bwd_inner_microstep: 1555.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 16:52:42,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1289.08 | bwd_inner_microstep: 1289.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-10 16:52:43,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.30 | bwd_microstep: 1200.88 | bwd_inner_microstep: 1200.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3813 [2024-06-10 16:52:45,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.03 | bwd_microstep: 1262.52 | bwd_inner_microstep: 1262.49 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 604 [2024-06-10 16:52:45,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 107.34 | bwd_microstep: 259.18 | bwd_inner_microstep: 259.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 16:52:47,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1378.38 | bwd_inner_microstep: 1378.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062 [2024-06-10 16:52:48,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.26 | bwd_microstep: 819.55 | bwd_inner_microstep: 819.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 16:52:50,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1402.97 | bwd_inner_microstep: 1402.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-10 16:52:53,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1653.56 | bwd_inner_microstep: 1653.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3767 [2024-06-10 16:52:55,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.28 | bwd_microstep: 1587.00 | bwd_inner_microstep: 1586.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 16:52:57,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.13 | bwd_microstep: 
1658.87 | bwd_inner_microstep: 1658.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2980 [2024-06-10 16:52:59,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.03 | bwd_microstep: 1198.20 | bwd_inner_microstep: 1198.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882 [2024-06-10 16:53:06,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 16:53:06,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 7115.01 | bwd_inner_microstep: 810.96 | bwd_allreduce_microstep: 6303.99 | step_microstep: 38.10 [2024-06-10 16:53:06,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14603.60 | bwd: 45292.03 | bwd_inner: 38987.13 | bwd_allreduce: 6304.22 | step: 39.62 {'loss': 1.1832, 'learning_rate': 1.801397848926058e-05, 'epoch': 0.55} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 16:53:08,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1332.36 | bwd_inner_microstep: 1332.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 16:53:10,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.44 | bwd_microstep: 1145.83 | bwd_inner_microstep: 1145.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814 [2024-06-10 16:53:12,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.02 | bwd_microstep: 1351.22 | bwd_inner_microstep: 1351.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 
6.0, dynamic token length: 3496 [2024-06-10 16:53:13,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.29 | bwd_microstep: 1314.34 | bwd_inner_microstep: 1314.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1863 [2024-06-10 16:53:14,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.64 | bwd_microstep: 706.93 | bwd_inner_microstep: 706.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 16:53:16,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1244.66 | bwd_inner_microstep: 1244.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 16:53:18,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1281.02 | bwd_inner_microstep: 1280.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1948 [2024-06-10 16:53:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.32 | bwd_microstep: 760.54 | bwd_inner_microstep: 760.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2765 [2024-06-10 16:53:20,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.25 | bwd_microstep: 1047.79 | bwd_inner_microstep: 1047.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 16:53:22,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.59 | bwd_microstep: 1150.18 | bwd_inner_microstep: 1150.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 16:53:24,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1283.41 | bwd_inner_microstep: 1283.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3496 [2024-06-10 16:53:26,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1643.88 | bwd_inner_microstep: 1643.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 16:53:28,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1349.50 | bwd_inner_microstep: 1349.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-10 16:53:30,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.30 | bwd_microstep: 1470.39 | bwd_inner_microstep: 1470.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 16:53:32,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1391.76 | bwd_inner_microstep: 1391.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-10 16:53:33,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.51 | bwd_microstep: 800.55 | bwd_inner_microstep: 800.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 16:53:35,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1376.76 | bwd_inner_microstep: 1376.74 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1906 [2024-06-10 16:53:36,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.12 | bwd_microstep: 717.11 | bwd_inner_microstep: 717.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2305 [2024-06-10 16:53:37,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.64 | bwd_microstep: 1010.79 | bwd_inner_microstep: 1010.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 16:53:39,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1492.49 | bwd_inner_microstep: 1492.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278 [2024-06-10 16:53:41,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.81 | bwd_microstep: 1070.26 | bwd_inner_microstep: 1070.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2503 [2024-06-10 16:53:42,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.44 | bwd_microstep: 1026.83 | bwd_inner_microstep: 1026.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500 [2024-06-10 16:53:44,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.13 | bwd_microstep: 1189.48 | bwd_inner_microstep: 1189.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3708 [2024-06-10 16:53:46,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.40 | bwd_microstep: 1362.05 | bwd_inner_microstep: 
1362.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 16:53:47,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.52 | bwd_microstep: 1296.00 | bwd_inner_microstep: 1295.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3591 [2024-06-10 16:53:49,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.77 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-10 16:53:52,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.37 | bwd_microstep: 1501.37 | bwd_inner_microstep: 1501.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 16:53:54,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.93 | bwd_microstep: 1495.49 | bwd_inner_microstep: 1495.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 16:53:56,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1399.27 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809 [2024-06-10 16:53:57,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.11 | bwd_microstep: 1358.16 | bwd_inner_microstep: 1358.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3580 [2024-06-10 16:54:00,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.79 | 
bwd_microstep: 1692.31 | bwd_inner_microstep: 1692.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570 [2024-06-10 16:54:08,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.29 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 16:54:08,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 7189.24 | bwd_inner_microstep: 1682.00 | bwd_allreduce_microstep: 5507.18 | step_microstep: 38.72 [2024-06-10 16:54:08,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15116.43 | bwd: 45905.09 | bwd_inner: 40397.00 | bwd_allreduce: 5507.41 | step: 40.17 {'loss': 1.2168, 'learning_rate': 1.797663356025136e-05, 'epoch': 0.55} 54%|█████▍ | 938/1726 [16:11:40<13:30:03, 61.68s/it] 54%|█████▍ | 939/1726 [16:12:42<13:29:42, 61.73s/it] 54%|█████▍ | 939/1726 [16:12:42<13:29:42, 61.73s/it] 54%|█████▍ | 940/1726 [16:13:43<13:26:14, 61.55s/it] 54%|█████▍ | 940/1726 [16:13:43<13:26:14, 61.55s/it] 55%|█████▍ | 941/1726 [16:14:43<13:17:05, 60.92s/it] 55%|█████▍ | 941/1726 [16:14:43<13:17:05, 60.92s/it] 55%|█████▍ | 942/1726 [16:15:43<13:13:20, 60.71s/it] 55%|█████▍ | 942/1726 [16:15:43<13:13:20, 60.71s/it] 55%|█████▍ | 943/1726 [16:16:44<13:14:48, 60.90s/it] 55%|█████▍ dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 16:54:09,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.57 | bwd_microstep: 1375.01 | bwd_inner_microstep: 1374.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3872 [2024-06-10 16:54:12,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.88 | bwd_microstep: 1560.78 | bwd_inner_microstep: 1560.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3474 [2024-06-10 16:54:14,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1377.20 | bwd_inner_microstep: 1377.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788 [2024-06-10 16:54:16,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.82 | bwd_microstep: 1451.15 | bwd_inner_microstep: 1451.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4082 [2024-06-10 16:54:18,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.09 | bwd_microstep: 1526.74 | bwd_inner_microstep: 1526.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794 [2024-06-10 16:54:20,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 1549.52 | bwd_inner_microstep: 1549.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-10 16:54:21,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.87 | bwd_microstep: 812.28 | bwd_inner_microstep: 812.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3706 [2024-06-10 16:54:23,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.33 | bwd_microstep: 1264.24 | bwd_inner_microstep: 1264.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3739 [2024-06-10 16:54:25,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.21 | bwd_microstep: 1727.92 | bwd_inner_microstep: 1727.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images 
per sample: 5.0, dynamic token length: 2053 [2024-06-10 16:54:26,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.79 | bwd_microstep: 875.64 | bwd_inner_microstep: 875.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 16:54:28,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.38 | bwd_microstep: 1378.27 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 16:54:30,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.71 | bwd_microstep: 1350.50 | bwd_inner_microstep: 1350.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-10 16:54:32,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.27 | bwd_microstep: 1615.94 | bwd_inner_microstep: 1615.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 16:54:34,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.69 | bwd_microstep: 1251.95 | bwd_inner_microstep: 1251.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1916 [2024-06-10 16:54:35,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.26 | bwd_microstep: 750.24 | bwd_inner_microstep: 750.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3432 [2024-06-10 16:54:37,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.54 | bwd_microstep: 1230.95 | bwd_inner_microstep: 1230.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 16:54:39,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2120 [2024-06-10 16:54:40,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.66 | bwd_microstep: 734.17 | bwd_inner_microstep: 734.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 16:54:41,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1278.30 | bwd_inner_microstep: 1278.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-10 16:54:44,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.90 | bwd_microstep: 1531.30 | bwd_inner_microstep: 1531.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605 [2024-06-10 16:54:45,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.57 | bwd_microstep: 1307.37 | bwd_inner_microstep: 1307.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 16:54:47,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1507.31 | bwd_inner_microstep: 1507.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 16:54:49,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1396.93 | bwd_inner_microstep: 1396.90 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 16:54:51,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.52 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2085 [2024-06-10 16:54:52,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.95 | bwd_microstep: 769.55 | bwd_inner_microstep: 769.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551 [2024-06-10 16:54:54,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1201.19 | bwd_inner_microstep: 1201.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 16:54:56,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.80 | bwd_microstep: 1498.80 | bwd_inner_microstep: 1498.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 16:54:58,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1557.25 | bwd_inner_microstep: 1557.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824 [2024-06-10 16:55:00,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.20 | bwd_microstep: 1580.25 | bwd_inner_microstep: 1580.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 16:55:02,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 
1414.02 | bwd_inner_microstep: 1414.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3644 [2024-06-10 16:55:05,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.48 | bwd_microstep: 1708.64 | bwd_inner_microstep: 1708.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-10 16:55:08,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 16:55:08,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.01 | bwd_microstep: 2372.25 | bwd_inner_microstep: 1774.55 | bwd_allreduce_microstep: 597.65 | step_microstep: 37.76 [2024-06-10 16:55:08,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16033.12 | bwd: 43629.82 | bwd_inner: 43031.27 | bwd_allreduce: 597.87 | step: 39.18 {'loss': 1.2336, 'learning_rate': 1.7939295757528225e-05, 'epoch': 0.55} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1996 [2024-06-10 16:55:09,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.29 | bwd_microstep: 860.72 | bwd_inner_microstep: 860.66 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911 [2024-06-10 16:55:11,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.84 | bwd_microstep: 1688.78 | bwd_inner_microstep: 1688.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 16:55:13,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.56 | bwd_microstep: 1552.44 | bwd_inner_microstep: 1552.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3561 [2024-06-10 16:55:15,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1495.78 | bwd_inner_microstep: 1495.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 16:55:17,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1252.83 | bwd_inner_microstep: 1252.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-10 16:55:19,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.14 | bwd_microstep: 1533.67 | bwd_inner_microstep: 1533.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 16:55:21,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1389.57 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 16:55:23,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.08 | bwd_microstep: 1389.02 | bwd_inner_microstep: 1388.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 16:55:25,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.43 | bwd_microstep: 1248.55 | bwd_inner_microstep: 1248.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 16:55:27,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1348.38 | bwd_inner_microstep: 1348.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic 
ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-10 16:55:28,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.21 | bwd_microstep: 793.42 | bwd_inner_microstep: 793.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700 [2024-06-10 16:55:30,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.28 | bwd_microstep: 1620.85 | bwd_inner_microstep: 1620.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 16:55:32,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.94 | bwd_microstep: 1475.40 | bwd_inner_microstep: 1475.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857 [2024-06-10 16:55:33,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.58 | bwd_microstep: 675.51 | bwd_inner_microstep: 675.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 16:55:35,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.56 | bwd_microstep: 1489.45 | bwd_inner_microstep: 1489.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 16:55:37,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.01 | bwd_microstep: 1301.34 | bwd_inner_microstep: 1301.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404 [2024-06-10 16:55:39,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.25 | bwd_microstep: 1306.54 | bwd_inner_microstep: 1306.52 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934 [2024-06-10 16:55:40,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.73 | bwd_microstep: 697.50 | bwd_inner_microstep: 697.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2644 [2024-06-10 16:55:41,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.05 | bwd_microstep: 1115.88 | bwd_inner_microstep: 1115.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 16:55:43,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1256.05 | bwd_inner_microstep: 1256.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 16:55:45,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1391.48 | bwd_inner_microstep: 1391.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461 [2024-06-10 16:55:46,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1182.21 | bwd_inner_microstep: 1182.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 16:55:49,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1653.44 | bwd_inner_microstep: 1653.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558 [2024-06-10 16:55:51,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1429.47 | 
bwd_inner_microstep: 1429.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-10 16:55:53,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.05 | bwd_microstep: 1509.77 | bwd_inner_microstep: 1509.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-10 16:55:55,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.06 | bwd_microstep: 1544.31 | bwd_inner_microstep: 1544.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479 [2024-06-10 16:55:57,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.65 | bwd_microstep: 1217.09 | bwd_inner_microstep: 1217.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820 [2024-06-10 16:55:59,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.87 | bwd_microstep: 1581.35 | bwd_inner_microstep: 1581.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3826 [2024-06-10 16:56:01,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.69 | bwd_microstep: 1510.08 | bwd_inner_microstep: 1510.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 16:56:03,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.76 | bwd_microstep: 1460.28 | bwd_inner_microstep: 1460.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585 [2024-06-10 16:56:05,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 493.33 | bwd_microstep: 1305.76 | bwd_inner_microstep: 1305.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816 [2024-06-10 16:56:08,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.60 [2024-06-10 16:56:08,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.41 | bwd_microstep: 2400.33 | bwd_inner_microstep: 2052.06 | bwd_allreduce_microstep: 348.23 | step_microstep: 37.50 [2024-06-10 16:56:08,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16111.10 | bwd: 43677.29 | bwd_inner: 43328.13 | bwd_allreduce: 348.47 | step: 39.00 {'loss': 1.2113, 'learning_rate': 1.790196521259472e-05, 'epoch': 0.55} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2144 [2024-06-10 16:56:09,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.51 | bwd_microstep: 920.84 | bwd_inner_microstep: 920.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3878 [2024-06-10 16:56:11,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.13 | bwd_microstep: 1681.78 | bwd_inner_microstep: 1681.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 16:56:14,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.62 | bwd_microstep: 1653.15 | bwd_inner_microstep: 1653.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 16:56:15,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1244.85 | bwd_inner_microstep: 1244.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 16:56:17,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.96 | bwd_microstep: 1297.14 | bwd_inner_microstep: 1297.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 16:56:19,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1248.77 | bwd_inner_microstep: 1248.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 16:56:21,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1245.54 | bwd_inner_microstep: 1245.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592 [2024-06-10 16:56:22,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.24 | bwd_microstep: 1308.27 | bwd_inner_microstep: 1308.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480 [2024-06-10 16:56:24,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.05 | bwd_microstep: 1410.83 | bwd_inner_microstep: 1410.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438 [2024-06-10 16:56:26,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1446.38 | bwd_inner_microstep: 1446.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665 [2024-06-10 16:56:29,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.57 | bwd_microstep: 1714.62 | bwd_inner_microstep: 1714.59 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 16:56:31,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3021 [2024-06-10 16:56:32,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.43 | bwd_microstep: 1229.05 | bwd_inner_microstep: 1229.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284 [2024-06-10 16:56:34,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.07 | bwd_microstep: 1068.49 | bwd_inner_microstep: 1068.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3471 [2024-06-10 16:56:36,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1421.81 | bwd_inner_microstep: 1421.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662 [2024-06-10 16:56:38,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.27 | bwd_microstep: 1519.71 | bwd_inner_microstep: 1519.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 16:56:40,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.65 | bwd_microstep: 1390.98 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-10 16:56:42,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.86 | bwd_microstep: 
1507.67 | bwd_inner_microstep: 1507.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987 [2024-06-10 16:56:43,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.73 | bwd_microstep: 706.70 | bwd_inner_microstep: 706.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819 [2024-06-10 16:56:45,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1482.96 | bwd_inner_microstep: 1482.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-10 16:56:47,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.79 | bwd_microstep: 1336.12 | bwd_inner_microstep: 1336.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3425 [2024-06-10 16:56:48,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.97 | bwd_microstep: 1281.49 | bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433 [2024-06-10 16:56:50,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.89 | bwd_microstep: 1188.00 | bwd_inner_microstep: 1187.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2083 [2024-06-10 16:56:51,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.25 | bwd_microstep: 916.65 | bwd_inner_microstep: 916.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597 [2024-06-10 16:56:54,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 589.77 | bwd_microstep: 1605.38 | bwd_inner_microstep: 1605.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 16:56:55,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.86 | bwd_microstep: 1381.16 | bwd_inner_microstep: 1381.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 16:56:57,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1253.52 | bwd_inner_microstep: 1253.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3813 [2024-06-10 16:57:00,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.78 | bwd_microstep: 1859.93 | bwd_inner_microstep: 1859.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3563 [2024-06-10 16:57:02,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1525.39 | bwd_inner_microstep: 1525.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3760 [2024-06-10 16:57:04,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1307.71 | bwd_inner_microstep: 1307.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2230 [2024-06-10 16:57:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.91 | bwd_microstep: 962.15 | bwd_inner_microstep: 962.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 16:57:08,655] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-10 16:57:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 2645.59 | bwd_inner_microstep: 1524.54 | bwd_allreduce_microstep: 1121.01 | step_microstep: 37.60 [2024-06-10 16:57:08,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15993.07 | bwd: 44142.36 | bwd_inner: 43020.43 | bwd_allreduce: 1121.24 | step: 39.02 {'loss': 1.2606, 'learning_rate': 1.7864642056928823e-05, 'epoch': 0.55} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 16:57:10,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1289.75 | bwd_inner_microstep: 1289.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2623 [2024-06-10 16:57:11,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.31 | bwd_microstep: 1049.76 | bwd_inner_microstep: 1049.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3515 [2024-06-10 16:57:13,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1247.78 | bwd_inner_microstep: 1247.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480 [2024-06-10 16:57:15,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.25 | bwd_microstep: 1404.93 | bwd_inner_microstep: 1404.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779 [2024-06-10 16:57:17,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.40 | bwd_microstep: 1644.79 | bwd_inner_microstep: 1644.77 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506 [2024-06-10 16:57:19,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.62 | bwd_microstep: 1551.57 | bwd_inner_microstep: 1551.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 16:57:21,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.54 | bwd_microstep: 1382.40 | bwd_inner_microstep: 1382.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408 [2024-06-10 16:57:23,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.94 | bwd_microstep: 1179.07 | bwd_inner_microstep: 1179.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3549 [2024-06-10 16:57:25,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.55 | bwd_microstep: 1198.51 | bwd_inner_microstep: 1198.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 16:57:27,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1482.61 | bwd_inner_microstep: 1482.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2196 [2024-06-10 16:57:28,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.22 | bwd_microstep: 954.65 | bwd_inner_microstep: 954.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-10 16:57:30,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 
1283.77 | bwd_inner_microstep: 1283.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3464 [2024-06-10 16:57:32,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1324.18 | bwd_inner_microstep: 1324.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639 [2024-06-10 16:57:34,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1511.50 | bwd_inner_microstep: 1511.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3669 [2024-06-10 16:57:36,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.22 | bwd_microstep: 1556.10 | bwd_inner_microstep: 1556.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970 [2024-06-10 16:57:37,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.13 | bwd_microstep: 890.39 | bwd_inner_microstep: 890.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385 [2024-06-10 16:57:39,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.34 | bwd_microstep: 1334.27 | bwd_inner_microstep: 1334.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879 [2024-06-10 16:57:40,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.63 | bwd_microstep: 678.92 | bwd_inner_microstep: 678.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3375 [2024-06-10 16:57:42,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 478.85 | bwd_microstep: 1271.82 | bwd_inner_microstep: 1271.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-10 16:57:44,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.48 | bwd_microstep: 1485.15 | bwd_inner_microstep: 1485.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3416 [2024-06-10 16:57:46,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1309.07 | bwd_inner_microstep: 1309.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 16:57:47,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.07 | bwd_microstep: 795.82 | bwd_inner_microstep: 795.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-10 16:57:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.21 | bwd_microstep: 1497.70 | bwd_inner_microstep: 1497.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 16:57:51,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.13 | bwd_microstep: 1350.81 | bwd_inner_microstep: 1350.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 16:57:52,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.55 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-10 16:57:54,054] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.70 | bwd_microstep: 1358.92 | bwd_inner_microstep: 1358.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 16:57:56,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.00 | bwd_microstep: 1458.21 | bwd_inner_microstep: 1458.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 16:57:57,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1389.77 | bwd_inner_microstep: 1389.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 16:57:59,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.20 | bwd_microstep: 1451.61 | bwd_inner_microstep: 1451.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-10 16:58:01,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.46 | bwd_microstep: 973.04 | bwd_inner_microstep: 973.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 16:58:03,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.23 | bwd_microstep: 1551.51 | bwd_inner_microstep: 1551.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430 [2024-06-10 16:58:10,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.61 [2024-06-10 16:58:10,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 6424.32 | 
bwd_inner_microstep: 1606.59 | bwd_allreduce_microstep: 4817.66 | step_microstep: 38.93 [2024-06-10 16:58:10,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15397.32 | bwd: 46086.01 | bwd_inner: 41267.44 | bwd_allreduce: 4817.90 | step: 40.42 {'loss': 1.1439, 'learning_rate': 1.7827326421982513e-05, 'epoch': 0.55} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456 [2024-06-10 16:58:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1537.97 | bwd_inner_microstep: 1537.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3891 [2024-06-10 16:58:14,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.48 | bwd_microstep: 1479.21 | bwd_inner_microstep: 1479.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 16:58:16,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.41 | bwd_microstep: 1343.51 | bwd_inner_microstep: 1343.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939 [2024-06-10 16:58:17,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.16 | bwd_microstep: 822.21 | bwd_inner_microstep: 822.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1355 [2024-06-10 16:58:18,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 221.49 | bwd_microstep: 581.81 | bwd_inner_microstep: 581.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4091 [2024-06-10 16:58:20,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.07 | bwd_microstep: 1693.03 | 
bwd_inner_microstep: 1693.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693 [2024-06-10 16:58:22,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.64 | bwd_microstep: 1523.78 | bwd_inner_microstep: 1523.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421 [2024-06-10 16:58:24,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.59 | bwd_microstep: 1156.15 | bwd_inner_microstep: 1156.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 16:58:26,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1486.36 | bwd_inner_microstep: 1486.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403 [2024-06-10 16:58:28,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.87 | bwd_microstep: 1215.75 | bwd_inner_microstep: 1215.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-10 16:58:30,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1522.71 | bwd_inner_microstep: 1522.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3614 [2024-06-10 16:58:32,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.03 | bwd_microstep: 1446.98 | bwd_inner_microstep: 1446.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 16:58:34,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 526.30 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963 [2024-06-10 16:58:35,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.24 | bwd_microstep: 828.33 | bwd_inner_microstep: 828.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3506 [2024-06-10 16:58:37,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.03 | bwd_microstep: 1318.61 | bwd_inner_microstep: 1318.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 16:58:39,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1390.62 | bwd_inner_microstep: 1390.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3651 [2024-06-10 16:58:41,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.85 | bwd_microstep: 1543.52 | bwd_inner_microstep: 1543.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3446 [2024-06-10 16:58:43,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.24 | bwd_microstep: 1414.06 | bwd_inner_microstep: 1414.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 16:58:45,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.85 | bwd_microstep: 1648.04 | bwd_inner_microstep: 1648.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009 [2024-06-10 16:58:46,720] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.29 | bwd_microstep: 899.06 | bwd_inner_microstep: 899.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076 [2024-06-10 16:58:47,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.44 | bwd_microstep: 819.64 | bwd_inner_microstep: 819.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-10 16:58:49,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1347.08 | bwd_inner_microstep: 1347.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-10 16:58:51,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.01 | bwd_microstep: 1415.41 | bwd_inner_microstep: 1415.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-10 16:58:53,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.20 | bwd_microstep: 1179.48 | bwd_inner_microstep: 1179.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 16:58:55,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.17 | bwd_microstep: 1409.99 | bwd_inner_microstep: 1409.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035 [2024-06-10 16:58:56,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.95 | bwd_microstep: 810.55 | bwd_inner_microstep: 810.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044 
[2024-06-10 16:58:57,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.75 | bwd_microstep: 810.90 | bwd_inner_microstep: 810.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-10 16:58:59,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1398.15 | bwd_inner_microstep: 1398.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3812 [2024-06-10 16:59:01,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.36 | bwd_microstep: 1357.47 | bwd_inner_microstep: 1357.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-10 16:59:03,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1502.26 | bwd_inner_microstep: 1502.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580 [2024-06-10 16:59:05,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.14 | bwd_microstep: 1602.11 | bwd_inner_microstep: 1602.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 16:59:12,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 16:59:12,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 6051.38 | bwd_inner_microstep: 1445.57 | bwd_allreduce_microstep: 4605.75 | step_microstep: 38.11 [2024-06-10 16:59:12,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15421.56 | bwd: 45948.47 | bwd_inner: 41341.82 | bwd_allreduce: 4605.97 | step: 39.61 
{'loss': 1.2524, 'learning_rate': 1.7790018439181243e-05, 'epoch': 0.55} | 943/1726 [16:16:44<13:14:48, 60.90s/it] 55%|█████▍ | 944/1726 [16:17:44<13:10:13, 60.63s/it] 55%|█████▍ | 945/1726 [16:18:44<13:07:13, 60.48s/it] 55%|█████▍ | 946/1726 [16:19:45<13:06:10, 60.48s/it] 55%|█████▍ | 947/1726 [16:20:47<13:10:25, 60.88s/it] 55%|█████▍ | 948/1726 [16:21:48<13:12:36, 61.13s/it] dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3447 [2024-06-10 16:59:14,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.78 | bwd_microstep: 1541.46 | bwd_inner_microstep: 1541.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3928 [2024-06-10 16:59:16,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.15 | bwd_microstep: 1587.63 | bwd_inner_microstep: 1587.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 16:59:18,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1383.47 | bwd_inner_microstep: 1383.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 16:59:20,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1479.49 | bwd_inner_microstep: 1479.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 16:59:22,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.77 |
bwd_microstep: 1244.24 | bwd_inner_microstep: 1244.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952 [2024-06-10 16:59:23,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.26 | bwd_microstep: 728.30 | bwd_inner_microstep: 728.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 16:59:25,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1388.30 | bwd_inner_microstep: 1388.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 16:59:26,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1282.77 | bwd_inner_microstep: 1282.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906 [2024-06-10 16:59:27,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.28 | bwd_microstep: 684.68 | bwd_inner_microstep: 684.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3690 [2024-06-10 16:59:29,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1477.31 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3675 [2024-06-10 16:59:31,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.36 | bwd_microstep: 1356.14 | bwd_inner_microstep: 1356.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3661 [2024-06-10 16:59:33,787] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 554.30 | bwd_microstep: 1484.40 | bwd_inner_microstep: 1484.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2978 [2024-06-10 16:59:35,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.21 | bwd_microstep: 1195.87 | bwd_inner_microstep: 1195.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3684 [2024-06-10 16:59:37,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.61 | bwd_microstep: 1718.72 | bwd_inner_microstep: 1718.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465 [2024-06-10 16:59:39,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1329.53 | bwd_inner_microstep: 1329.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922 [2024-06-10 16:59:40,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.89 | bwd_microstep: 818.54 | bwd_inner_microstep: 818.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463 [2024-06-10 16:59:42,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.20 | bwd_microstep: 1241.98 | bwd_inner_microstep: 1241.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 16:59:44,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.84 | bwd_microstep: 1386.18 | bwd_inner_microstep: 1386.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3660 [2024-06-10 16:59:46,246] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 1323.26 | bwd_inner_microstep: 1323.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717 [2024-06-10 16:59:48,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.61 | bwd_microstep: 1532.34 | bwd_inner_microstep: 1532.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-10 16:59:50,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.33 | bwd_microstep: 1628.26 | bwd_inner_microstep: 1628.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865 [2024-06-10 16:59:51,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.15 | bwd_microstep: 739.27 | bwd_inner_microstep: 739.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 16:59:53,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.58 | bwd_microstep: 1560.17 | bwd_inner_microstep: 1560.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 16:59:55,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.86 | bwd_microstep: 1488.26 | bwd_inner_microstep: 1488.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 16:59:58,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1603.07 | bwd_inner_microstep: 1603.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 2291 [2024-06-10 16:59:59,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.01 | bwd_microstep: 1071.31 | bwd_inner_microstep: 1071.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806 [2024-06-10 17:00:01,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.08 | bwd_microstep: 1749.89 | bwd_inner_microstep: 1749.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275 [2024-06-10 17:00:03,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.34 | bwd_microstep: 973.15 | bwd_inner_microstep: 973.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3578 [2024-06-10 17:00:05,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1462.80 | bwd_inner_microstep: 1462.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4154 [2024-06-10 17:00:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.27 | bwd_microstep: 1552.71 | bwd_inner_microstep: 1552.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 17:00:09,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1651.38 | bwd_inner_microstep: 1651.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-10 17:00:15,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 17:00:15,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 297.46 | bwd_microstep: 5095.53 | bwd_inner_microstep: 900.41 | bwd_allreduce_microstep: 4195.06 | step_microstep: 37.96 [2024-06-10 17:00:15,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15843.24 | bwd: 46760.38 | bwd_inner: 42564.41 | bwd_allreduce: 4195.29 | step: 39.41 {'loss': 1.2015, 'learning_rate': 1.775271823992354e-05, 'epoch': 0.55} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 17:00:17,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1392.54 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4386 [2024-06-10 17:00:19,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.59 | bwd_microstep: 1710.79 | bwd_inner_microstep: 1710.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 17:00:20,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.81 | bwd_microstep: 790.67 | bwd_inner_microstep: 790.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 17:00:22,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1248.25 | bwd_inner_microstep: 1248.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2674 [2024-06-10 17:00:23,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.64 | bwd_microstep: 1024.54 | bwd_inner_microstep: 1024.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 17:00:25,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 440.26 | bwd_microstep: 1150.12 | bwd_inner_microstep: 1150.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3717 [2024-06-10 17:00:27,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.19 | bwd_microstep: 1270.77 | bwd_inner_microstep: 1270.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 17:00:29,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.01 | bwd_microstep: 1623.42 | bwd_inner_microstep: 1623.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 17:00:31,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1482.79 | bwd_inner_microstep: 1482.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 17:00:33,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1480.52 | bwd_inner_microstep: 1480.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 17:00:35,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.08 | bwd_microstep: 1285.09 | bwd_inner_microstep: 1285.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 17:00:36,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3074 [2024-06-10 17:00:38,695] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.81 | bwd_microstep: 1238.52 | bwd_inner_microstep: 1238.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3406 [2024-06-10 17:00:40,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.96 | bwd_microstep: 1309.28 | bwd_inner_microstep: 1309.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 17:00:42,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1394.73 | bwd_inner_microstep: 1394.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651 [2024-06-10 17:00:44,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.86 | bwd_microstep: 1323.20 | bwd_inner_microstep: 1323.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2188 [2024-06-10 17:00:45,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.38 | bwd_microstep: 858.32 | bwd_inner_microstep: 858.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691 [2024-06-10 17:00:47,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.22 | bwd_microstep: 1532.58 | bwd_inner_microstep: 1532.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502 [2024-06-10 17:00:49,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1319.44 | bwd_inner_microstep: 1319.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3527 [2024-06-10 17:00:51,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.42 | bwd_microstep: 1291.53 | bwd_inner_microstep: 1291.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 17:00:53,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1557.88 | bwd_inner_microstep: 1557.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3703 [2024-06-10 17:00:55,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1233.54 | bwd_inner_microstep: 1233.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 17:00:57,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1499.97 | bwd_inner_microstep: 1499.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003 [2024-06-10 17:00:58,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.39 | bwd_microstep: 740.15 | bwd_inner_microstep: 740.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 17:00:59,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 17:01:02,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.71 | bwd_microstep: 1540.55 | bwd_inner_microstep: 1540.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3476 [2024-06-10 17:01:03,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 1282.92 | bwd_inner_microstep: 1282.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565 [2024-06-10 17:01:05,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.71 | bwd_microstep: 1334.15 | bwd_inner_microstep: 1334.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944 [2024-06-10 17:01:06,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.17 | bwd_microstep: 702.22 | bwd_inner_microstep: 702.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-10 17:01:08,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.04 | bwd_microstep: 1485.62 | bwd_inner_microstep: 1485.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 17:01:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.45 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596 [2024-06-10 17:01:15,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.32 | optimizer_step: 6.59 [2024-06-10 17:01:15,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.06 | bwd_microstep: 4903.99 | bwd_inner_microstep: 1814.54 | bwd_allreduce_microstep: 3089.39 | step_microstep: 39.50 [2024-06-10 17:01:15,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.06 | bwd: 44889.52 | bwd_inner: 41799.12 | 
bwd_allreduce: 3089.67 | step: 41.04 {'loss': 1.193, 'learning_rate': 1.7715425955580512e-05, 'epoch': 0.55} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1924 [2024-06-10 17:01:17,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.16 | bwd_microstep: 839.31 | bwd_inner_microstep: 839.18 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931 [2024-06-10 17:01:19,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1495.20 | bwd_inner_microstep: 1495.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 17:01:21,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.25 | bwd_microstep: 1551.62 | bwd_inner_microstep: 1551.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3875 [2024-06-10 17:01:23,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.87 | bwd_microstep: 1680.04 | bwd_inner_microstep: 1680.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 17:01:25,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1376.57 | bwd_inner_microstep: 1376.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 17:01:27,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.21 | bwd_microstep: 1478.85 | bwd_inner_microstep: 1478.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 17:01:29,348] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 17:01:31,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.57 | bwd_microstep: 1282.16 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707 [2024-06-10 17:01:33,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.87 | bwd_microstep: 1526.60 | bwd_inner_microstep: 1526.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 17:01:35,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1496.85 | bwd_inner_microstep: 1496.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-10 17:01:37,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.43 | bwd_microstep: 1482.88 | bwd_inner_microstep: 1482.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 17:01:39,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1286.36 | bwd_inner_microstep: 1286.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-10 17:01:40,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.60 | bwd_microstep: 798.12 | bwd_inner_microstep: 798.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 17:01:41,992] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.35 | bwd_microstep: 1277.65 | bwd_inner_microstep: 1277.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3511 [2024-06-10 17:01:43,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1436.24 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3246 [2024-06-10 17:01:45,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.09 | bwd_microstep: 1278.63 | bwd_inner_microstep: 1278.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3612 [2024-06-10 17:01:47,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.12 | bwd_microstep: 1431.06 | bwd_inner_microstep: 1431.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 17:01:48,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 790.99 | bwd_inner_microstep: 790.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3537 [2024-06-10 17:01:51,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.08 | bwd_microstep: 1593.07 | bwd_inner_microstep: 1593.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665 [2024-06-10 17:01:53,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.54 | bwd_microstep: 1619.64 | bwd_inner_microstep: 1619.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token 
length: 1958 [2024-06-10 17:01:54,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.11 | bwd_microstep: 736.22 | bwd_inner_microstep: 736.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 560 [2024-06-10 17:01:54,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 98.09 | bwd_microstep: 247.79 | bwd_inner_microstep: 247.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 17:01:56,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1395.00 | bwd_inner_microstep: 1394.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 17:01:58,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1609.22 | bwd_inner_microstep: 1609.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 17:02:00,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1493.05 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3566 [2024-06-10 17:02:02,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1549.28 | bwd_inner_microstep: 1549.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 17:02:04,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.87 | bwd_microstep: 1400.37 | bwd_inner_microstep: 1400.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3787 [2024-06-10 17:02:07,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.71 | bwd_microstep: 1653.42 | bwd_inner_microstep: 1653.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-10 17:02:08,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.71 | bwd_microstep: 970.44 | bwd_inner_microstep: 970.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2023 [2024-06-10 17:02:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.76 | bwd_microstep: 807.81 | bwd_inner_microstep: 807.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 17:02:11,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1497.87 | bwd_inner_microstep: 1497.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 17:02:17,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.58 [2024-06-10 17:02:17,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 4940.49 | bwd_inner_microstep: 1687.20 | bwd_allreduce_microstep: 3253.24 | step_microstep: 38.45 [2024-06-10 17:02:17,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15641.87 | bwd: 45271.35 | bwd_inner: 42017.11 | bwd_allreduce: 3253.52 | step: 39.92 {'loss': 1.2385, 'learning_rate': 1.7678141717495394e-05, 'epoch': 0.55} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 17:02:19,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 515.54 | bwd_microstep: 1373.31 | bwd_inner_microstep: 1373.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3924 [2024-06-10 17:02:21,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.75 | bwd_microstep: 1590.81 | bwd_inner_microstep: 1590.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444 [2024-06-10 17:02:22,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.09 | bwd_microstep: 1156.21 | bwd_inner_microstep: 1156.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3791 [2024-06-10 17:02:24,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.52 | bwd_microstep: 1444.89 | bwd_inner_microstep: 1444.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 17:02:26,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1343.00 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 17:02:28,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.16 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 17:02:30,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1550.22 | bwd_inner_microstep: 1550.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225 [2024-06-10 17:02:32,108] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.31 | bwd_microstep: 959.33 | bwd_inner_microstep: 959.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2656 [2024-06-10 17:02:33,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.47 | bwd_microstep: 1022.61 | bwd_inner_microstep: 1022.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502 [2024-06-10 17:02:35,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1417.43 | bwd_inner_microstep: 1417.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 17:02:37,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.37 | bwd_microstep: 1483.22 | bwd_inner_microstep: 1483.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-10 17:02:39,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.25 | bwd_microstep: 1512.55 | bwd_inner_microstep: 1512.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-10 17:02:41,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.70 | bwd_microstep: 1282.57 | bwd_inner_microstep: 1282.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486 [2024-06-10 17:02:43,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.32 | bwd_microstep: 1314.61 | bwd_inner_microstep: 1314.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3424 [2024-06-10 17:02:45,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1445.85 | bwd_inner_microstep: 1445.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-10 17:02:47,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1606.40 | bwd_inner_microstep: 1606.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647 [2024-06-10 17:02:49,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1410.61 | bwd_inner_microstep: 1410.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606 [2024-06-10 17:02:51,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1314.39 | bwd_inner_microstep: 1314.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666 [2024-06-10 17:02:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1323.62 | bwd_inner_microstep: 1323.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534 [2024-06-10 17:02:54,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1228.95 | bwd_inner_microstep: 1228.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-10 17:02:56,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1414.33 | bwd_inner_microstep: 1414.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3462 [2024-06-10 17:02:58,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.75 | bwd_microstep: 1277.66 | bwd_inner_microstep: 1277.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 17:03:00,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1397.72 | bwd_inner_microstep: 1397.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950 [2024-06-10 17:03:01,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 700.24 | bwd_inner_microstep: 700.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828 [2024-06-10 17:03:03,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.26 | bwd_microstep: 1359.72 | bwd_inner_microstep: 1359.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3815 [2024-06-10 17:03:05,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.22 | bwd_microstep: 1506.13 | bwd_inner_microstep: 1506.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3865 [2024-06-10 17:03:07,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1401.00 | bwd_inner_microstep: 1400.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2039 [2024-06-10 17:03:08,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.04 | bwd_microstep: 782.68 | bwd_inner_microstep: 782.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 17:03:10,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.79 | bwd_microstep: 1496.32 | bwd_inner_microstep: 1496.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574 [2024-06-10 17:03:12,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1300.04 | bwd_inner_microstep: 1300.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597 [2024-06-10 17:03:14,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.64 | bwd_microstep: 1598.88 | bwd_inner_microstep: 1598.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3593 [2024-06-10 17:03:18,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.63 [2024-06-10 17:03:18,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 3069.94 | bwd_inner_microstep: 1768.81 | bwd_allreduce_microstep: 1301.08 | step_microstep: 37.94 [2024-06-10 17:03:18,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16093.31 | bwd: 44429.23 | bwd_inner: 43127.25 | bwd_allreduce: 1301.31 | step: 39.47 {'loss': 1.2139, 'learning_rate': 1.7640865656983084e-05, 'epoch': 0.55} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3486 [2024-06-10 17:03:20,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.57 | bwd_microstep: 1574.70 | bwd_inner_microstep: 1574.60 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 17:03:21,366] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 792.33 | bwd_inner_microstep: 792.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470 [2024-06-10 17:03:23,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.78 | bwd_microstep: 1341.02 | bwd_inner_microstep: 1340.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786 [2024-06-10 17:03:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.13 | bwd_microstep: 1644.41 | bwd_inner_microstep: 1644.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224 [2024-06-10 17:03:26,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 956.14 | bwd_inner_microstep: 956.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-10 17:03:27,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.42 | bwd_microstep: 796.46 | bwd_inner_microstep: 796.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 17:03:29,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.09 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 17:03:31,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1245.44 | bwd_inner_microstep: 1245.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505 [2024-06-10 17:03:33,013] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.58 | bwd_microstep: 1190.10 | bwd_inner_microstep: 1190.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 17:03:34,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.05 | bwd_microstep: 1278.42 | bwd_inner_microstep: 1278.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2116 [2024-06-10 17:03:36,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.38 | bwd_microstep: 929.01 | bwd_inner_microstep: 928.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 17:03:38,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1480.68 | bwd_inner_microstep: 1480.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955 [2024-06-10 17:03:39,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.77 | bwd_microstep: 825.72 | bwd_inner_microstep: 825.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1904 [2024-06-10 17:03:40,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.37 | bwd_microstep: 714.22 | bwd_inner_microstep: 714.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602 [2024-06-10 17:03:42,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.17 | bwd_microstep: 1432.65 | bwd_inner_microstep: 1432.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
2003 [2024-06-10 17:03:43,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.05 | bwd_microstep: 800.10 | bwd_inner_microstep: 800.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 17:03:45,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1353.18 | bwd_inner_microstep: 1353.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430 [2024-06-10 17:03:46,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.65 | bwd_microstep: 1192.76 | bwd_inner_microstep: 1192.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 17:03:48,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1470.05 | bwd_inner_microstep: 1470.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3529 [2024-06-10 17:03:50,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.12 | bwd_microstep: 1399.40 | bwd_inner_microstep: 1399.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-10 17:03:52,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.55 | bwd_microstep: 1404.31 | bwd_inner_microstep: 1404.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 17:03:55,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.78 | bwd_microstep: 1658.16 | bwd_inner_microstep: 1658.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 2013 [2024-06-10 17:03:56,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.61 | bwd_microstep: 806.59 | bwd_inner_microstep: 806.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 17:03:58,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2288 [2024-06-10 17:03:59,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.12 | bwd_microstep: 1022.55 | bwd_inner_microstep: 1022.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 17:04:01,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1652.75 | bwd_inner_microstep: 1652.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 17:04:03,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.87 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3779 [2024-06-10 17:04:05,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.39 | bwd_microstep: 1384.33 | bwd_inner_microstep: 1384.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 17:04:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1637.11 | bwd_inner_microstep: 1637.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-10 17:04:09,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.73 | bwd_microstep: 1653.70 | bwd_inner_microstep: 1653.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 17:04:11,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.84 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-10 17:04:20,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.27 | optimizer_step: 6.57 [2024-06-10 17:04:20,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.65 | bwd_microstep: 7826.85 | bwd_inner_microstep: 1323.82 | bwd_allreduce_microstep: 6502.97 | step_microstep: 38.48 [2024-06-10 17:04:20,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15047.78 | bwd: 46789.80 | bwd_inner: 40285.82 | bwd_allreduce: 6503.27 | step: 40.01 {'loss': 1.2357, 'learning_rate': 1.7603597905329658e-05, 'epoch': 0.55} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 17:04:22,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.46 | bwd_microstep: 1375.67 | bwd_inner_microstep: 1375.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 17:04:23,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1240.92 | bwd_inner_microstep: 1240.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4242 [2024-06-10 17:04:26,037] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.70 | bwd_microstep: 1559.94 | bwd_inner_microstep: 1559.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 17:04:27,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1244.54 | bwd_inner_microstep: 1244.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-10 17:04:29,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.52 | bwd_microstep: 1538.00 | bwd_inner_microstep: 1537.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 17:04:31,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.98 | bwd_microstep: 1275.63 | bwd_inner_microstep: 1275.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866 [2024-06-10 17:04:32,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.46 | bwd_microstep: 676.84 | bwd_inner_microstep: 676.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2886 [2024-06-10 17:04:34,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.41 | bwd_microstep: 1181.81 | bwd_inner_microstep: 1181.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 17:04:36,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.34 | bwd_microstep: 1283.56 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3510 [2024-06-10 17:04:38,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.17 | bwd_microstep: 1485.22 | bwd_inner_microstep: 1485.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 17:04:39,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1388.43 | bwd_inner_microstep: 1388.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425 [2024-06-10 17:04:41,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.25 | bwd_microstep: 1158.37 | bwd_inner_microstep: 1158.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3647 [2024-06-10 17:04:43,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1279.48 | bwd_inner_microstep: 1279.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969 [2024-06-10 17:04:44,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.10 | bwd_microstep: 895.91 | bwd_inner_microstep: 895.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3533 [2024-06-10 17:04:46,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.17 | bwd_microstep: 1554.28 | bwd_inner_microstep: 1554.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3627 [2024-06-10 17:04:48,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.44 | bwd_microstep: 1454.69 | bwd_inner_microstep: 1454.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3668 [2024-06-10 17:04:50,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.00 | bwd_microstep: 1521.93 | bwd_inner_microstep: 1521.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 17:04:52,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.94 | bwd_microstep: 1477.46 | bwd_inner_microstep: 1477.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 17:04:54,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.01 | bwd_microstep: 1334.32 | bwd_inner_microstep: 1334.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643 [2024-06-10 17:04:56,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1512.83 | bwd_inner_microstep: 1512.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3559 [2024-06-10 17:04:58,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.20 | bwd_microstep: 1360.17 | bwd_inner_microstep: 1360.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437 [2024-06-10 17:05:00,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.21 | bwd_microstep: 1284.06 | bwd_inner_microstep: 1284.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 17:05:02,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1353.00 | bwd_inner_microstep: 1352.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 17:05:04,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.39 | bwd_microstep: 1443.54 | bwd_inner_microstep: 1443.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2223 [2024-06-10 17:05:05,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.94 | bwd_microstep: 863.09 | bwd_inner_microstep: 863.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 17:05:07,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.96 | bwd_microstep: 1277.63 | bwd_inner_microstep: 1277.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758 [2024-06-10 17:05:09,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.37 | bwd_microstep: 1448.30 | bwd_inner_microstep: 1448.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-10 17:05:11,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.92 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774 [2024-06-10 17:05:13,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.12 | bwd_microstep: 1545.69 | bwd_inner_microstep: 1545.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 17:05:15,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.89 | bwd_microstep: 1653.96 | bwd_inner_microstep: 1653.94 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-10 17:05:17,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.28 | bwd_microstep: 1499.54 | bwd_inner_microstep: 1499.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 17:05:21,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.15 | optimizer_gradients: 4.06 | optimizer_step: 6.63 [2024-06-10 17:05:21,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.94 | bwd_microstep: 3447.13 | bwd_inner_microstep: 1757.31 | bwd_allreduce_microstep: 1689.76 | step_microstep: 38.17 [2024-06-10 17:05:21,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16181.54 | bwd: 45126.00 | bwd_inner: 43435.33 | bwd_allreduce: 1689.99 | step: 39.71 55%|█████▍ | 949/1726 [16:22:51<13:18:36, 61.67s/it] 55%|█████▌ | 950/1726 [16:23:52<13:14:30, 61.43s/it] 55%|█████▌ | 951/1726 [16:24:53<13:12:46, 61.38s/it] 55%|█████▌ | 952/1726 [16:25:54<13:09:43, 61.22s/it] 55%|█████▌ | 953/1726 [16:26:57<13:12:21, 61.50s/it] 55%|█████▌ | 954/1726 [16:2{'loss': 1.1952, 'learning_rate': 1.7566338593791955e-05, 'epoch': 0.55} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410 [2024-06-10 17:05:23,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.88 | bwd_microstep: 1367.20 | bwd_inner_microstep: 1367.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294 [2024-06-10 17:05:25,014] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.63 | bwd_microstep: 874.36 | bwd_inner_microstep: 874.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 17:05:26,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1378.55 | bwd_inner_microstep: 1378.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-10 17:05:29,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.87 | bwd_microstep: 1543.10 | bwd_inner_microstep: 1543.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 17:05:30,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1381.15 | bwd_inner_microstep: 1381.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-10 17:05:32,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.32 | bwd_microstep: 1197.62 | bwd_inner_microstep: 1197.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3707 [2024-06-10 17:05:34,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.72 | bwd_microstep: 1488.69 | bwd_inner_microstep: 1488.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2162 [2024-06-10 17:05:35,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.52 | bwd_microstep: 910.27 | bwd_inner_microstep: 910.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 
2772 [2024-06-10 17:05:37,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.98 | bwd_microstep: 951.95 | bwd_inner_microstep: 951.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3483 [2024-06-10 17:05:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.63 | bwd_microstep: 1675.56 | bwd_inner_microstep: 1675.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3505 [2024-06-10 17:05:41,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1510.81 | bwd_inner_microstep: 1510.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386 [2024-06-10 17:05:43,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.74 | bwd_microstep: 1337.20 | bwd_inner_microstep: 1337.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091 [2024-06-10 17:05:44,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.91 | bwd_microstep: 824.81 | bwd_inner_microstep: 824.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 17:05:46,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1452.74 | bwd_inner_microstep: 1452.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-10 17:05:48,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1511.56 | bwd_inner_microstep: 1511.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3662 [2024-06-10 17:05:50,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.72 | bwd_microstep: 1524.27 | bwd_inner_microstep: 1524.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3660 [2024-06-10 17:05:52,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.13 | bwd_microstep: 1259.45 | bwd_inner_microstep: 1259.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299 [2024-06-10 17:05:53,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.26 | bwd_microstep: 978.55 | bwd_inner_microstep: 978.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950 [2024-06-10 17:05:54,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 697.72 | bwd_inner_microstep: 697.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 17:05:56,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.16 | bwd_microstep: 800.97 | bwd_inner_microstep: 800.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987 [2024-06-10 17:05:57,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.25 | bwd_microstep: 706.61 | bwd_inner_microstep: 706.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052 [2024-06-10 17:05:58,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.18 | bwd_microstep: 816.31 | bwd_inner_microstep: 816.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 17:06:00,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1504.97 | bwd_inner_microstep: 1504.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533 [2024-06-10 17:06:02,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1324.54 | bwd_inner_microstep: 1324.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074 [2024-06-10 17:06:03,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.78 | bwd_microstep: 1010.18 | bwd_inner_microstep: 1010.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 17:06:05,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.55 | bwd_microstep: 1402.41 | bwd_inner_microstep: 1402.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 17:06:07,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.06 | bwd_microstep: 1301.52 | bwd_inner_microstep: 1301.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818 [2024-06-10 17:06:09,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1614.48 | bwd_inner_microstep: 1614.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760 [2024-06-10 17:06:11,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.94 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1542.83 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398 [2024-06-10 17:06:13,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.65 | bwd_microstep: 1368.15 | bwd_inner_microstep: 1368.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431 [2024-06-10 17:06:15,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1405.15 | bwd_inner_microstep: 1405.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 17:06:24,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.58 [2024-06-10 17:06:24,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.45 | bwd_microstep: 8109.24 | bwd_inner_microstep: 1985.51 | bwd_allreduce_microstep: 6123.67 | step_microstep: 38.50 [2024-06-10 17:06:24,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15149.39 | bwd: 46772.94 | bwd_inner: 40648.36 | bwd_allreduce: 6123.91 | step: 40.08 {'loss': 1.1751, 'learning_rate': 1.7529087853597072e-05, 'epoch': 0.55} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462 [2024-06-10 17:06:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.10 | bwd_microstep: 1560.34 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3886 [2024-06-10 17:06:28,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.04 | bwd_microstep: 1478.22 | bwd_inner_microstep: 1478.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 17:06:30,088] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1242.07 | bwd_inner_microstep: 1242.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 17:06:32,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1476.93 | bwd_inner_microstep: 1476.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 17:06:34,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.68 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832 [2024-06-10 17:06:36,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.06 | bwd_microstep: 1479.89 | bwd_inner_microstep: 1479.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 17:06:38,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1387.02 | bwd_inner_microstep: 1386.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3477 [2024-06-10 17:06:39,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.87 | bwd_microstep: 1245.66 | bwd_inner_microstep: 1245.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 864 [2024-06-10 17:06:40,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 136.15 | bwd_microstep: 349.97 | bwd_inner_microstep: 349.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3569 [2024-06-10 17:06:42,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.97 | bwd_microstep: 1494.65 | bwd_inner_microstep: 1494.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581 [2024-06-10 17:06:44,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.14 | bwd_microstep: 1302.43 | bwd_inner_microstep: 1302.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 720 [2024-06-10 17:06:44,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.08 | bwd_microstep: 292.49 | bwd_inner_microstep: 292.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099 [2024-06-10 17:06:45,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.76 | bwd_microstep: 915.23 | bwd_inner_microstep: 915.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666 [2024-06-10 17:06:48,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.86 | bwd_microstep: 1716.21 | bwd_inner_microstep: 1716.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934 [2024-06-10 17:06:49,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.13 | bwd_microstep: 758.12 | bwd_inner_microstep: 758.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3618 [2024-06-10 17:06:51,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.25 | bwd_microstep: 1567.77 | bwd_inner_microstep: 1567.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3518 [2024-06-10 17:06:53,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.81 | bwd_microstep: 1490.00 | bwd_inner_microstep: 1489.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3537 [2024-06-10 17:06:55,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1551.29 | bwd_inner_microstep: 1551.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3668 [2024-06-10 17:06:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.17 | bwd_microstep: 1624.79 | bwd_inner_microstep: 1624.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 17:06:59,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.43 | bwd_microstep: 973.39 | bwd_inner_microstep: 973.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 17:07:00,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1288.35 | bwd_inner_microstep: 1288.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 17:07:02,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.71 | bwd_microstep: 1391.73 | bwd_inner_microstep: 1391.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2159 [2024-06-10 17:07:04,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.79 | bwd_microstep: 857.07 | bwd_inner_microstep: 857.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 17:07:06,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.50 | bwd_microstep: 1460.11 | bwd_inner_microstep: 1460.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-10 17:07:07,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.48 | bwd_microstep: 1185.97 | bwd_inner_microstep: 1185.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 17:07:09,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.25 | bwd_microstep: 1496.49 | bwd_inner_microstep: 1496.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910 [2024-06-10 17:07:10,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.41 | bwd_microstep: 687.72 | bwd_inner_microstep: 687.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-10 17:07:12,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1439.64 | bwd_inner_microstep: 1439.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 17:07:14,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1555.46 | bwd_inner_microstep: 1555.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-10 17:07:16,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.40 | bwd_microstep: 1157.41 | bwd_inner_microstep: 1157.39 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806 [2024-06-10 17:07:18,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1481.08 | bwd_inner_microstep: 1481.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2261 [2024-06-10 17:07:26,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-10 17:07:26,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.65 | bwd_microstep: 7162.54 | bwd_inner_microstep: 1216.54 | bwd_allreduce_microstep: 5945.94 | step_microstep: 38.26 [2024-06-10 17:07:26,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15124.06 | bwd: 46458.43 | bwd_inner: 40511.56 | bwd_allreduce: 5946.18 | step: 39.72 {'loss': 1.1929, 'learning_rate': 1.7491845815941926e-05, 'epoch': 0.55} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 17:07:27,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.89 | bwd_microstep: 1356.31 | bwd_inner_microstep: 1356.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911 [2024-06-10 17:07:30,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.64 | bwd_microstep: 1682.61 | bwd_inner_microstep: 1682.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 17:07:32,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1376.89 | bwd_inner_microstep: 1376.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2254 
[2024-06-10 17:07:33,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.62 | bwd_microstep: 900.22 | bwd_inner_microstep: 900.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936 [2024-06-10 17:07:34,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.79 | bwd_microstep: 725.67 | bwd_inner_microstep: 725.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-10 17:07:36,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.33 | bwd_microstep: 1531.62 | bwd_inner_microstep: 1531.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1895 [2024-06-10 17:07:37,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.56 | bwd_microstep: 683.43 | bwd_inner_microstep: 683.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 17:07:38,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.23 | bwd_microstep: 678.96 | bwd_inner_microstep: 678.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-10 17:07:41,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.45 | bwd_microstep: 2300.76 | bwd_inner_microstep: 2300.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-10 17:07:43,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.66 | bwd_microstep: 1439.00 | bwd_inner_microstep: 1438.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3443 [2024-06-10 17:07:45,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1346.43 | bwd_inner_microstep: 1346.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3381 [2024-06-10 17:07:46,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.52 | bwd_microstep: 1176.87 | bwd_inner_microstep: 1176.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 17:07:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1348.25 | bwd_inner_microstep: 1348.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2488 [2024-06-10 17:07:50,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.44 | bwd_microstep: 1142.67 | bwd_inner_microstep: 1142.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 682 [2024-06-10 17:07:50,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.76 | bwd_microstep: 283.52 | bwd_inner_microstep: 283.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3864 [2024-06-10 17:07:52,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.60 | bwd_microstep: 1569.50 | bwd_inner_microstep: 1569.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-10 17:07:53,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.91 | bwd_microstep: 804.60 | bwd_inner_microstep: 804.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 18, images per sample: 4.5, dynamic token length: 3928 [2024-06-10 17:07:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.39 | bwd_microstep: 1335.03 | bwd_inner_microstep: 1335.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 17:07:57,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1295.96 | bwd_inner_microstep: 1295.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181 [2024-06-10 17:07:58,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.00 | bwd_microstep: 857.79 | bwd_inner_microstep: 857.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706 [2024-06-10 17:08:00,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1431.94 | bwd_inner_microstep: 1431.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013 [2024-06-10 17:08:01,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.24 | bwd_microstep: 806.00 | bwd_inner_microstep: 805.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-10 17:08:03,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.82 | bwd_microstep: 1424.06 | bwd_inner_microstep: 1424.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3560 [2024-06-10 17:08:05,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.24 | bwd_microstep: 1265.39 | bwd_inner_microstep: 1265.36 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3646 [2024-06-10 17:08:07,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.66 | bwd_microstep: 1438.78 | bwd_inner_microstep: 1438.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2226 [2024-06-10 17:08:08,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.23 | bwd_microstep: 959.69 | bwd_inner_microstep: 959.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2271 [2024-06-10 17:08:10,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.35 | bwd_microstep: 1003.96 | bwd_inner_microstep: 1003.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 17:08:12,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1279.46 | bwd_inner_microstep: 1279.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 17:08:14,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1592.30 | bwd_inner_microstep: 1592.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3584 [2024-06-10 17:08:16,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.17 | bwd_microstep: 1348.25 | bwd_inner_microstep: 1348.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3589 [2024-06-10 17:08:17,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1339.25 | bwd_inner_microstep: 
1339.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 17:08:26,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 17:08:26,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.79 | bwd_microstep: 7954.42 | bwd_inner_microstep: 1643.31 | bwd_allreduce_microstep: 6311.06 | step_microstep: 37.97 [2024-06-10 17:08:26,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14390.69 | bwd: 45679.61 | bwd_inner: 39367.64 | bwd_allreduce: 6311.29 | step: 39.58 {'loss': 1.2031, 'learning_rate': 1.7454612611992777e-05, 'epoch': 0.55} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 17:08:28,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1369.38 | bwd_inner_microstep: 1369.27 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 17:08:30,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1373.72 | bwd_inner_microstep: 1373.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3763 [2024-06-10 17:08:32,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.65 | bwd_microstep: 1461.31 | bwd_inner_microstep: 1461.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732 [2024-06-10 17:08:34,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1425.78 | bwd_inner_microstep: 1425.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3511 [2024-06-10 17:08:36,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.80 | bwd_microstep: 1379.84 | bwd_inner_microstep: 1379.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 17:08:38,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 17:08:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.25 | bwd_microstep: 1379.67 | bwd_inner_microstep: 1379.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638 [2024-06-10 17:08:41,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.62 | bwd_microstep: 1426.80 | bwd_inner_microstep: 1426.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422 [2024-06-10 17:08:43,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.62 | bwd_microstep: 1154.35 | bwd_inner_microstep: 1154.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 17:08:45,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.63 | bwd_microstep: 1151.00 | bwd_inner_microstep: 1150.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503 [2024-06-10 17:08:47,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1418.76 | bwd_inner_microstep: 1418.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3457 [2024-06-10 17:08:49,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.89 | bwd_microstep: 1434.69 | bwd_inner_microstep: 1434.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 17:08:50,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1359.06 | bwd_inner_microstep: 1359.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 17:08:52,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 1349.29 | bwd_inner_microstep: 1349.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 17:08:54,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1255.38 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3565 [2024-06-10 17:08:56,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.22 | bwd_microstep: 1430.13 | bwd_inner_microstep: 1430.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 17:08:58,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1391.36 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 17:09:00,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1396.85 | bwd_inner_microstep: 1396.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443 [2024-06-10 17:09:01,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.66 | bwd_microstep: 1154.20 | bwd_inner_microstep: 1154.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 17:09:03,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 17:09:06,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.58 | bwd_microstep: 1656.65 | bwd_inner_microstep: 1656.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658 [2024-06-10 17:09:08,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1622.93 | bwd_inner_microstep: 1622.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107 [2024-06-10 17:09:09,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.78 | bwd_microstep: 920.76 | bwd_inner_microstep: 920.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823 [2024-06-10 17:09:11,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.02 | bwd_microstep: 1586.35 | bwd_inner_microstep: 1586.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564 [2024-06-10 17:09:13,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.51 | bwd_microstep: 1302.26 | bwd_inner_microstep: 1302.23 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2193 [2024-06-10 17:09:14,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.18 | bwd_microstep: 890.86 | bwd_inner_microstep: 890.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-10 17:09:16,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.87 | bwd_microstep: 1282.79 | bwd_inner_microstep: 1282.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3570 [2024-06-10 17:09:18,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.93 | bwd_microstep: 1350.34 | bwd_inner_microstep: 1350.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 17:09:20,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.16 | bwd_microstep: 1549.58 | bwd_inner_microstep: 1549.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2902 [2024-06-10 17:09:22,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.68 | bwd_microstep: 1188.41 | bwd_inner_microstep: 1188.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3804 [2024-06-10 17:09:24,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.34 | bwd_microstep: 1351.59 | bwd_inner_microstep: 1351.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 17:09:26,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | 
optimizer_gradients: 4.17 | optimizer_step: 6.61 [2024-06-10 17:09:26,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1414.52 | bwd_inner_microstep: 1406.84 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.57 [2024-06-10 17:09:26,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16208.70 | bwd: 43207.99 | bwd_inner: 43199.38 | bwd_allreduce: 7.91 | step: 39.04 {'loss': 1.2159, 'learning_rate': 1.7417388372884775e-05, 'epoch': 0.56} dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3440 [2024-06-10 17:09:28,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.49 | bwd_microstep: 1494.69 | bwd_inner_microstep: 1494.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 17:09:30,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1283.71 | bwd_inner_microstep: 1283.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 17:09:31,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.96 | bwd_microstep: 1374.35 | bwd_inner_microstep: 1374.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 17:09:34,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.43 | bwd_microstep: 1653.61 | bwd_inner_microstep: 1653.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 17:09:36,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.46 | bwd_microstep: 1478.39 | bwd_inner_microstep: 1478.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3417 [2024-06-10 17:09:37,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.72 | bwd_microstep: 1149.55 | bwd_inner_microstep: 1149.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 17:09:39,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.40 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2396 [2024-06-10 17:09:41,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.99 | bwd_microstep: 935.16 | bwd_inner_microstep: 935.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 17:09:42,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.32 | bwd_microstep: 1382.70 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954 [2024-06-10 17:09:43,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.46 | bwd_microstep: 700.74 | bwd_inner_microstep: 700.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3545 [2024-06-10 17:09:45,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1229.87 | bwd_inner_microstep: 1229.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 17:09:47,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.12 | bwd_microstep: 1486.69 | bwd_inner_microstep: 1486.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3604 [2024-06-10 17:09:49,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1369.16 | bwd_inner_microstep: 1369.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3384 [2024-06-10 17:09:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.51 | bwd_microstep: 1143.24 | bwd_inner_microstep: 1143.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 17:09:53,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1379.27 | bwd_inner_microstep: 1379.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2165 [2024-06-10 17:09:54,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.78 | bwd_microstep: 916.87 | bwd_inner_microstep: 916.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2686 [2024-06-10 17:09:55,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.94 | bwd_microstep: 1121.52 | bwd_inner_microstep: 1121.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823 [2024-06-10 17:09:58,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.56 | bwd_microstep: 1511.94 | bwd_inner_microstep: 1511.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 17:09:59,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1380.67 | bwd_inner_microstep: 1380.64 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3818 [2024-06-10 17:10:02,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.41 | bwd_microstep: 1717.66 | bwd_inner_microstep: 1717.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 17:10:04,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 17:10:06,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1374.58 | bwd_inner_microstep: 1374.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 17:10:08,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1512.57 | bwd_inner_microstep: 1512.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 17:10:10,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1343.81 | bwd_inner_microstep: 1343.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2283 [2024-06-10 17:10:11,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.15 | bwd_microstep: 1070.88 | bwd_inner_microstep: 1070.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 17:10:13,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.29 | bwd_microstep: 
1654.32 | bwd_inner_microstep: 1654.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042 [2024-06-10 17:10:15,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.19 | bwd_microstep: 907.51 | bwd_inner_microstep: 907.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741 [2024-06-10 17:10:17,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.60 | bwd_microstep: 1639.99 | bwd_inner_microstep: 1639.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1982 [2024-06-10 17:10:18,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.14 | bwd_microstep: 735.13 | bwd_inner_microstep: 735.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 17:10:20,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.55 | bwd_microstep: 1402.54 | bwd_inner_microstep: 1402.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716 [2024-06-10 17:10:22,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.71 | bwd_microstep: 1463.83 | bwd_inner_microstep: 1463.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 17:10:24,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-10 17:10:24,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.88 | bwd_microstep: 1977.62 | bwd_inner_microstep: 1482.68 | bwd_allreduce_microstep: 494.89 | step_microstep: 37.68 
[2024-06-10 17:10:24,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15721.19 | bwd: 42569.04 | bwd_inner: 42073.25 | bwd_allreduce: 495.11 | step: 39.18 55%|█████▌ | 954/1726 [16:27:58<13:11:53, 61.55s/it] 55%|█████▌ | 955/1726 [16:29:00<13:13:38, 61.76s/it] 55%|█████▌ | 956/1726 [16:30:02<13:13:11, 61.81s/it] 55%|█████▌ | 957/1726 [16:31:03<13:06:41, 61.38s/it] 56%|█████▌ | 958/1726 [16:32:02<12:59:22, 60.89s/it] 56%|█████▌ | 959/1726 [16:33:01<12:49:39, 60.21s/it] {'loss': 1.2427, 'learning_rate': 1.7380173229721494e-05, 'epoch': 0.56} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 17:10:26,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.93 | bwd_microstep: 1380.52 | bwd_inner_microstep: 1380.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 17:10:28,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.92 | bwd_microstep: 1383.00 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 17:10:30,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.76 | bwd_microstep: 1383.09 | bwd_inner_microstep: 1383.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3801 [2024-06-10 17:10:32,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1556.90 | bwd_inner_microstep: 1556.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images
per sample: 7.0, dynamic token length: 3841 [2024-06-10 17:10:34,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.69 | bwd_microstep: 1462.89 | bwd_inner_microstep: 1462.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3744 [2024-06-10 17:10:36,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.62 | bwd_microstep: 1581.75 | bwd_inner_microstep: 1581.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 17:10:38,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 17:10:40,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1280.12 | bwd_inner_microstep: 1280.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 17:10:41,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.62 | bwd_microstep: 794.29 | bwd_inner_microstep: 794.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2148 [2024-06-10 17:10:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.90 | bwd_microstep: 975.78 | bwd_inner_microstep: 975.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 17:10:44,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1253.29 | bwd_inner_microstep: 1253.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3699 [2024-06-10 17:10:46,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.22 | bwd_microstep: 1513.09 | bwd_inner_microstep: 1513.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396 [2024-06-10 17:10:48,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.29 | bwd_microstep: 1339.30 | bwd_inner_microstep: 1339.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3700 [2024-06-10 17:10:50,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.00 | bwd_microstep: 1720.06 | bwd_inner_microstep: 1720.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2612 [2024-06-10 17:10:52,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.95 | bwd_microstep: 947.62 | bwd_inner_microstep: 947.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3527 [2024-06-10 17:10:54,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1322.06 | bwd_inner_microstep: 1322.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 17:10:56,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.38 | bwd_microstep: 1475.92 | bwd_inner_microstep: 1475.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642 [2024-06-10 17:10:58,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1509.41 | bwd_inner_microstep: 1509.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948 [2024-06-10 17:10:59,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 698.12 | bwd_inner_microstep: 698.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2292 [2024-06-10 17:11:00,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.41 | bwd_microstep: 1004.24 | bwd_inner_microstep: 1004.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 17:11:02,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831 [2024-06-10 17:11:04,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.96 | bwd_microstep: 1644.76 | bwd_inner_microstep: 1644.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714 [2024-06-10 17:11:06,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.83 | bwd_microstep: 1430.08 | bwd_inner_microstep: 1430.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462 [2024-06-10 17:11:08,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.95 | bwd_microstep: 1181.64 | bwd_inner_microstep: 1181.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-10 17:11:10,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.90 | bwd_microstep: 
1181.61 | bwd_inner_microstep: 1181.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 17:11:12,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.67 | bwd_microstep: 1555.99 | bwd_inner_microstep: 1555.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756 [2024-06-10 17:11:14,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.44 | bwd_microstep: 1444.08 | bwd_inner_microstep: 1444.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3569 [2024-06-10 17:11:16,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.32 | bwd_microstep: 1447.20 | bwd_inner_microstep: 1447.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913 [2024-06-10 17:11:17,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.57 | bwd_microstep: 779.57 | bwd_inner_microstep: 779.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 17:11:19,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1555.39 | bwd_inner_microstep: 1555.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 17:11:21,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.11 | bwd_microstep: 1551.16 | bwd_inner_microstep: 1551.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3460 [2024-06-10 17:11:33,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.60 [2024-06-10 17:11:33,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.98 | bwd_microstep: 10849.92 | bwd_inner_microstep: 2060.53 | bwd_allreduce_microstep: 8789.32 | step_microstep: 38.79 [2024-06-10 17:11:33,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15999.93 | bwd: 51941.16 | bwd_inner: 43150.92 | bwd_allreduce: 8789.56 | step: 40.28 {'loss': 1.1465, 'learning_rate': 1.734296731357448e-05, 'epoch': 0.56} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 17:11:35,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.64 | bwd_microstep: 1461.21 | bwd_inner_microstep: 1461.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 17:11:37,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1382.68 | bwd_inner_microstep: 1382.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4021 [2024-06-10 17:11:39,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.49 | bwd_microstep: 1604.75 | bwd_inner_microstep: 1604.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 17:11:41,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1381.46 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 17:11:42,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | bwd_microstep: 1243.57 | bwd_inner_microstep: 1243.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 17:11:44,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1396.97 | bwd_inner_microstep: 1396.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3616 [2024-06-10 17:11:46,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1245.62 | bwd_inner_microstep: 1245.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 17:11:48,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1398.39 | bwd_inner_microstep: 1398.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 17:11:50,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.99 | bwd_microstep: 1480.99 | bwd_inner_microstep: 1480.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-10 17:11:51,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | bwd_microstep: 794.24 | bwd_inner_microstep: 794.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1915 [2024-06-10 17:11:52,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.44 | bwd_microstep: 748.97 | bwd_inner_microstep: 748.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 17:11:54,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.35 | bwd_microstep: 1485.29 | bwd_inner_microstep: 1485.27 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3525 [2024-06-10 17:11:56,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.82 | bwd_microstep: 1444.31 | bwd_inner_microstep: 1444.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3662 [2024-06-10 17:11:59,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.89 | bwd_microstep: 1768.65 | bwd_inner_microstep: 1768.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672 [2024-06-10 17:12:01,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.15 | bwd_microstep: 1725.01 | bwd_inner_microstep: 1724.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3656 [2024-06-10 17:12:03,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.98 | bwd_microstep: 1618.80 | bwd_inner_microstep: 1618.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982 [2024-06-10 17:12:04,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.73 | bwd_microstep: 890.54 | bwd_inner_microstep: 890.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 17:12:06,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 1282.41 | bwd_inner_microstep: 1282.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503 [2024-06-10 17:12:08,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.49 | bwd_microstep: 1191.92 | 
bwd_inner_microstep: 1191.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2177 [2024-06-10 17:12:09,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.27 | bwd_microstep: 857.11 | bwd_inner_microstep: 857.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590 [2024-06-10 17:12:11,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1309.80 | bwd_inner_microstep: 1309.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 17:12:13,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.87 | bwd_microstep: 1297.02 | bwd_inner_microstep: 1297.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 17:12:15,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.92 | bwd_microstep: 1457.95 | bwd_inner_microstep: 1457.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 17:12:17,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1395.80 | bwd_inner_microstep: 1395.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556 [2024-06-10 17:12:18,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.11 | bwd_microstep: 1297.31 | bwd_inner_microstep: 1297.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3800 [2024-06-10 17:12:20,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 511.58 | bwd_microstep: 1350.93 | bwd_inner_microstep: 1350.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 17:12:23,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.71 | bwd_microstep: 1645.95 | bwd_inner_microstep: 1645.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 17:12:25,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1490.08 | bwd_inner_microstep: 1490.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2281 [2024-06-10 17:12:26,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.34 | bwd_microstep: 1031.71 | bwd_inner_microstep: 1031.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2966 [2024-06-10 17:12:27,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.61 | bwd_microstep: 1040.25 | bwd_inner_microstep: 1040.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3584 [2024-06-10 17:12:29,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.70 | bwd_microstep: 1443.92 | bwd_inner_microstep: 1443.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 17:12:33,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.50 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 17:12:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 2534.22 | bwd_inner_microstep: 1851.89 | 
bwd_allreduce_microstep: 682.28 | step_microstep: 38.89 [2024-06-10 17:12:33,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16003.76 | bwd: 43697.87 | bwd_inner: 43014.68 | bwd_allreduce: 682.51 | step: 40.39 {'loss': 1.1938, 'learning_rate': 1.7305770755482788e-05, 'epoch': 0.56} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3437 [2024-06-10 17:12:35,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.26 | bwd_microstep: 1543.27 | bwd_inner_microstep: 1543.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472 [2024-06-10 17:12:37,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.79 | bwd_microstep: 1408.86 | bwd_inner_microstep: 1408.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 17:12:39,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.28 | bwd_microstep: 1507.10 | bwd_inner_microstep: 1507.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3794 [2024-06-10 17:12:41,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1380.71 | bwd_inner_microstep: 1380.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431 [2024-06-10 17:12:42,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.11 | bwd_microstep: 1180.65 | bwd_inner_microstep: 1180.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 17:12:44,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1379.25 | bwd_inner_microstep: 1379.22 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-10 17:12:46,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.15 | bwd_microstep: 1246.84 | bwd_inner_microstep: 1246.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 17:12:48,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1415.84 | bwd_inner_microstep: 1415.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 17:12:50,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1280.54 | bwd_inner_microstep: 1280.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 17:12:51,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-10 17:12:53,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.28 | bwd_microstep: 788.22 | bwd_inner_microstep: 788.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2009 [2024-06-10 17:12:54,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.43 | bwd_microstep: 775.47 | bwd_inner_microstep: 775.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 17:12:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1378.21 
| bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-10 17:12:58,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.00 | bwd_microstep: 1450.05 | bwd_inner_microstep: 1450.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4013 [2024-06-10 17:13:00,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.74 | bwd_microstep: 1808.87 | bwd_inner_microstep: 1808.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427 [2024-06-10 17:13:02,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.55 | bwd_microstep: 1279.86 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-10 17:13:04,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.66 | bwd_microstep: 1518.86 | bwd_inner_microstep: 1518.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3670 [2024-06-10 17:13:06,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.77 | bwd_microstep: 1691.02 | bwd_inner_microstep: 1691.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3436 [2024-06-10 17:13:08,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1334.13 | bwd_inner_microstep: 1334.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3483 [2024-06-10 17:13:10,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 477.84 | bwd_microstep: 1250.74 | bwd_inner_microstep: 1250.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-10 17:13:12,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.70 | bwd_microstep: 1408.39 | bwd_inner_microstep: 1408.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 17:13:14,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1395.78 | bwd_inner_microstep: 1395.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826 [2024-06-10 17:13:16,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2176 [2024-06-10 17:13:17,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.36 | bwd_microstep: 855.42 | bwd_inner_microstep: 855.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3505 [2024-06-10 17:13:19,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.56 | bwd_microstep: 1451.08 | bwd_inner_microstep: 1451.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 17:13:21,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 1552.15 | bwd_inner_microstep: 1552.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-10 17:13:23,494] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1412.41 | bwd_inner_microstep: 1412.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 17:13:25,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.47 | bwd_microstep: 1372.49 | bwd_inner_microstep: 1372.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2293 [2024-06-10 17:13:26,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.92 | bwd_microstep: 1072.98 | bwd_inner_microstep: 1072.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 17:13:29,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.74 | bwd_microstep: 1653.04 | bwd_inner_microstep: 1653.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809 [2024-06-10 17:13:31,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1352.74 | bwd_inner_microstep: 1352.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 17:13:34,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.07 | optimizer_step: 6.61 [2024-06-10 17:13:34,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.81 | bwd_microstep: 3270.98 | bwd_inner_microstep: 1889.51 | bwd_allreduce_microstep: 1381.41 | step_microstep: 37.70 [2024-06-10 17:13:34,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16219.37 | bwd: 45186.24 | bwd_inner: 43803.91 | bwd_allreduce: 1381.65 | step: 39.26 {'loss': 1.2126, 'learning_rate': 
1.7268583686452474e-05, 'epoch': 0.56} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 17:13:36,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.48 | bwd_microstep: 1233.28 | bwd_inner_microstep: 1233.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3944 [2024-06-10 17:13:38,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.66 | bwd_microstep: 1691.95 | bwd_inner_microstep: 1691.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3854 [2024-06-10 17:13:41,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.43 | bwd_microstep: 1659.91 | bwd_inner_microstep: 1659.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3823 [2024-06-10 17:13:43,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1498.46 | bwd_inner_microstep: 1498.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 17:13:44,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1247.73 | bwd_inner_microstep: 1247.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743 [2024-06-10 17:13:47,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.84 | bwd_microstep: 1531.31 | bwd_inner_microstep: 1531.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-10 17:13:48,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.15 | bwd_microstep: 1150.68 | 
bwd_inner_microstep: 1150.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958 [2024-06-10 17:13:49,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.44 | bwd_microstep: 794.05 | bwd_inner_microstep: 794.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1903 [2024-06-10 17:13:50,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.74 | bwd_microstep: 715.39 | bwd_inner_microstep: 715.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725 [2024-06-10 17:13:52,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.30 | bwd_microstep: 1441.73 | bwd_inner_microstep: 1441.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 17:13:54,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.92 | bwd_microstep: 1342.77 | bwd_inner_microstep: 1342.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680 [2024-06-10 17:13:56,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.01 | bwd_microstep: 1429.95 | bwd_inner_microstep: 1429.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3513 [2024-06-10 17:13:58,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.70 | bwd_microstep: 1652.73 | bwd_inner_microstep: 1652.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077 [2024-06-10 17:14:00,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
311.02 | bwd_microstep: 820.20 | bwd_inner_microstep: 820.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 17:14:01,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 17:14:03,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-10 17:14:05,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.23 | bwd_microstep: 1431.33 | bwd_inner_microstep: 1431.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 17:14:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-10 17:14:09,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1462.04 | bwd_inner_microstep: 1462.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 17:14:11,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1255.23 | bwd_inner_microstep: 1255.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 17:14:13,160] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1396.11 | bwd_inner_microstep: 1396.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3566 [2024-06-10 17:14:14,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.23 | bwd_microstep: 1237.61 | bwd_inner_microstep: 1237.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2069 [2024-06-10 17:14:16,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.91 | bwd_microstep: 960.64 | bwd_inner_microstep: 960.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 17:14:18,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1447.63 | bwd_inner_microstep: 1447.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 17:14:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.20 | bwd_microstep: 1599.06 | bwd_inner_microstep: 1599.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459 [2024-06-10 17:14:22,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.33 | bwd_microstep: 1569.88 | bwd_inner_microstep: 1569.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 17:14:24,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 1535.65 | bwd_inner_microstep: 1535.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059 [2024-06-10 
17:14:25,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.23 | bwd_microstep: 913.38 | bwd_inner_microstep: 913.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3449 [2024-06-10 17:14:27,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.75 | bwd_microstep: 1300.21 | bwd_inner_microstep: 1300.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592 [2024-06-10 17:14:29,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.11 | bwd_microstep: 1305.92 | bwd_inner_microstep: 1305.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 17:14:31,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.99 | bwd_microstep: 1250.97 | bwd_inner_microstep: 1250.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3816 [2024-06-10 17:14:36,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.10 | optimizer_step: 6.59 [2024-06-10 17:14:36,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 4243.04 | bwd_inner_microstep: 1797.78 | bwd_allreduce_microstep: 2445.21 | step_microstep: 37.94 [2024-06-10 17:14:36,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15872.55 | bwd: 45074.16 | bwd_inner: 42628.04 | bwd_allreduce: 2445.44 | step: 39.39 {'loss': 1.1877, 'learning_rate': 1.723140623745622e-05, 'epoch': 0.56} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 17:14:38,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.34 | bwd_microstep: 1465.02 | bwd_inner_microstep: 
1464.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902 [2024-06-10 17:14:40,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.55 | bwd_microstep: 1692.73 | bwd_inner_microstep: 1692.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523 [2024-06-10 17:14:42,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1492.61 | bwd_inner_microstep: 1492.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843 [2024-06-10 17:14:44,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.61 | bwd_microstep: 1557.74 | bwd_inner_microstep: 1557.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 17:14:46,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.78 | bwd_microstep: 1541.57 | bwd_inner_microstep: 1541.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-10 17:14:47,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.94 | bwd_microstep: 800.08 | bwd_inner_microstep: 800.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 17:14:49,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1380.75 | bwd_inner_microstep: 1380.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490 [2024-06-10 17:14:51,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.89 | 
bwd_microstep: 1186.44 | bwd_inner_microstep: 1186.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-10 17:14:53,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.06 | bwd_microstep: 1629.67 | bwd_inner_microstep: 1629.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 17:14:55,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1387.93 | bwd_inner_microstep: 1387.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-10 17:14:57,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1414.71 | bwd_inner_microstep: 1414.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 17:14:59,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.38 | bwd_microstep: 1390.28 | bwd_inner_microstep: 1390.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3717 [2024-06-10 17:15:01,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.94 | bwd_microstep: 1560.42 | bwd_inner_microstep: 1560.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 17:15:03,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.82 | bwd_microstep: 1342.19 | bwd_inner_microstep: 1342.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-10 17:15:05,650] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 567.93 | bwd_microstep: 1525.68 | bwd_inner_microstep: 1525.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2126 [2024-06-10 17:15:06,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.11 | bwd_microstep: 863.16 | bwd_inner_microstep: 863.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925 [2024-06-10 17:15:09,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.06 | bwd_microstep: 1596.45 | bwd_inner_microstep: 1596.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 17:15:11,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1555.20 | bwd_inner_microstep: 1555.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108 [2024-06-10 17:15:12,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.63 | bwd_microstep: 824.61 | bwd_inner_microstep: 824.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 17:15:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.79 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-10 17:15:15,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.81 | bwd_microstep: 807.36 | bwd_inner_microstep: 807.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 17:15:17,094] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.34 | bwd_microstep: 1346.27 | bwd_inner_microstep: 1346.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1994 [2024-06-10 17:15:18,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.05 | bwd_microstep: 709.95 | bwd_inner_microstep: 709.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 17:15:19,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 17:15:21,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.00 | bwd_microstep: 1299.82 | bwd_inner_microstep: 1299.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 17:15:23,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1280.57 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891 [2024-06-10 17:15:24,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.96 | bwd_microstep: 714.39 | bwd_inner_microstep: 714.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551 [2024-06-10 17:15:26,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1427.13 | bwd_inner_microstep: 1427.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 
[2024-06-10 17:15:28,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.69 | bwd_microstep: 1454.19 | bwd_inner_microstep: 1454.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596 [2024-06-10 17:15:30,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.58 | bwd_microstep: 1595.74 | bwd_inner_microstep: 1595.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 17:15:32,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.83 | bwd_microstep: 1335.97 | bwd_inner_microstep: 1335.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2225 [2024-06-10 17:15:36,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 17:15:36,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.50 | bwd_microstep: 3550.62 | bwd_inner_microstep: 1127.28 | bwd_allreduce_microstep: 2423.29 | step_microstep: 38.05 [2024-06-10 17:15:36,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15645.99 | bwd: 44295.42 | bwd_inner: 41871.22 | bwd_allreduce: 2423.52 | step: 39.56 {'loss': 1.2416, 'learning_rate': 1.7194238539432807e-05, 'epoch': 0.56} 56%|█████▌ | 959/1726 [16:33:01<12:49:39, 60.21s/it] 56%|█████▌ | 960/1726 [16:34:09<13:19:32, 62.63s/it] 56%|█████▌ | 961/1726 [16:35:09<13:08:33, 61.85s/it] 56%|█████▌ | 962/1726 [16:36:11<13:07:07, 61.82s/it] 56%|█████▌ | 963/1726 [16:37:12<13:04:02, 61.65s/it] 56%|█████▌ | 964/1726 [16:38:13<12:57:46, 61.24s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 17:15:38,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1362.69 | bwd_inner_microstep: 1362.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3946 [2024-06-10 17:15:40,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.06 | bwd_microstep: 1623.44 | bwd_inner_microstep: 1623.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 17:15:42,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.68 | bwd_microstep: 1349.36 | bwd_inner_microstep: 1349.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-10 17:15:44,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.07 | bwd_microstep: 1180.48 | bwd_inner_microstep: 1180.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2241 [2024-06-10 17:15:45,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.24 | bwd_microstep: 867.47 | bwd_inner_microstep: 867.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-10 17:15:47,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1350.41 | bwd_inner_microstep: 1350.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921 [2024-06-10 17:15:48,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.87 | bwd_microstep: 725.31 | 
bwd_inner_microstep: 725.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 17:15:49,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.79 | bwd_microstep: 1280.93 | bwd_inner_microstep: 1280.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3693 [2024-06-10 17:15:51,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1390.50 | bwd_inner_microstep: 1390.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4143 [2024-06-10 17:15:54,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.86 | bwd_microstep: 1668.06 | bwd_inner_microstep: 1668.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.42 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 17:15:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1377.48 | bwd_inner_microstep: 1377.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3933 [2024-06-10 17:15:58,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.01 | bwd_microstep: 1688.17 | bwd_inner_microstep: 1688.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702 [2024-06-10 17:16:00,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1525.71 | bwd_inner_microstep: 1525.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 17:16:02,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 472.05 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3441 [2024-06-10 17:16:04,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.11 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 17:16:06,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1389.70 | bwd_inner_microstep: 1389.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453 [2024-06-10 17:16:07,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.34 | bwd_microstep: 1290.01 | bwd_inner_microstep: 1289.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454 [2024-06-10 17:16:09,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452 [2024-06-10 17:16:11,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1285.89 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3443 [2024-06-10 17:16:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.74 | bwd_microstep: 1217.69 | bwd_inner_microstep: 1217.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 17:16:15,124] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1508.46 | bwd_inner_microstep: 1508.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293 [2024-06-10 17:16:16,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.85 | bwd_microstep: 880.36 | bwd_inner_microstep: 880.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2082 [2024-06-10 17:16:17,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.84 | bwd_microstep: 851.80 | bwd_inner_microstep: 851.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 17:16:19,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1507.18 | bwd_inner_microstep: 1507.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-10 17:16:21,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1480.97 | bwd_inner_microstep: 1480.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3758 [2024-06-10 17:16:23,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.35 | bwd_microstep: 1676.27 | bwd_inner_microstep: 1676.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2692 [2024-06-10 17:16:25,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.92 | bwd_microstep: 1033.26 | bwd_inner_microstep: 1033.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3652 [2024-06-10 17:16:27,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.59 | bwd_microstep: 1425.73 | bwd_inner_microstep: 1425.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3576 [2024-06-10 17:16:29,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.61 | bwd_microstep: 1527.15 | bwd_inner_microstep: 1527.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3420 [2024-06-10 17:16:31,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.36 | bwd_microstep: 1377.10 | bwd_inner_microstep: 1377.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589 [2024-06-10 17:16:33,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1310.81 | bwd_inner_microstep: 1310.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 17:16:37,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.19 | optimizer_step: 6.59 [2024-06-10 17:16:37,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.04 | bwd_microstep: 4221.18 | bwd_inner_microstep: 1445.68 | bwd_allreduce_microstep: 2775.46 | step_microstep: 37.76 [2024-06-10 17:16:37,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15887.72 | bwd: 45281.53 | bwd_inner: 42505.17 | bwd_allreduce: 2775.69 | step: 40.60 {'loss': 1.234, 'learning_rate': 1.715708072328668e-05, 'epoch': 0.56} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3493 [2024-06-10 17:16:39,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1433.27 | 
bwd_inner_microstep: 1433.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3472 [2024-06-10 17:16:41,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1325.68 | bwd_inner_microstep: 1325.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3490 [2024-06-10 17:16:43,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1244.45 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 17:16:45,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.98 | bwd_microstep: 1549.71 | bwd_inner_microstep: 1549.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 17:16:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1394.11 | bwd_inner_microstep: 1394.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070 [2024-06-10 17:16:48,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.60 | bwd_microstep: 818.96 | bwd_inner_microstep: 818.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-10 17:16:50,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.20 | bwd_microstep: 1533.36 | bwd_inner_microstep: 1533.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 17:16:52,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 470.86 | bwd_microstep: 1247.11 | bwd_inner_microstep: 1247.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 17:16:54,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.85 | bwd_microstep: 1388.44 | bwd_inner_microstep: 1388.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 17:16:56,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1386.16 | bwd_inner_microstep: 1386.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-10 17:16:58,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1301.07 | bwd_inner_microstep: 1301.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583 [2024-06-10 17:16:59,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.76 | bwd_microstep: 1237.08 | bwd_inner_microstep: 1237.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448 [2024-06-10 17:17:01,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.84 | bwd_microstep: 1313.18 | bwd_inner_microstep: 1313.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401 [2024-06-10 17:17:03,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.58 | bwd_microstep: 1369.65 | bwd_inner_microstep: 1369.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 17:17:05,722] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1557.52 | bwd_inner_microstep: 1557.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998 [2024-06-10 17:17:06,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.00 | bwd_microstep: 737.45 | bwd_inner_microstep: 737.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3629 [2024-06-10 17:17:08,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1372.28 | bwd_inner_microstep: 1372.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 17:17:10,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1511.79 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 17:17:12,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1379.65 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740 [2024-06-10 17:17:14,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.17 | bwd_microstep: 1639.83 | bwd_inner_microstep: 1639.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 17:17:16,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1279.52 | bwd_inner_microstep: 1279.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
2047 [2024-06-10 17:17:17,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 809.79 | bwd_inner_microstep: 809.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 17:17:19,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.03 | bwd_microstep: 1381.01 | bwd_inner_microstep: 1380.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 17:17:21,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2183 [2024-06-10 17:17:22,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.15 | bwd_microstep: 953.89 | bwd_inner_microstep: 953.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-10 17:17:24,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1413.49 | bwd_inner_microstep: 1413.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820 [2024-06-10 17:17:26,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1355.70 | bwd_inner_microstep: 1355.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 17:17:28,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.86 | bwd_microstep: 1658.45 | bwd_inner_microstep: 1658.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per 
sample: 6.5, dynamic token length: 3061 [2024-06-10 17:17:30,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.39 | bwd_microstep: 1234.79 | bwd_inner_microstep: 1234.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 17:17:32,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1471.01 | bwd_inner_microstep: 1470.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3797 [2024-06-10 17:17:34,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1577.72 | bwd_inner_microstep: 1577.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457 [2024-06-10 17:17:40,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.63 [2024-06-10 17:17:40,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.67 | bwd_microstep: 4755.36 | bwd_inner_microstep: 1617.11 | bwd_allreduce_microstep: 3138.20 | step_microstep: 37.90 [2024-06-10 17:17:40,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15986.57 | bwd: 45915.07 | bwd_inner: 42775.97 | bwd_allreduce: 3138.43 | step: 39.41 {'loss': 1.1931, 'learning_rate': 1.7119932919887453e-05, 'epoch': 0.56} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422 [2024-06-10 17:17:42,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.72 | bwd_microstep: 1432.64 | bwd_inner_microstep: 1432.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1862 [2024-06-10 17:17:43,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.25 
| bwd_microstep: 673.77 | bwd_inner_microstep: 673.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 17:17:44,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1370.72 | bwd_inner_microstep: 1370.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797 [2024-06-10 17:17:47,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.57 | bwd_microstep: 1642.70 | bwd_inner_microstep: 1642.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2248 [2024-06-10 17:17:48,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.83 | bwd_microstep: 868.16 | bwd_inner_microstep: 868.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 17:17:50,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3545 [2024-06-10 17:17:51,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.25 | bwd_microstep: 1197.90 | bwd_inner_microstep: 1197.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 17:17:53,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1481.42 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 17:17:55,869] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 526.52 | bwd_microstep: 1397.86 | bwd_inner_microstep: 1397.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 17:17:57,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1245.83 | bwd_inner_microstep: 1245.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481 [2024-06-10 17:17:59,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.50 | bwd_microstep: 1185.89 | bwd_inner_microstep: 1185.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 17:18:00,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1252.76 | bwd_inner_microstep: 1252.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 17:18:02,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1286.82 | bwd_inner_microstep: 1286.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3656 [2024-06-10 17:18:05,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.69 | bwd_microstep: 1652.28 | bwd_inner_microstep: 1652.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3642 [2024-06-10 17:18:07,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.90 | bwd_microstep: 1559.57 | bwd_inner_microstep: 1559.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 17:18:09,114] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.61 | bwd_microstep: 1406.38 | bwd_inner_microstep: 1406.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422 [2024-06-10 17:18:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1445.56 | bwd_inner_microstep: 1445.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3850 [2024-06-10 17:18:13,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.93 | bwd_microstep: 1829.93 | bwd_inner_microstep: 1829.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3853 [2024-06-10 17:18:16,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.52 | bwd_microstep: 1764.85 | bwd_inner_microstep: 1764.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-10 17:18:18,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1512.71 | bwd_inner_microstep: 1512.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522 [2024-06-10 17:18:20,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.88 | bwd_microstep: 1584.48 | bwd_inner_microstep: 1584.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 17:18:22,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1288.77 | bwd_inner_microstep: 1288.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic 
token length: 3604 [2024-06-10 17:18:24,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1510.97 | bwd_inner_microstep: 1510.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2242 [2024-06-10 17:18:25,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.61 | bwd_microstep: 930.05 | bwd_inner_microstep: 930.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 17:18:27,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3605 [2024-06-10 17:18:29,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.80 | bwd_microstep: 1371.33 | bwd_inner_microstep: 1371.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813 [2024-06-10 17:18:31,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.52 | bwd_microstep: 1354.96 | bwd_inner_microstep: 1354.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515 [2024-06-10 17:18:32,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1417.29 | bwd_inner_microstep: 1417.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3558 [2024-06-10 17:18:34,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.42 | bwd_microstep: 1201.82 | bwd_inner_microstep: 1201.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
34, images per sample: 8.5, dynamic token length: 3568 [2024-06-10 17:18:36,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1495.97 | bwd_inner_microstep: 1495.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 17:18:38,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.80 | bwd_microstep: 1394.20 | bwd_inner_microstep: 1394.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3770 [2024-06-10 17:18:42,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.20 | optimizer_step: 6.58 [2024-06-10 17:18:42,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.10 | bwd_microstep: 3387.09 | bwd_inner_microstep: 1976.94 | bwd_allreduce_microstep: 1410.10 | step_microstep: 37.96 [2024-06-10 17:18:42,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16492.28 | bwd: 45725.09 | bwd_inner: 44314.09 | bwd_allreduce: 1410.32 | step: 39.41 {'loss': 1.2262, 'learning_rate': 1.7082795260069515e-05, 'epoch': 0.56} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475 [2024-06-10 17:18:44,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.51 | bwd_microstep: 1212.31 | bwd_inner_microstep: 1212.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 17:18:46,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1252.12 | bwd_inner_microstep: 1252.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 17:18:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.81 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 17:18:49,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.70 | bwd_microstep: 1280.91 | bwd_inner_microstep: 1280.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402 [2024-06-10 17:18:51,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.58 | bwd_microstep: 1210.93 | bwd_inner_microstep: 1210.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807 [2024-06-10 17:18:53,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.97 | bwd_microstep: 1354.35 | bwd_inner_microstep: 1354.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 17:18:55,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 751 [2024-06-10 17:18:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.89 | bwd_microstep: 299.29 | bwd_inner_microstep: 299.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509 [2024-06-10 17:18:57,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1417.14 | bwd_inner_microstep: 1417.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410 [2024-06-10 17:18:59,449] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.31 | bwd_microstep: 1277.50 | bwd_inner_microstep: 1277.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3699 [2024-06-10 17:19:01,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.74 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1980 [2024-06-10 17:19:02,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.11 | bwd_microstep: 856.87 | bwd_inner_microstep: 856.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3508 [2024-06-10 17:19:04,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.70 | bwd_microstep: 1545.02 | bwd_inner_microstep: 1544.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 17:19:06,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1482.47 | bwd_inner_microstep: 1482.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 17:19:08,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1351.50 | bwd_inner_microstep: 1351.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2172 [2024-06-10 17:19:10,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.12 | bwd_microstep: 1047.71 | bwd_inner_microstep: 1047.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 
3533 [2024-06-10 17:19:11,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.83 | bwd_microstep: 1257.50 | bwd_inner_microstep: 1257.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3943 [2024-06-10 17:19:14,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.57 | bwd_microstep: 1529.08 | bwd_inner_microstep: 1529.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-10 17:19:15,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 807.43 | bwd_inner_microstep: 807.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693 [2024-06-10 17:19:17,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1424.38 | bwd_inner_microstep: 1424.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3547 [2024-06-10 17:19:19,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1420.61 | bwd_inner_microstep: 1420.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 17:19:21,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1510.22 | bwd_inner_microstep: 1510.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836 [2024-06-10 17:19:23,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.31 | bwd_microstep: 1461.83 | bwd_inner_microstep: 1461.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3452 [2024-06-10 17:19:25,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1354.85 | bwd_inner_microstep: 1354.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607 [2024-06-10 17:19:26,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1309.56 | bwd_inner_microstep: 1309.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3839 [2024-06-10 17:19:28,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.85 | bwd_microstep: 1420.42 | bwd_inner_microstep: 1420.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 17:19:30,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.82 | bwd_microstep: 1503.32 | bwd_inner_microstep: 1503.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3730 [2024-06-10 17:19:33,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.25 | bwd_microstep: 1835.43 | bwd_inner_microstep: 1835.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3571 [2024-06-10 17:19:35,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.59 | bwd_microstep: 1444.96 | bwd_inner_microstep: 1444.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 17:19:37,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.60 | bwd_microstep: 1507.24 | bwd_inner_microstep: 1507.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066 [2024-06-10 17:19:38,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 917.45 | bwd_inner_microstep: 917.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791 [2024-06-10 17:19:43,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 17:19:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.18 | bwd_microstep: 3911.85 | bwd_inner_microstep: 1870.08 | bwd_allreduce_microstep: 2041.72 | step_microstep: 38.19 [2024-06-10 17:19:43,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15790.88 | bwd: 44466.62 | bwd_inner: 42424.00 | bwd_allreduce: 2041.95 | step: 39.65 {'loss': 1.2436, 'learning_rate': 1.704566787463151e-05, 'epoch': 0.56} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 17:19:45,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1244.22 | bwd_inner_microstep: 1244.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3945 [2024-06-10 17:19:47,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.33 | bwd_microstep: 1498.17 | bwd_inner_microstep: 1498.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471 [2024-06-10 17:19:48,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.13 | bwd_microstep: 1214.51 | bwd_inner_microstep: 1214.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780 [2024-06-10 17:19:50,789] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 540.68 | bwd_microstep: 1444.33 | bwd_inner_microstep: 1444.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-10 17:19:53,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.80 | bwd_microstep: 1639.12 | bwd_inner_microstep: 1639.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3410 [2024-06-10 17:19:54,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.32 | bwd_microstep: 1307.75 | bwd_inner_microstep: 1307.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 17:19:56,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1247.60 | bwd_inner_microstep: 1247.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708 [2024-06-10 17:19:58,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.68 | bwd_microstep: 1626.31 | bwd_inner_microstep: 1626.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 847 [2024-06-10 17:19:59,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 134.32 | bwd_microstep: 346.49 | bwd_inner_microstep: 346.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3704 [2024-06-10 17:20:01,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.38 | bwd_microstep: 1630.09 | bwd_inner_microstep: 1630.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 17:20:03,450] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1383.48 | bwd_inner_microstep: 1383.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 17:20:05,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.96 | bwd_microstep: 1472.58 | bwd_inner_microstep: 1472.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-10 17:20:07,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1507.07 | bwd_inner_microstep: 1507.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 17:20:09,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648 [2024-06-10 17:20:11,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1512.57 | bwd_inner_microstep: 1512.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983 [2024-06-10 17:20:12,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.14 | bwd_microstep: 891.19 | bwd_inner_microstep: 891.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3876 [2024-06-10 17:20:14,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.73 | bwd_microstep: 1353.55 | bwd_inner_microstep: 1353.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3563 [2024-06-10 17:20:16,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.82 | bwd_microstep: 1299.97 | bwd_inner_microstep: 1299.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143 [2024-06-10 17:20:17,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.68 | bwd_microstep: 932.36 | bwd_inner_microstep: 932.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 17:20:19,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1417.22 | bwd_inner_microstep: 1417.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3654 [2024-06-10 17:20:21,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1227.19 | bwd_inner_microstep: 1227.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915 [2024-06-10 17:20:22,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.09 | bwd_microstep: 687.36 | bwd_inner_microstep: 687.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-10 17:20:24,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1405.42 | bwd_inner_microstep: 1405.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 17:20:26,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1379.83 | bwd_inner_microstep: 1379.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, 
images per sample: 5.0, dynamic token length: 2283 [2024-06-10 17:20:27,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.66 | bwd_microstep: 940.86 | bwd_inner_microstep: 940.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279 [2024-06-10 17:20:28,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.74 | bwd_microstep: 877.15 | bwd_inner_microstep: 877.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-10 17:20:30,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.32 | bwd_microstep: 1501.88 | bwd_inner_microstep: 1501.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916 [2024-06-10 17:20:31,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.66 | bwd_microstep: 779.38 | bwd_inner_microstep: 779.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3601 [2024-06-10 17:20:33,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1308.91 | bwd_inner_microstep: 1308.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 17:20:35,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1396.61 | bwd_inner_microstep: 1396.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2021 [2024-06-10 17:20:36,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.69 | bwd_microstep: 904.08 | bwd_inner_microstep: 904.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3618 [2024-06-10 17:20:46,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.09 | optimizer_step: 6.59 [2024-06-10 17:20:46,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.17 | bwd_microstep: 8638.94 | bwd_inner_microstep: 1618.56 | bwd_allreduce_microstep: 7020.32 | step_microstep: 37.89 [2024-06-10 17:20:46,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15075.12 | bwd: 47393.84 | bwd_inner: 40372.61 | bwd_allreduce: 7020.56 | step: 39.38 {'loss': 1.1777, 'learning_rate': 1.700855089433589e-05, 'epoch': 0.56} |█████▌ | 964/1726 [16:38:13<12:57:46, 61.24s/it] 56%|█████▌ | 965/1726 [16:39:14<12:57:43, 61.32s/it] 56%|█████▌ | 966/1726 [16:40:16<13:00:11, 61.59s/it] 56%|█████▌ | 967/1726 [16:41:19<13:02:47, 61.88s/it] 56%|█████▌ | 968/1726 [16:42:20<12:56:52, 61.49s/it] 56%|█████▌ | 969/1726 [16:43:22<13:00:46, 61.88s/it] 56%|█████▌ | 969/1726 [16:43:22> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000 [INFO|configuration_utils.py:473] 2024-06-10 17:52:38,625 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/config.json [INFO|configuration_utils.py:594] 2024-06-10 17:52:38,627 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-10 17:52:47,451 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-10
17:52:47,534 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-10 17:52:47,536 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-10 17:52:47,537 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/added_tokens.json [2024-06-10 17:52:47,751] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is about to be saved! [2024-06-10 17:52:47,763] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/mp_rank_00_model_states.pt [2024-06-10 17:52:47,763] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/mp_rank_00_model_states.pt... [2024-06-10 17:52:56,843] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/mp_rank_00_model_states.pt. [2024-06-10 17:52:56,855] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-10 17:53:08,739] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2024-06-10 17:53:08,757] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1000/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-10 17:53:08,758] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1000 is ready now! [INFO|trainer.py:3028] 2024-06-10 17:53:08,794 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-400] due to args.save_total_limit dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 17:53:11,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.15 | bwd_microstep: 1365.73 | bwd_inner_microstep: 1365.53 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 17:53:13,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.30 | bwd_microstep: 1483.85 | bwd_inner_microstep: 1483.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827 [2024-06-10 17:53:15,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.64 | bwd_microstep: 1347.26 | bwd_inner_microstep: 1347.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795 [2024-06-10 17:53:17,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.51 | bwd_microstep: 1439.32 | bwd_inner_microstep: 1439.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-10 17:53:18,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.98 | bwd_microstep: 1334.64 | bwd_inner_microstep: 1334.61 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775 [2024-06-10 17:53:20,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.80 | bwd_microstep: 1440.86 | bwd_inner_microstep: 1440.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756 [2024-06-10 17:53:22,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1431.43 | bwd_inner_microstep: 1431.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 17:53:24,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.12 | bwd_microstep: 1147.85 | bwd_inner_microstep: 1147.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 17:53:26,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1284.43 | bwd_inner_microstep: 1284.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-10 17:53:27,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.29 | bwd_microstep: 788.04 | bwd_inner_microstep: 788.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 17:53:29,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1482.15 | bwd_inner_microstep: 1482.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 17:53:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1243.58 | bwd_inner_microstep: 
1243.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 17:53:33,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 1348.69 | bwd_inner_microstep: 1348.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695 [2024-06-10 17:53:35,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.18 | bwd_microstep: 1524.82 | bwd_inner_microstep: 1524.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 17:53:37,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1476.80 | bwd_inner_microstep: 1476.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3639 [2024-06-10 17:53:39,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1342.49 | bwd_inner_microstep: 1342.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 17:53:40,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1349.18 | bwd_inner_microstep: 1349.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949 [2024-06-10 17:53:41,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.75 | bwd_microstep: 726.39 | bwd_inner_microstep: 726.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999 [2024-06-10 17:53:43,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.28 | 
bwd_microstep: 801.12 | bwd_inner_microstep: 801.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1964 [2024-06-10 17:53:44,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.32 | bwd_microstep: 841.35 | bwd_inner_microstep: 841.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 17:53:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1556.81 | bwd_inner_microstep: 1556.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 17:53:48,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1387.22 | bwd_inner_microstep: 1387.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682 [2024-06-10 17:53:50,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.87 | bwd_microstep: 1525.69 | bwd_inner_microstep: 1525.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 17:53:52,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.39 | bwd_microstep: 1460.26 | bwd_inner_microstep: 1460.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998 [2024-06-10 17:53:53,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.18 | bwd_microstep: 896.99 | bwd_inner_microstep: 896.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2187 [2024-06-10 17:53:54,916] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 356.25 | bwd_microstep: 954.93 | bwd_inner_microstep: 954.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 17:53:56,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1391.03 | bwd_inner_microstep: 1391.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3822 [2024-06-10 17:53:59,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.04 | bwd_microstep: 1750.32 | bwd_inner_microstep: 1750.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3682 [2024-06-10 17:54:01,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.98 | bwd_microstep: 1587.52 | bwd_inner_microstep: 1587.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 17:54:03,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1284.10 | bwd_inner_microstep: 1284.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-10 17:54:05,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1533.49 | bwd_inner_microstep: 1533.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 17:54:09,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.19 | optimizer_step: 6.60 [2024-06-10 17:54:09,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 3536.41 | bwd_inner_microstep: 1751.46 | 
bwd_allreduce_microstep: 1784.90 | step_microstep: 38.01 [2024-06-10 17:54:09,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15785.95 | bwd: 44064.73 | bwd_inner: 42278.79 | bwd_allreduce: 1785.21 | step: 39.60 {'loss': 1.2108, 'learning_rate': 1.5827081854536237e-05, 'epoch': 0.58} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 17:54:11,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.59 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3933 [2024-06-10 17:54:13,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1494.41 | bwd_inner_microstep: 1494.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3878 [2024-06-10 17:54:15,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.48 | bwd_microstep: 1681.34 | bwd_inner_microstep: 1681.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1939 [2024-06-10 17:54:16,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.41 | bwd_microstep: 853.30 | bwd_inner_microstep: 853.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3773 [2024-06-10 17:54:18,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1402.33 | bwd_inner_microstep: 1402.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 17:54:20,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.68 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794 [2024-06-10 17:54:22,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1549.97 | bwd_inner_microstep: 1549.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3479 [2024-06-10 17:54:24,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1404.39 | bwd_inner_microstep: 1404.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3611 [2024-06-10 17:54:27,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1655.57 | bwd_inner_microstep: 1655.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3496 [2024-06-10 17:54:29,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.17 | bwd_microstep: 1717.35 | bwd_inner_microstep: 1717.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2103 [2024-06-10 17:54:30,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.19 | bwd_microstep: 1013.16 | bwd_inner_microstep: 1013.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 17:54:32,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.97 | bwd_microstep: 1487.33 | bwd_inner_microstep: 1487.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2056 [2024-06-10 17:54:34,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.95 | bwd_microstep: 
913.59 | bwd_inner_microstep: 913.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-10 17:54:35,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 800.55 | bwd_inner_microstep: 800.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 17:54:37,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.62 | bwd_microstep: 1484.18 | bwd_inner_microstep: 1484.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-10 17:54:38,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.45 | bwd_microstep: 796.93 | bwd_inner_microstep: 796.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518 [2024-06-10 17:54:39,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.37 | bwd_microstep: 1193.01 | bwd_inner_microstep: 1192.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 17:54:41,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1397.21 | bwd_inner_microstep: 1397.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 17:54:43,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.47 | bwd_microstep: 974.62 | bwd_inner_microstep: 974.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839 [2024-06-10 17:54:45,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 546.84 | bwd_microstep: 1460.20 | bwd_inner_microstep: 1460.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 17:54:47,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1256.22 | bwd_inner_microstep: 1256.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991 [2024-06-10 17:54:48,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.02 | bwd_microstep: 800.18 | bwd_inner_microstep: 800.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 17:54:50,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1516.38 | bwd_inner_microstep: 1516.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 17:54:52,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1399.49 | bwd_inner_microstep: 1399.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 17:54:54,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.22 | bwd_microstep: 1664.14 | bwd_inner_microstep: 1664.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 17:54:56,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1351.66 | bwd_inner_microstep: 1351.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3582 [2024-06-10 17:54:58,059] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.21 | bwd_microstep: 1253.64 | bwd_inner_microstep: 1253.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447 [2024-06-10 17:55:00,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.29 | bwd_microstep: 1442.97 | bwd_inner_microstep: 1442.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 17:55:02,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.90 | bwd_microstep: 1647.88 | bwd_inner_microstep: 1647.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2292 [2024-06-10 17:55:03,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.72 | bwd_microstep: 1073.79 | bwd_inner_microstep: 1073.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3440 [2024-06-10 17:55:10,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.11 | bwd_microstep: 1611.44 | bwd_inner_microstep: 1611.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2811 [2024-06-10 17:55:16,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 20.32 | optimizer_gradients: 4.32 | optimizer_step: 6.59 [2024-06-10 17:55:16,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.74 | bwd_microstep: 6208.04 | bwd_inner_microstep: 1303.45 | bwd_allreduce_microstep: 4904.53 | step_microstep: 41.94 [2024-06-10 17:55:16,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15721.17 | bwd: 47167.02 | bwd_inner: 42261.58 | bwd_allreduce: 4904.76 | step: 43.42 {'loss': 1.2651, 
'learning_rate': 1.5790381337762643e-05, 'epoch': 0.58} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 17:55:18,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.66 | bwd_microstep: 1266.77 | bwd_inner_microstep: 1266.67 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3554 [2024-06-10 17:55:20,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1228.35 | bwd_inner_microstep: 1228.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-10 17:55:22,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.46 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3925 [2024-06-10 17:55:24,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.96 | bwd_microstep: 1692.95 | bwd_inner_microstep: 1692.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928 [2024-06-10 17:55:25,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.24 | bwd_microstep: 726.17 | bwd_inner_microstep: 726.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 17:55:27,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.16 | bwd_microstep: 1244.81 | bwd_inner_microstep: 1244.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873 [2024-06-10 17:55:28,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.10 | bwd_microstep: 
678.11 | bwd_inner_microstep: 678.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2888 [2024-06-10 17:55:29,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.25 | bwd_microstep: 1022.65 | bwd_inner_microstep: 1022.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-10 17:55:31,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1411.49 | bwd_inner_microstep: 1411.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3760 [2024-06-10 17:55:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.12 | bwd_microstep: 1604.42 | bwd_inner_microstep: 1604.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 17:55:35,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1383.53 | bwd_inner_microstep: 1383.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3492 [2024-06-10 17:55:37,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.87 | bwd_microstep: 1442.22 | bwd_inner_microstep: 1442.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995 [2024-06-10 17:55:38,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.27 | bwd_microstep: 807.70 | bwd_inner_microstep: 807.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629 [2024-06-10 17:55:40,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 567.84 | bwd_microstep: 1533.53 | bwd_inner_microstep: 1533.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 17:55:42,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.24 | bwd_microstep: 1281.97 | bwd_inner_microstep: 1281.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495 [2024-06-10 17:55:44,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.63 | bwd_microstep: 1190.10 | bwd_inner_microstep: 1190.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1932 [2024-06-10 17:55:45,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.60 | bwd_microstep: 848.08 | bwd_inner_microstep: 848.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2117 [2024-06-10 17:55:46,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.76 | bwd_microstep: 922.63 | bwd_inner_microstep: 922.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660 [2024-06-10 17:55:49,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.51 | bwd_microstep: 1618.39 | bwd_inner_microstep: 1618.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2962 [2024-06-10 17:55:50,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.03 | bwd_microstep: 1101.44 | bwd_inner_microstep: 1101.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2132 [2024-06-10 17:55:51,737] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.03 | bwd_microstep: 834.83 | bwd_inner_microstep: 834.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 17:55:53,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1484.24 | bwd_inner_microstep: 1484.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3607 [2024-06-10 17:55:55,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1341.37 | bwd_inner_microstep: 1341.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 17:55:57,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1404.65 | bwd_inner_microstep: 1404.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594 [2024-06-10 17:55:59,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1306.68 | bwd_inner_microstep: 1306.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 17:56:01,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.94 | bwd_microstep: 1280.85 | bwd_inner_microstep: 1280.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3555 [2024-06-10 17:56:03,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.36 | bwd_microstep: 1347.70 | bwd_inner_microstep: 1347.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3557 [2024-06-10 17:56:05,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 1420.58 | bwd_inner_microstep: 1420.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3570 [2024-06-10 17:56:06,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1361.22 | bwd_inner_microstep: 1361.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2238 [2024-06-10 17:56:08,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.65 | bwd_microstep: 863.46 | bwd_inner_microstep: 863.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003 [2024-06-10 17:56:09,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.68 | bwd_microstep: 737.87 | bwd_inner_microstep: 737.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729 [2024-06-10 17:56:17,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.29 | optimizer_step: 6.60 [2024-06-10 17:56:17,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.90 | bwd_microstep: 8016.27 | bwd_inner_microstep: 1729.70 | bwd_allreduce_microstep: 6286.51 | step_microstep: 38.26 [2024-06-10 17:56:17,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14773.07 | bwd: 45689.85 | bwd_inner: 39402.34 | bwd_allreduce: 6286.80 | step: 39.82 {'loss': 1.2468, 'learning_rate': 1.5753695647244083e-05, 'epoch': 0.58} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 17:56:19,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1340.76 | 
bwd_inner_microstep: 1340.55 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.13 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863 [2024-06-10 17:56:20,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.14 | bwd_microstep: 674.37 | bwd_inner_microstep: 674.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 17:56:22,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.11 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1863 [2024-06-10 17:56:23,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.20 | bwd_microstep: 739.16 | bwd_inner_microstep: 739.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 17:56:25,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.49 | bwd_microstep: 1243.77 | bwd_inner_microstep: 1243.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 17:56:27,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1394.54 | bwd_inner_microstep: 1394.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 17:56:28,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3701 [2024-06-10 17:56:30,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
490.77 | bwd_microstep: 1291.77 | bwd_inner_microstep: 1291.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 4032 [2024-06-10 17:56:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.57 | bwd_microstep: 1494.21 | bwd_inner_microstep: 1494.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440 [2024-06-10 17:56:34,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1448.89 | bwd_inner_microstep: 1448.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448 [2024-06-10 17:56:36,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.05 | bwd_microstep: 1449.62 | bwd_inner_microstep: 1449.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3279 [2024-06-10 17:56:38,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.96 | bwd_microstep: 1284.37 | bwd_inner_microstep: 1284.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507 [2024-06-10 17:56:40,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1448.18 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 17:56:42,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1391.86 | bwd_inner_microstep: 1391.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 17:56:44,619] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.99 | bwd_microstep: 1511.91 | bwd_inner_microstep: 1511.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-10 17:56:46,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1508.99 | bwd_inner_microstep: 1508.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3516 [2024-06-10 17:56:48,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.47 | bwd_microstep: 1224.03 | bwd_inner_microstep: 1224.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 17:56:50,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1392.05 | bwd_inner_microstep: 1392.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-10 17:56:52,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1299.99 | bwd_inner_microstep: 1299.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-10 17:56:53,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.25 | bwd_microstep: 1294.75 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 17:56:55,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.63 | bwd_microstep: 1391.29 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3523 [2024-06-10 17:56:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1389.56 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 17:56:59,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1294.45 | bwd_inner_microstep: 1294.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3708 [2024-06-10 17:57:01,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.67 | bwd_microstep: 1667.82 | bwd_inner_microstep: 1667.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2270 [2024-06-10 17:57:02,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.70 | bwd_microstep: 811.46 | bwd_inner_microstep: 811.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615 [2024-06-10 17:57:05,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.52 | bwd_microstep: 1602.75 | bwd_inner_microstep: 1602.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 17:57:07,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.27 | bwd_microstep: 1562.70 | bwd_inner_microstep: 1562.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-10 17:57:09,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.90 | bwd_microstep: 1496.78 | bwd_inner_microstep: 1496.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images 
per sample: 11.0, dynamic token length: 3620 [2024-06-10 17:57:11,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.04 | bwd_microstep: 1675.27 | bwd_inner_microstep: 1675.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3566 [2024-06-10 17:57:13,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.79 | bwd_microstep: 1331.65 | bwd_inner_microstep: 1331.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 17:57:15,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.63 | bwd_microstep: 1335.13 | bwd_inner_microstep: 1335.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3439 [2024-06-10 17:57:21,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.19 | optimizer_step: 6.61 [2024-06-10 17:57:21,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.38 | bwd_microstep: 5230.48 | bwd_inner_microstep: 1582.74 | bwd_allreduce_microstep: 3647.69 | step_microstep: 37.82 [2024-06-10 17:57:21,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16153.41 | bwd: 46948.34 | bwd_inner: 43299.60 | bwd_allreduce: 3648.00 | step: 39.41 {'loss': 1.2429, 'learning_rate': 1.571702491218738e-05, 'epoch': 0.58} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3396 [2024-06-10 17:57:22,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.88 | bwd_microstep: 1273.22 | bwd_inner_microstep: 1273.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 17:57:24,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
470.56 | bwd_microstep: 1241.79 | bwd_inner_microstep: 1241.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3899 [2024-06-10 17:57:26,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.42 | bwd_microstep: 1482.26 | bwd_inner_microstep: 1482.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2915 [2024-06-10 17:57:28,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.14 | bwd_microstep: 1127.27 | bwd_inner_microstep: 1127.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-10 17:57:30,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1638.21 | bwd_inner_microstep: 1638.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-10 17:57:32,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1483.42 | bwd_inner_microstep: 1483.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3576 [2024-06-10 17:57:34,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.32 | bwd_microstep: 1205.33 | bwd_inner_microstep: 1205.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947 [2024-06-10 17:57:35,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.83 | bwd_microstep: 700.38 | bwd_inner_microstep: 700.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 17:57:37,166] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1385.59 | bwd_inner_microstep: 1385.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3694 [2024-06-10 17:57:39,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1428.52 | bwd_inner_microstep: 1428.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 17:57:40,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1246.93 | bwd_inner_microstep: 1246.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-10 17:57:41,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.27 | bwd_microstep: 797.50 | bwd_inner_microstep: 797.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 17:57:43,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1344.66 | bwd_inner_microstep: 1344.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409 [2024-06-10 17:57:45,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1310.42 | bwd_inner_microstep: 1310.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-10 17:57:47,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1513.84 | bwd_inner_microstep: 1513.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404 [2024-06-10 
17:57:49,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1436.29 | bwd_inner_microstep: 1436.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522 [2024-06-10 17:57:51,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.56 | bwd_microstep: 1585.78 | bwd_inner_microstep: 1585.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-10 17:57:52,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 800.91 | bwd_inner_microstep: 800.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610 [2024-06-10 17:57:54,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.74 | bwd_microstep: 1346.11 | bwd_inner_microstep: 1346.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013 [2024-06-10 17:57:55,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.29 | bwd_microstep: 805.02 | bwd_inner_microstep: 805.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807 [2024-06-10 17:57:58,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.10 | bwd_microstep: 1486.01 | bwd_inner_microstep: 1485.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623 [2024-06-10 17:58:00,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.63 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, 
dynamic token length: 3825 [2024-06-10 17:58:02,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.69 | bwd_microstep: 1486.50 | bwd_inner_microstep: 1486.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-10 17:58:04,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.12 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 17:58:06,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.77 | bwd_microstep: 1489.49 | bwd_inner_microstep: 1489.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3591 [2024-06-10 17:58:08,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.86 | bwd_microstep: 1762.69 | bwd_inner_microstep: 1762.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607 [2024-06-10 17:58:10,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.87 | bwd_microstep: 1706.82 | bwd_inner_microstep: 1706.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460 [2024-06-10 17:58:12,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.26 | bwd_microstep: 1401.80 | bwd_inner_microstep: 1401.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2051 [2024-06-10 17:58:13,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.24 | bwd_microstep: 752.40 | bwd_inner_microstep: 752.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 26, images per sample: 6.5, dynamic token length: 3573 [2024-06-10 17:58:15,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1364.92 | bwd_inner_microstep: 1364.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 17:58:17,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 1433.36 | bwd_inner_microstep: 1433.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577 [2024-06-10 17:58:21,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.63 [2024-06-10 17:58:21,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.31 | bwd_microstep: 3373.73 | bwd_inner_microstep: 1612.23 | bwd_allreduce_microstep: 1761.44 | step_microstep: 37.85 [2024-06-10 17:58:21,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15871.68 | bwd: 44318.67 | bwd_inner: 42556.33 | bwd_allreduce: 1761.67 | step: 39.43 {'loss': 1.2446, 'learning_rate': 1.5680369261746674e-05, 'epoch': 0.58} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1926 [2024-06-10 17:58:22,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.56 | bwd_microstep: 815.47 | bwd_inner_microstep: 815.40 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-10 17:58:23,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.49 | bwd_microstep: 793.62 | bwd_inner_microstep: 793.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 17:58:25,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 515.76 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 17:58:27,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.67 | bwd_microstep: 1252.40 | bwd_inner_microstep: 1252.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408 [2024-06-10 17:58:29,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.00 | bwd_microstep: 1183.02 | bwd_inner_microstep: 1182.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3586 [2024-06-10 17:58:31,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.70 | bwd_microstep: 1435.87 | bwd_inner_microstep: 1435.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 17:58:33,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.57 | bwd_microstep: 1291.19 | bwd_inner_microstep: 1291.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615 [2024-06-10 17:58:34,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.02 | bwd_microstep: 1314.02 | bwd_inner_microstep: 1313.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-10 17:58:36,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1532.56 | bwd_inner_microstep: 1532.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 17:58:38,875] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-10 17:58:40,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.15 | bwd_microstep: 1161.29 | bwd_inner_microstep: 1161.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3947 [2024-06-10 17:58:42,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.47 | bwd_microstep: 1603.74 | bwd_inner_microstep: 1603.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3483 [2024-06-10 17:58:45,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.06 | bwd_microstep: 1679.01 | bwd_inner_microstep: 1678.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3642 [2024-06-10 17:58:47,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.78 | bwd_microstep: 1678.22 | bwd_inner_microstep: 1678.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510 [2024-06-10 17:58:48,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.18 | bwd_microstep: 1193.63 | bwd_inner_microstep: 1193.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 17:58:51,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.33 | bwd_microstep: 1660.10 | bwd_inner_microstep: 1660.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3544 [2024-06-10 17:58:53,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.02 | bwd_microstep: 1404.78 | bwd_inner_microstep: 1404.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3820 [2024-06-10 17:58:55,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.53 | bwd_microstep: 1506.64 | bwd_inner_microstep: 1506.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 17:58:57,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1287.08 | bwd_inner_microstep: 1287.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814 [2024-06-10 17:58:58,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1357.50 | bwd_inner_microstep: 1357.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806 [2024-06-10 17:59:00,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.49 | bwd_microstep: 1450.91 | bwd_inner_microstep: 1450.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629 [2024-06-10 17:59:02,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1317.05 | bwd_inner_microstep: 1317.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584 [2024-06-10 17:59:04,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.22 | bwd_microstep: 1423.57 | bwd_inner_microstep: 1423.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, 
images per sample: 4.5, dynamic token length: 1996 [2024-06-10 17:59:05,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.28 | bwd_microstep: 832.61 | bwd_inner_microstep: 832.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2286 [2024-06-10 17:59:07,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.20 | bwd_microstep: 910.70 | bwd_inner_microstep: 910.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3637 [2024-06-10 17:59:09,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1580.04 | bwd_inner_microstep: 1580.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574 [2024-06-10 17:59:11,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.05 | bwd_microstep: 1303.78 | bwd_inner_microstep: 1303.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271 [2024-06-10 17:59:12,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.81 | bwd_microstep: 973.14 | bwd_inner_microstep: 973.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878 [2024-06-10 17:59:14,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 1587.86 | bwd_inner_microstep: 1587.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3569 [2024-06-10 17:59:16,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.86 | bwd_microstep: 1697.10 | bwd_inner_microstep: 1697.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831 [2024-06-10 17:59:19,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.84 | bwd_microstep: 1755.76 | bwd_inner_microstep: 1755.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3420 [2024-06-10 17:59:41,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.61 [2024-06-10 17:59:41,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.63 | bwd_microstep: 21864.03 | bwd_inner_microstep: 1761.36 | bwd_allreduce_microstep: 20102.61 | step_microstep: 38.59 [2024-06-10 17:59:41,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16185.81 | bwd: 63615.31 | bwd_inner: 43511.75 | bwd_allreduce: 20102.87 | step: 40.12 58%|█████▊ | 1001/1726 [17:16:46<14:22:17, 71.36s/it] 58%|█████▊ | 1002/1726 [17:17:53<14:07:01, 70.19s/it] 58%|█████▊ | 1003/1726 [17:18:54<13:31:51, 67.37s/it] 58%|█████▊ | 1004/1726 [17:19:57<13:16:32, 66.20s/it] 58%|█████▊ | 1005/1726 [17:20:58<12:55:02, 64.50s/it] {'loss': 1.215, 'learning_rate': 1.564372882502297e-05, 'epoch': 0.58} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3542 [2024-06-10 17:59:44,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.16 | bwd_microstep: 1572.17 | bwd_inner_microstep: 1572.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3986 [2024-06-10 17:59:46,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 624.66 | bwd_microstep: 1696.77 | bwd_inner_microstep: 1696.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 17:59:48,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.01 | bwd_microstep: 1640.51 | bwd_inner_microstep: 1640.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 17:59:50,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.43 | bwd_microstep: 1540.98 | bwd_inner_microstep: 1540.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3398 [2024-06-10 17:59:52,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.48 | bwd_microstep: 1145.03 | bwd_inner_microstep: 1145.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 17:59:54,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.39 | bwd_microstep: 1276.68 | bwd_inner_microstep: 1276.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 17:59:56,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2172 [2024-06-10 17:59:57,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.68 | bwd_microstep: 945.52 | bwd_inner_microstep: 945.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688 [2024-06-10 17:59:59,306] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.36 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397 [2024-06-10 18:00:01,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.50 | bwd_microstep: 1429.75 | bwd_inner_microstep: 1429.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435 [2024-06-10 18:00:03,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.15 | bwd_microstep: 1306.68 | bwd_inner_microstep: 1306.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433 [2024-06-10 18:00:05,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1443.40 | bwd_inner_microstep: 1443.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513 [2024-06-10 18:00:07,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.42 | bwd_microstep: 1428.76 | bwd_inner_microstep: 1428.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3489 [2024-06-10 18:00:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.99 | bwd_microstep: 1219.00 | bwd_inner_microstep: 1218.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1974 [2024-06-10 18:00:09,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.02 | bwd_microstep: 827.85 | bwd_inner_microstep: 827.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2110 [2024-06-10 18:00:11,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.97 | bwd_microstep: 918.72 | bwd_inner_microstep: 918.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2301 [2024-06-10 18:00:12,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.18 | bwd_microstep: 910.94 | bwd_inner_microstep: 910.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628 [2024-06-10 18:00:14,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1314.14 | bwd_inner_microstep: 1314.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578 [2024-06-10 18:00:16,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1300.84 | bwd_inner_microstep: 1300.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3465 [2024-06-10 18:00:17,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.34 | bwd_microstep: 1185.28 | bwd_inner_microstep: 1185.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450 [2024-06-10 18:00:19,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1283.64 | bwd_inner_microstep: 1283.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1982 [2024-06-10 18:00:20,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.90 | bwd_microstep: 705.36 | bwd_inner_microstep: 705.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3538 [2024-06-10 18:00:22,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.07 | bwd_microstep: 1390.17 | bwd_inner_microstep: 1390.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615 [2024-06-10 18:00:24,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.22 | bwd_microstep: 1611.51 | bwd_inner_microstep: 1611.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-10 18:00:26,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.27 | bwd_microstep: 1529.67 | bwd_inner_microstep: 1529.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3734 [2024-06-10 18:00:28,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.20 | bwd_microstep: 1457.45 | bwd_inner_microstep: 1457.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809 [2024-06-10 18:00:30,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.59 | bwd_microstep: 1354.71 | bwd_inner_microstep: 1354.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570 [2024-06-10 18:00:32,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.53 | bwd_microstep: 1491.12 | bwd_inner_microstep: 1491.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2174 [2024-06-10 18:00:33,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.44 | bwd_microstep: 887.79 | bwd_inner_microstep: 887.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2202 [2024-06-10 18:00:35,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.74 | bwd_microstep: 1053.89 | bwd_inner_microstep: 1053.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-10 18:00:37,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.30 | bwd_microstep: 1640.77 | bwd_inner_microstep: 1640.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3767 [2024-06-10 18:00:42,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.48 | optimizer_step: 6.62 [2024-06-10 18:00:42,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.59 | bwd_microstep: 4619.59 | bwd_inner_microstep: 1915.65 | bwd_allreduce_microstep: 2703.85 | step_microstep: 42.79 [2024-06-10 18:00:42,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15712.68 | bwd: 44934.24 | bwd_inner: 42229.44 | bwd_allreduce: 2704.10 | step: 44.26 {'loss': 1.1875, 'learning_rate': 1.5607103731063708e-05, 'epoch': 0.58} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3493 [2024-06-10 18:00:44,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.28 | bwd_microstep: 1336.43 | bwd_inner_microstep: 1336.32 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 18:00:46,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1276.56 | bwd_inner_microstep: 1276.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1935 [2024-06-10 18:00:47,549] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 289.37 | bwd_microstep: 756.19 | bwd_inner_microstep: 756.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3835 [2024-06-10 18:00:49,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.64 | bwd_microstep: 1602.33 | bwd_inner_microstep: 1602.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479 [2024-06-10 18:00:51,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.80 | bwd_microstep: 1314.62 | bwd_inner_microstep: 1314.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 18:00:53,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.04 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 18:00:54,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.22 | bwd_microstep: 793.79 | bwd_inner_microstep: 793.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3441 [2024-06-10 18:00:56,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.75 | bwd_microstep: 1148.71 | bwd_inner_microstep: 1148.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 18:00:57,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1285.43 | bwd_inner_microstep: 1285.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 
18:00:59,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 1516.81 | bwd_inner_microstep: 1516.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3651 [2024-06-10 18:01:01,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.64 | bwd_microstep: 1471.58 | bwd_inner_microstep: 1471.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614 [2024-06-10 18:01:04,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.01 | bwd_microstep: 1602.06 | bwd_inner_microstep: 1602.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 18:01:06,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1391.10 | bwd_inner_microstep: 1391.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 18:01:07,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1245.78 | bwd_inner_microstep: 1245.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 18:01:09,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.29 | bwd_microstep: 1386.21 | bwd_inner_microstep: 1386.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 18:01:11,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1485.35 | bwd_inner_microstep: 1485.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3636 [2024-06-10 18:01:13,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1513.29 | bwd_inner_microstep: 1513.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 18:01:15,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.81 | bwd_microstep: 1390.53 | bwd_inner_microstep: 1390.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 18:01:17,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1558.23 | bwd_inner_microstep: 1558.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 18:01:19,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.24 | bwd_microstep: 1495.50 | bwd_inner_microstep: 1495.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 18:01:21,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1458.78 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2140 [2024-06-10 18:01:23,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.52 | bwd_microstep: 738.08 | bwd_inner_microstep: 738.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3832 [2024-06-10 18:01:24,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.39 | bwd_microstep: 1295.71 | bwd_inner_microstep: 1295.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 18:01:26,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1383.12 | bwd_inner_microstep: 1383.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2273 [2024-06-10 18:01:28,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.80 | bwd_microstep: 1005.36 | bwd_inner_microstep: 1005.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3606 [2024-06-10 18:01:30,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.22 | bwd_microstep: 1370.81 | bwd_inner_microstep: 1370.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 18:01:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1494.07 | bwd_inner_microstep: 1494.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2922 [2024-06-10 18:01:33,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.46 | bwd_microstep: 1187.24 | bwd_inner_microstep: 1187.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3381 [2024-06-10 18:01:35,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.74 | bwd_microstep: 1436.12 | bwd_inner_microstep: 1436.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 18:01:37,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.72 | bwd_microstep: 1286.10 | bwd_inner_microstep: 1286.07 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 18:01:39,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.47 | bwd_microstep: 1379.36 | bwd_inner_microstep: 1379.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-10 18:01:45,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 18:01:45,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.29 | bwd_microstep: 4969.03 | bwd_inner_microstep: 1866.52 | bwd_allreduce_microstep: 3102.46 | step_microstep: 38.07 [2024-06-10 18:01:45,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15957.28 | bwd: 45861.08 | bwd_inner: 42757.61 | bwd_allreduce: 3102.74 | step: 39.57 {'loss': 1.1487, 'learning_rate': 1.5570494108862256e-05, 'epoch': 0.58} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 18:01:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.34 | bwd_microstep: 1467.89 | bwd_inner_microstep: 1467.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4012 [2024-06-10 18:01:49,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.62 | bwd_microstep: 1605.64 | bwd_inner_microstep: 1605.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849 [2024-06-10 18:01:51,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.60 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 18:01:53,674] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.80 | bwd_microstep: 1549.33 | bwd_inner_microstep: 1549.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761 [2024-06-10 18:01:55,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 1534.16 | bwd_inner_microstep: 1534.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045 [2024-06-10 18:01:56,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.36 | bwd_microstep: 808.61 | bwd_inner_microstep: 808.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508 [2024-06-10 18:01:58,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.48 | bwd_microstep: 1219.93 | bwd_inner_microstep: 1219.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2209 [2024-06-10 18:01:59,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.41 | bwd_microstep: 954.48 | bwd_inner_microstep: 954.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 18:02:01,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1386.06 | bwd_inner_microstep: 1386.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 18:02:03,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1386.98 | bwd_inner_microstep: 1386.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3483 [2024-06-10 18:02:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.26 | bwd_microstep: 1282.43 | bwd_inner_microstep: 1282.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 18:02:07,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.56 | bwd_microstep: 1348.74 | bwd_inner_microstep: 1348.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3664 [2024-06-10 18:02:09,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.76 | bwd_microstep: 1655.73 | bwd_inner_microstep: 1655.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3563 [2024-06-10 18:02:11,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.14 | bwd_microstep: 1457.34 | bwd_inner_microstep: 1457.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401 [2024-06-10 18:02:13,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.13 | bwd_microstep: 1369.29 | bwd_inner_microstep: 1369.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468 [2024-06-10 18:02:15,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.54 | bwd_microstep: 1344.81 | bwd_inner_microstep: 1344.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515 [2024-06-10 18:02:17,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.07 | bwd_microstep: 1192.96 | bwd_inner_microstep: 1192.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, 
images per sample: 4.5, dynamic token length: 3523 [2024-06-10 18:02:18,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.50 | bwd_microstep: 1228.61 | bwd_inner_microstep: 1228.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715 [2024-06-10 18:02:21,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.00 | bwd_microstep: 1639.77 | bwd_inner_microstep: 1639.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3627 [2024-06-10 18:02:23,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.22 | bwd_microstep: 1574.44 | bwd_inner_microstep: 1574.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 18:02:25,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.71 | bwd_microstep: 1488.72 | bwd_inner_microstep: 1488.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472 [2024-06-10 18:02:27,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.89 | bwd_microstep: 1342.59 | bwd_inner_microstep: 1342.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 18:02:29,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1495.62 | bwd_inner_microstep: 1495.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288 [2024-06-10 18:02:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.05 | bwd_microstep: 877.90 | bwd_inner_microstep: 877.88 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 18:02:32,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1405.45 | bwd_inner_microstep: 1405.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-10 18:02:34,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.73 | bwd_microstep: 1498.16 | bwd_inner_microstep: 1498.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751 [2024-06-10 18:02:36,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.28 | bwd_microstep: 1450.15 | bwd_inner_microstep: 1450.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 18:02:38,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1289.28 | bwd_inner_microstep: 1289.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2269 [2024-06-10 18:02:39,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.93 | bwd_microstep: 1005.79 | bwd_inner_microstep: 1005.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3733 [2024-06-10 18:02:41,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.46 | bwd_microstep: 1737.37 | bwd_inner_microstep: 1737.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568 [2024-06-10 18:02:43,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.93 | bwd_microstep: 1299.77 | bwd_inner_microstep: 
1299.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576 [2024-06-10 18:02:47,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.17 | optimizer_step: 6.57 [2024-06-10 18:02:47,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.68 | bwd_microstep: 2819.09 | bwd_inner_microstep: 1708.86 | bwd_allreduce_microstep: 1110.18 | step_microstep: 37.91 [2024-06-10 18:02:47,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16421.29 | bwd: 45374.20 | bwd_inner: 44263.12 | bwd_allreduce: 1110.40 | step: 39.37 {'loss': 1.2081, 'learning_rate': 1.5533900087357527e-05, 'epoch': 0.58} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 18:02:48,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.28 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 18:02:50,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1408.26 | bwd_inner_microstep: 1408.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900 [2024-06-10 18:02:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.19 | bwd_microstep: 1584.79 | bwd_inner_microstep: 1584.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869 [2024-06-10 18:02:53,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.72 | bwd_microstep: 706.44 | bwd_inner_microstep: 706.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 
[2024-06-10 18:02:55,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.33 | bwd_microstep: 790.00 | bwd_inner_microstep: 789.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 18:02:57,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1483.99 | bwd_inner_microstep: 1483.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569 [2024-06-10 18:02:58,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1204.73 | bwd_inner_microstep: 1204.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 18:03:00,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1532.20 | bwd_inner_microstep: 1532.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-10 18:03:03,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1632.23 | bwd_inner_microstep: 1632.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 18:03:05,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.92 | bwd_microstep: 1390.62 | bwd_inner_microstep: 1390.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701 [2024-06-10 18:03:07,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1529.80 | bwd_inner_microstep: 1529.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3492 [2024-06-10 18:03:09,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.91 | bwd_microstep: 1413.92 | bwd_inner_microstep: 1413.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-10 18:03:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.62 | bwd_microstep: 1439.37 | bwd_inner_microstep: 1439.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 18:03:12,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1256.08 | bwd_inner_microstep: 1256.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2150 [2024-06-10 18:03:14,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.26 | bwd_microstep: 1045.32 | bwd_inner_microstep: 1045.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3635 [2024-06-10 18:03:16,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.22 | bwd_microstep: 1424.17 | bwd_inner_microstep: 1424.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3427 [2024-06-10 18:03:18,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1851 [2024-06-10 18:03:19,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.44 | bwd_microstep: 671.93 | bwd_inner_microstep: 671.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 18:03:20,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.32 | bwd_microstep: 1439.85 | bwd_inner_microstep: 1439.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 18:03:22,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.20 | bwd_microstep: 1353.44 | bwd_inner_microstep: 1353.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 18:03:24,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.49 | bwd_microstep: 1364.66 | bwd_inner_microstep: 1364.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3440 [2024-06-10 18:03:26,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1425.79 | bwd_inner_microstep: 1425.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2008 [2024-06-10 18:03:27,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.16 | bwd_microstep: 772.53 | bwd_inner_microstep: 772.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-10 18:03:29,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.26 | bwd_microstep: 974.64 | bwd_inner_microstep: 974.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 18:03:31,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1455.29 | bwd_inner_microstep: 1455.27 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766 [2024-06-10 18:03:33,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.82 | bwd_microstep: 1544.99 | bwd_inner_microstep: 1544.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1969 [2024-06-10 18:03:34,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.73 | bwd_microstep: 703.60 | bwd_inner_microstep: 703.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-10 18:03:35,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.77 | bwd_microstep: 978.13 | bwd_inner_microstep: 978.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-10 18:03:37,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.04 | bwd_microstep: 1587.72 | bwd_inner_microstep: 1587.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 18:03:39,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 18:03:41,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.62 | bwd_microstep: 1647.10 | bwd_inner_microstep: 1647.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037 [2024-06-10 18:03:47,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | 
optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 18:03:47,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.66 | bwd_microstep: 5698.04 | bwd_inner_microstep: 929.83 | bwd_allreduce_microstep: 4768.16 | step_microstep: 37.92 [2024-06-10 18:03:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15116.87 | bwd: 45362.38 | bwd_inner: 40593.32 | bwd_allreduce: 4768.39 | step: 39.35 {'loss': 1.1692, 'learning_rate': 1.5497321795433474e-05, 'epoch': 0.59} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481 [2024-06-10 18:03:50,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.72 | bwd_microstep: 1570.19 | bwd_inner_microstep: 1570.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3975 [2024-06-10 18:03:52,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.11 | bwd_microstep: 1601.90 | bwd_inner_microstep: 1601.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 18:03:53,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.23 | bwd_microstep: 694.20 | bwd_inner_microstep: 694.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-10 18:03:55,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.09 | bwd_microstep: 1406.05 | bwd_inner_microstep: 1406.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-10 18:03:56,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 797.09 | bwd_inner_microstep: 797.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3406 [2024-06-10 18:03:58,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.39 | bwd_microstep: 1243.81 | bwd_inner_microstep: 1243.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3426 [2024-06-10 18:03:59,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.50 | bwd_microstep: 1186.43 | bwd_inner_microstep: 1186.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 18:04:01,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.40 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3438 [2024-06-10 18:04:03,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.58 | bwd_microstep: 1188.92 | bwd_inner_microstep: 1188.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3578 [2024-06-10 18:04:04,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1270.49 | bwd_inner_microstep: 1270.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2062 [2024-06-10 18:04:06,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.52 | bwd_microstep: 863.44 | bwd_inner_microstep: 863.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646 [2024-06-10 18:04:08,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.25 | bwd_microstep: 1412.10 | bwd_inner_microstep: 1412.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1985 [2024-06-10 18:04:09,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.62 | bwd_microstep: 894.93 | bwd_inner_microstep: 894.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460 [2024-06-10 18:04:11,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1401.41 | bwd_inner_microstep: 1401.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652 [2024-06-10 18:04:13,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.97 | bwd_microstep: 1447.64 | bwd_inner_microstep: 1447.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 18:04:15,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.19 | bwd_microstep: 1386.33 | bwd_inner_microstep: 1386.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 18:04:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1486.97 | bwd_inner_microstep: 1486.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-10 18:04:19,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.20 | bwd_microstep: 1603.49 | bwd_inner_microstep: 1603.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3398 [2024-06-10 18:04:21,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.53 | bwd_microstep: 1402.37 | bwd_inner_microstep: 1402.35 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 18:04:23,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.37 | bwd_microstep: 1358.31 | bwd_inner_microstep: 1358.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 18:04:25,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.70 | bwd_microstep: 1295.65 | bwd_inner_microstep: 1295.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3815 [2024-06-10 18:04:27,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.84 | bwd_microstep: 1616.81 | bwd_inner_microstep: 1616.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 18:04:29,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.96 | bwd_microstep: 1399.05 | bwd_inner_microstep: 1399.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 18:04:31,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.78 | bwd_microstep: 1409.11 | bwd_inner_microstep: 1409.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848 [2024-06-10 18:04:33,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.94 | bwd_microstep: 1561.20 | bwd_inner_microstep: 1561.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275 [2024-06-10 18:04:34,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.83 | bwd_microstep: 
880.70 | bwd_inner_microstep: 880.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 18:04:36,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.17 | bwd_microstep: 1418.27 | bwd_inner_microstep: 1418.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3446 [2024-06-10 18:04:38,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.55 | bwd_microstep: 1312.85 | bwd_inner_microstep: 1312.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820 [2024-06-10 18:04:40,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.71 | bwd_microstep: 1480.46 | bwd_inner_microstep: 1480.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591 [2024-06-10 18:04:42,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.62 | bwd_microstep: 1600.47 | bwd_inner_microstep: 1600.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3569 [2024-06-10 18:04:44,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1555.10 | bwd_inner_microstep: 1555.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584 [2024-06-10 18:04:51,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.31 | optimizer_step: 6.59 [2024-06-10 18:04:51,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 6201.67 | bwd_inner_microstep: 1607.67 | bwd_allreduce_microstep: 4593.94 | step_microstep: 38.40 
[2024-06-10 18:04:51,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15910.89 | bwd: 47232.38 | bwd_inner: 42637.52 | bwd_allreduce: 4594.17 | step: 39.82 58%|█████▊ | 1006/1726 [17:22:18<13:50:18, 69.19s/it] 58%|█████▊ | 1007/1726 [17:23:19<13:19:40, 66.73s/it] 58%|█████▊ | 1008/1726 [17:24:21<13:02:06, 65.36s/it] 58%|█████▊ | 1009/1726 [17:25:23<12:49:26, 64.39s/it] 59%|█████▊ | 1010/1726 [17:26:24<12:35:32, 63.31s/it] 59%|█████▊ | 1011/1726 [17:27:28<12:35:03 {'loss': 1.201, 'learning_rate': 1.546075936191866e-05, 'epoch': 0.59} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-10 18:04:53,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.26 | bwd_microstep: 1432.27 | bwd_inner_microstep: 1432.14 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3887 [2024-06-10 18:04:55,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.69 | bwd_microstep: 1578.97 | bwd_inner_microstep: 1578.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 18:04:57,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.55 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3789 [2024-06-10 18:04:59,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.85 | bwd_microstep: 1349.54 | bwd_inner_microstep: 1349.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 18:05:01,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.82 | bwd_microstep: 1649.91 | bwd_inner_microstep: 1649.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2054 [2024-06-10 18:05:02,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.19 | bwd_microstep: 784.31 | bwd_inner_microstep: 784.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3475 [2024-06-10 18:05:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.36 | bwd_microstep: 1243.82 | bwd_inner_microstep: 1243.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 18:05:05,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 793.25 | bwd_inner_microstep: 793.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724 [2024-06-10 18:05:07,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.28 | bwd_microstep: 1628.33 | bwd_inner_microstep: 1628.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434 [2024-06-10 18:05:09,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.84 | bwd_microstep: 1160.70 | bwd_inner_microstep: 1160.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-10 18:05:11,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.38 | bwd_microstep: 1159.57 | bwd_inner_microstep: 1159.55 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3985 [2024-06-10 18:05:13,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1508.09 | bwd_inner_microstep: 1508.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424 [2024-06-10 18:05:14,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.75 | bwd_microstep: 1278.87 | bwd_inner_microstep: 1278.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891 [2024-06-10 18:05:16,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.37 | bwd_microstep: 774.75 | bwd_inner_microstep: 774.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3995 [2024-06-10 18:05:18,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.18 | bwd_microstep: 1702.43 | bwd_inner_microstep: 1702.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2478 [2024-06-10 18:05:19,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.20 | bwd_microstep: 1016.57 | bwd_inner_microstep: 1016.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2402 [2024-06-10 18:05:21,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.50 | bwd_microstep: 1098.93 | bwd_inner_microstep: 1098.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 18:05:23,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.08 | bwd_microstep: 1372.40 | bwd_inner_microstep: 
1372.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 18:05:25,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.39 | bwd_microstep: 1387.71 | bwd_inner_microstep: 1387.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545 [2024-06-10 18:05:26,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1360.65 | bwd_inner_microstep: 1360.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 18:05:29,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.70 | bwd_microstep: 1644.71 | bwd_inner_microstep: 1644.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-10 18:05:31,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1397.03 | bwd_inner_microstep: 1397.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 18:05:33,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1344.36 | bwd_inner_microstep: 1344.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3818 [2024-06-10 18:05:35,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.27 | bwd_microstep: 1747.71 | bwd_inner_microstep: 1747.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3634 [2024-06-10 18:05:37,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | 
bwd_microstep: 1456.93 | bwd_inner_microstep: 1456.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 18:05:39,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.20 | bwd_microstep: 1456.45 | bwd_inner_microstep: 1456.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2076 [2024-06-10 18:05:40,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.43 | bwd_microstep: 791.29 | bwd_inner_microstep: 791.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2045 [2024-06-10 18:05:41,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.23 | bwd_microstep: 750.50 | bwd_inner_microstep: 750.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3797 [2024-06-10 18:05:43,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.62 | bwd_microstep: 1356.19 | bwd_inner_microstep: 1356.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 18:05:45,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.87 | bwd_microstep: 1559.28 | bwd_inner_microstep: 1559.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 18:05:47,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.77 | bwd_microstep: 1313.27 | bwd_inner_microstep: 1313.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 18:05:52,435] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 18:05:52,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.76 | bwd_microstep: 4440.24 | bwd_inner_microstep: 1437.85 | bwd_allreduce_microstep: 3002.34 | step_microstep: 38.29 [2024-06-10 18:05:52,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15688.45 | bwd: 44992.17 | bwd_inner: 41988.80 | bwd_allreduce: 3002.64 | step: 39.85 {'loss': 1.1801, 'learning_rate': 1.5424212915585766e-05, 'epoch': 0.59} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421 [2024-06-10 18:05:54,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.43 | bwd_microstep: 1437.21 | bwd_inner_microstep: 1437.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3938 [2024-06-10 18:05:56,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.87 | bwd_microstep: 1556.27 | bwd_inner_microstep: 1556.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3423 [2024-06-10 18:05:58,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.01 | bwd_microstep: 1375.55 | bwd_inner_microstep: 1375.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 18:06:00,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.64 | bwd_microstep: 1378.69 | bwd_inner_microstep: 1378.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 18:06:02,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1479.51 | bwd_inner_microstep: 1479.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 18:06:04,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.30 | bwd_microstep: 1387.51 | bwd_inner_microstep: 1387.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738 [2024-06-10 18:06:06,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1431.21 | bwd_inner_microstep: 1431.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 18:06:08,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1385.61 | bwd_inner_microstep: 1385.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 18:06:09,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 18:06:11,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.50 | bwd_microstep: 1389.29 | bwd_inner_microstep: 1389.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1958 [2024-06-10 18:06:12,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.45 | bwd_microstep: 766.10 | bwd_inner_microstep: 766.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2653 [2024-06-10 18:06:14,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.39 | bwd_microstep: 1113.61 | bwd_inner_microstep: 1113.58 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 18:06:16,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.63 | bwd_microstep: 1353.21 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500 [2024-06-10 18:06:18,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.71 | bwd_microstep: 1195.16 | bwd_inner_microstep: 1195.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 18:06:19,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1251.31 | bwd_inner_microstep: 1251.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 18:06:21,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1394.96 | bwd_inner_microstep: 1394.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 18:06:22,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.39 | bwd_microstep: 698.07 | bwd_inner_microstep: 698.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3837 [2024-06-10 18:06:24,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.62 | bwd_microstep: 1360.24 | bwd_inner_microstep: 1360.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650 [2024-06-10 18:06:26,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.53 | bwd_microstep: 
1522.28 | bwd_inner_microstep: 1522.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 18:06:28,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1392.64 | bwd_inner_microstep: 1392.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 18:06:30,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1285.08 | bwd_inner_microstep: 1285.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963 [2024-06-10 18:06:31,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.51 | bwd_microstep: 703.73 | bwd_inner_microstep: 703.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 18:06:33,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1349.28 | bwd_inner_microstep: 1349.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3530 [2024-06-10 18:06:35,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1424.71 | bwd_inner_microstep: 1424.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 18:06:36,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 18:06:38,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 534.55 | bwd_microstep: 1428.17 | bwd_inner_microstep: 1428.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 18:06:40,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1283.00 | bwd_inner_microstep: 1282.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3801 [2024-06-10 18:06:42,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1389.75 | bwd_inner_microstep: 1389.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459 [2024-06-10 18:06:44,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.93 | bwd_microstep: 1438.07 | bwd_inner_microstep: 1438.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3418 [2024-06-10 18:06:46,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.95 | bwd_microstep: 1540.68 | bwd_inner_microstep: 1540.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3764 [2024-06-10 18:06:48,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.54 | bwd_microstep: 1610.89 | bwd_inner_microstep: 1610.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3823 [2024-06-10 18:06:53,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.29 | optimizer_step: 6.61 [2024-06-10 18:06:53,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.45 | bwd_microstep: 3512.29 | bwd_inner_microstep: 1707.41 | 
bwd_allreduce_microstep: 1804.82 | step_microstep: 39.22 [2024-06-10 18:06:53,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15907.17 | bwd: 44370.71 | bwd_inner: 42564.96 | bwd_allreduce: 1805.05 | step: 40.86 {'loss': 1.2534, 'learning_rate': 1.53876825851512e-05, 'epoch': 0.59} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 18:06:54,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1274.18 | bwd_inner_microstep: 1273.98 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3919 [2024-06-10 18:06:56,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1490.39 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-10 18:06:59,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.18 | bwd_microstep: 1539.89 | bwd_inner_microstep: 1539.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 18:07:00,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.41 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 18:07:02,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.32 | bwd_microstep: 1382.81 | bwd_inner_microstep: 1382.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 18:07:04,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.29 | bwd_microstep: 1283.32 | bwd_inner_microstep: 1283.29 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-10 18:07:05,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.97 | bwd_microstep: 709.62 | bwd_inner_microstep: 709.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3421 [2024-06-10 18:07:07,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1396.47 | bwd_inner_microstep: 1396.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3515 [2024-06-10 18:07:09,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1429.26 | bwd_inner_microstep: 1429.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1862 [2024-06-10 18:07:10,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.46 | bwd_microstep: 706.58 | bwd_inner_microstep: 706.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476 [2024-06-10 18:07:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.17 | bwd_microstep: 1441.68 | bwd_inner_microstep: 1441.59 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.16 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 18:07:14,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.31 | bwd_microstep: 1491.67 | bwd_inner_microstep: 1491.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 18:07:15,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 
795.60 | bwd_inner_microstep: 795.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-10 18:07:17,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1408.35 | bwd_inner_microstep: 1408.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612 [2024-06-10 18:07:19,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1313.79 | bwd_inner_microstep: 1313.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 18:07:21,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.78 | bwd_microstep: 1302.46 | bwd_inner_microstep: 1302.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2085 [2024-06-10 18:07:22,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.52 | bwd_microstep: 918.13 | bwd_inner_microstep: 918.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 18:07:24,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1296.19 | bwd_inner_microstep: 1296.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442 [2024-06-10 18:07:25,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.67 | bwd_microstep: 1158.09 | bwd_inner_microstep: 1158.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-10 18:07:27,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 563.10 | bwd_microstep: 1501.52 | bwd_inner_microstep: 1501.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 18:07:29,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.69 | bwd_microstep: 1385.97 | bwd_inner_microstep: 1385.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 18:07:31,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1462.94 | bwd_inner_microstep: 1462.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481 [2024-06-10 18:07:33,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.30 | bwd_microstep: 1186.06 | bwd_inner_microstep: 1186.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3422 [2024-06-10 18:07:35,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.26 | bwd_microstep: 1296.46 | bwd_inner_microstep: 1296.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2012 [2024-06-10 18:07:36,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.38 | bwd_microstep: 711.58 | bwd_inner_microstep: 711.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2185 [2024-06-10 18:07:37,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.25 | bwd_microstep: 834.84 | bwd_inner_microstep: 834.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530 [2024-06-10 18:07:39,343] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1325.44 | bwd_inner_microstep: 1325.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581 [2024-06-10 18:07:41,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1373.21 | bwd_inner_microstep: 1373.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3733 [2024-06-10 18:07:43,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.13 | bwd_microstep: 1339.99 | bwd_inner_microstep: 1339.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-10 18:07:44,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1346.88 | bwd_inner_microstep: 1346.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2960 [2024-06-10 18:07:46,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.11 | bwd_microstep: 1249.23 | bwd_inner_microstep: 1249.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385 [2024-06-10 18:07:54,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.60 | optimizer_step: 6.60 [2024-06-10 18:07:54,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 6824.60 | bwd_inner_microstep: 1631.82 | bwd_allreduce_microstep: 5192.66 | step_microstep: 41.50 [2024-06-10 18:07:54,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15128.92 | bwd: 45559.57 | bwd_inner: 40365.70 | bwd_allreduce: 5193.07 | step: 43.29 {'loss': 1.1768, 'learning_rate': 
1.5351168499274588e-05, 'epoch': 0.59} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 18:07:56,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 18:07:58,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.84 | bwd_microstep: 1504.29 | bwd_inner_microstep: 1504.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 18:07:59,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.94 | bwd_microstep: 1344.22 | bwd_inner_microstep: 1344.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-10 18:08:02,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.33 | bwd_microstep: 1538.30 | bwd_inner_microstep: 1538.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-10 18:08:04,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.40 | bwd_microstep: 1435.57 | bwd_inner_microstep: 1435.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 18:08:06,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.54 | bwd_microstep: 1412.76 | bwd_inner_microstep: 1412.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3769 [2024-06-10 18:08:08,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.62 | bwd_microstep: 1472.73 | 
bwd_inner_microstep: 1472.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 18:08:09,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.88 | bwd_microstep: 1383.58 | bwd_inner_microstep: 1383.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 18:08:11,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1387.76 | bwd_inner_microstep: 1387.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 18:08:13,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1250.32 | bwd_inner_microstep: 1250.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734 [2024-06-10 18:08:15,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1437.27 | bwd_inner_microstep: 1437.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 18:08:17,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1380.16 | bwd_inner_microstep: 1380.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2152 [2024-06-10 18:08:18,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.30 | bwd_microstep: 1044.82 | bwd_inner_microstep: 1044.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 18:08:20,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 548.48 | bwd_microstep: 1479.35 | bwd_inner_microstep: 1479.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-10 18:08:23,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.91 | bwd_microstep: 1519.12 | bwd_inner_microstep: 1519.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3448 [2024-06-10 18:08:25,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.63 | bwd_microstep: 1452.34 | bwd_inner_microstep: 1452.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648 [2024-06-10 18:08:27,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1411.35 | bwd_inner_microstep: 1411.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-10 18:08:29,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.56 | bwd_microstep: 1454.30 | bwd_inner_microstep: 1454.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-10 18:08:30,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.41 | bwd_microstep: 1183.60 | bwd_inner_microstep: 1183.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646 [2024-06-10 18:08:32,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.87 | bwd_microstep: 1619.22 | bwd_inner_microstep: 1619.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1976 [2024-06-10 18:08:33,974] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.46 | bwd_microstep: 766.75 | bwd_inner_microstep: 766.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1989 [2024-06-10 18:08:34,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.73 | bwd_microstep: 708.36 | bwd_inner_microstep: 708.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492 [2024-06-10 18:08:36,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.77 | bwd_microstep: 1219.59 | bwd_inner_microstep: 1219.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530 [2024-06-10 18:08:38,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.83 | bwd_microstep: 1329.01 | bwd_inner_microstep: 1328.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2247 [2024-06-10 18:08:39,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.25 | bwd_microstep: 872.13 | bwd_inner_microstep: 872.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-10 18:08:40,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.91 | bwd_microstep: 806.11 | bwd_inner_microstep: 806.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-10 18:08:42,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.22 | bwd_microstep: 1302.29 | bwd_inner_microstep: 1302.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 
[2024-06-10 18:08:44,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.31 | bwd_microstep: 1260.65 | bwd_inner_microstep: 1260.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-10 18:08:46,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.50 | bwd_microstep: 1607.44 | bwd_inner_microstep: 1607.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3423 [2024-06-10 18:08:48,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.91 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820 [2024-06-10 18:08:50,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.43 | bwd_microstep: 1515.29 | bwd_inner_microstep: 1515.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3807 [2024-06-10 18:08:55,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-10 18:08:55,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.06 | bwd_microstep: 4332.08 | bwd_inner_microstep: 1913.43 | bwd_allreduce_microstep: 2418.58 | step_microstep: 39.34 [2024-06-10 18:08:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15967.56 | bwd: 45302.46 | bwd_inner: 42882.92 | bwd_allreduce: 2418.82 | step: 41.07 {'loss': 1.2218, 'learning_rate': 1.5314670786558358e-05, 'epoch': 0.59} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 18:08:57,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1490.71 | 
bwd_inner_microstep: 1490.60 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.16 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 18:08:58,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.09 | bwd_microstep: 777.36 | bwd_inner_microstep: 777.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2312 [2024-06-10 18:09:00,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.10 | bwd_microstep: 916.00 | bwd_inner_microstep: 915.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 18:09:02,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.46 | bwd_microstep: 1552.10 | bwd_inner_microstep: 1552.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761 [2024-06-10 18:09:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.03 | bwd_microstep: 1538.44 | bwd_inner_microstep: 1538.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2234 [2024-06-10 18:09:05,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.22 | bwd_microstep: 959.11 | bwd_inner_microstep: 959.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2239 [2024-06-10 18:09:06,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.84 | bwd_microstep: 866.47 | bwd_inner_microstep: 866.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 18:09:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
471.10 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495 [2024-06-10 18:09:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.45 | bwd_microstep: 1191.03 | bwd_inner_microstep: 1191.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 18:09:12,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.83 | bwd_microstep: 1248.12 | bwd_inner_microstep: 1248.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 18:09:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1393.02 | bwd_inner_microstep: 1392.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3506 [2024-06-10 18:09:15,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.83 | bwd_microstep: 1460.41 | bwd_inner_microstep: 1460.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 18:09:17,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 18:09:19,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.09 | bwd_microstep: 1378.05 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1935 [2024-06-10 18:09:20,941] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.64 | bwd_microstep: 824.60 | bwd_inner_microstep: 824.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2683 [2024-06-10 18:09:22,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1200.45 | bwd_inner_microstep: 1200.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2172 [2024-06-10 18:09:24,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.83 | bwd_microstep: 1051.80 | bwd_inner_microstep: 1051.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 18:09:25,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818 [2024-06-10 18:09:28,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.12 | bwd_microstep: 1624.53 | bwd_inner_microstep: 1624.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2123 [2024-06-10 18:09:29,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.26 | bwd_microstep: 1026.57 | bwd_inner_microstep: 1026.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 18:09:31,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1558.60 | bwd_inner_microstep: 1558.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3447 [2024-06-10 18:09:33,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1257.71 | bwd_inner_microstep: 1257.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 18:09:35,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.66 | bwd_microstep: 1657.48 | bwd_inner_microstep: 1657.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-10 18:09:37,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.77 | bwd_microstep: 1416.31 | bwd_inner_microstep: 1416.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 18:09:39,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.07 | bwd_microstep: 1358.36 | bwd_inner_microstep: 1358.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 18:09:41,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1557.45 | bwd_inner_microstep: 1557.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641 [2024-06-10 18:09:43,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1414.65 | bwd_inner_microstep: 1414.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942 [2024-06-10 18:09:44,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.14 | bwd_microstep: 725.86 | bwd_inner_microstep: 725.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images 
per sample: 6.5, dynamic token length: 3822 [2024-06-10 18:09:46,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1418.87 | bwd_inner_microstep: 1418.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 18:09:48,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.29 | bwd_microstep: 1533.99 | bwd_inner_microstep: 1533.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417 [2024-06-10 18:09:50,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.35 | bwd_microstep: 1439.90 | bwd_inner_microstep: 1439.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578 [2024-06-10 18:09:59,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.14 | optimizer_step: 6.58 [2024-06-10 18:09:59,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.29 | bwd_microstep: 8110.07 | bwd_inner_microstep: 1809.72 | bwd_allreduce_microstep: 6300.29 | step_microstep: 38.96 [2024-06-10 18:09:59,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15457.31 | bwd: 47931.84 | bwd_inner: 41630.53 | bwd_allreduce: 6300.58 | step: 40.52 59%|█████▊ | 1011/1726 [17:27:28<12:35:03, 63.36s/it] 59%|█████▊ | 1012/1726 [17:28:29<12:25:37, 62.66s/it] 59%|█████▊ | 1013/1726 [17:29:29<12:17:19, 62.05s/it] 59%|█████▊ | 1014/1726 [17:30:30<12:12:42, 61.75s/it] 59%|█████▉ | 1015/1726 [17:31:32<12:11:15, 61.71s/it] 59%|█████▉ | 1016/1726
[17:32:36<12:17:24, 62.32s/it] {'loss': 1.2499, 'learning_rate': 1.5278189575547265e-05, 'epoch': 0.59} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 18:10:01,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.46 | bwd_microstep: 1237.22 | bwd_inner_microstep: 1237.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2434 [2024-06-10 18:10:02,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.68 | bwd_microstep: 1008.46 | bwd_inner_microstep: 1008.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 18:10:04,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1250.53 | bwd_inner_microstep: 1250.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 18:10:06,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.03 | bwd_microstep: 1275.99 | bwd_inner_microstep: 1275.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-10 18:10:07,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 795.04 | bwd_inner_microstep: 795.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 18:10:09,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1493.86 | bwd_inner_microstep: 1493.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2047 [2024-06-10 18:10:10,377] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 305.53 | bwd_microstep: 810.55 | bwd_inner_microstep: 810.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760 [2024-06-10 18:10:12,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.89 | bwd_microstep: 1538.93 | bwd_inner_microstep: 1538.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2156 [2024-06-10 18:10:13,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.62 | bwd_microstep: 1005.82 | bwd_inner_microstep: 1005.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3727 [2024-06-10 18:10:16,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.33 | bwd_microstep: 1681.25 | bwd_inner_microstep: 1681.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 18:10:18,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1346.34 | bwd_inner_microstep: 1346.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 18:10:20,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1484.93 | bwd_inner_microstep: 1484.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1975 [2024-06-10 18:10:21,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.93 | bwd_microstep: 763.41 | bwd_inner_microstep: 763.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786 [2024-06-10 18:10:23,422] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.93 | bwd_microstep: 1647.77 | bwd_inner_microstep: 1647.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665 [2024-06-10 18:10:25,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.94 | bwd_microstep: 1614.27 | bwd_inner_microstep: 1614.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 18:10:27,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1337.59 | bwd_inner_microstep: 1337.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279 [2024-06-10 18:10:28,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.25 | bwd_microstep: 971.58 | bwd_inner_microstep: 971.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 18:10:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1349.60 | bwd_inner_microstep: 1349.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 18:10:32,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.79 | bwd_microstep: 1475.59 | bwd_inner_microstep: 1475.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651 [2024-06-10 18:10:34,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.17 | bwd_microstep: 1516.97 | bwd_inner_microstep: 1516.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3462 [2024-06-10 18:10:36,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.55 | bwd_microstep: 1372.81 | bwd_inner_microstep: 1372.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 18:10:38,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.26 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3720 [2024-06-10 18:10:40,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.84 | bwd_microstep: 1335.94 | bwd_inner_microstep: 1335.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 18:10:42,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.81 | bwd_microstep: 1654.83 | bwd_inner_microstep: 1654.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565 [2024-06-10 18:10:44,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1599.02 | bwd_inner_microstep: 1599.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909 [2024-06-10 18:10:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.25 | bwd_microstep: 689.72 | bwd_inner_microstep: 689.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-10 18:10:47,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.43 | bwd_microstep: 1301.43 | bwd_inner_microstep: 1301.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images 
per sample: 7.5, dynamic token length: 3447 [2024-06-10 18:10:49,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.45 | bwd_microstep: 1379.59 | bwd_inner_microstep: 1379.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-10 18:10:51,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.58 | bwd_microstep: 1496.57 | bwd_inner_microstep: 1496.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-10 18:10:53,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1412.39 | bwd_inner_microstep: 1412.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932 [2024-06-10 18:10:54,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.69 | bwd_microstep: 696.99 | bwd_inner_microstep: 696.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 18:11:01,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.12 | optimizer_step: 6.64 [2024-06-10 18:11:01,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.49 | bwd_microstep: 6806.35 | bwd_inner_microstep: 2177.32 | bwd_allreduce_microstep: 4628.98 | step_microstep: 38.06 [2024-06-10 18:11:01,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15518.75 | bwd: 46637.33 | bwd_inner: 42007.45 | bwd_allreduce: 4629.20 | step: 39.64 {'loss': 1.188, 'learning_rate': 1.5241724994727933e-05, 'epoch': 0.59} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3386 [2024-06-10 18:11:03,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
467.38 | bwd_microstep: 1232.09 | bwd_inner_microstep: 1231.84 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 18:11:04,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.10 | bwd_microstep: 788.11 | bwd_inner_microstep: 788.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3857 [2024-06-10 18:11:07,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.68 | bwd_microstep: 1656.71 | bwd_inner_microstep: 1656.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2261 [2024-06-10 18:11:08,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.62 | bwd_microstep: 869.19 | bwd_inner_microstep: 869.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 18:11:10,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1376.83 | bwd_inner_microstep: 1376.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 18:11:12,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1545.74 | bwd_inner_microstep: 1545.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3586 [2024-06-10 18:11:13,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.62 | bwd_microstep: 1208.79 | bwd_inner_microstep: 1208.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714 [2024-06-10 18:11:16,226] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 599.73 | bwd_microstep: 1628.95 | bwd_inner_microstep: 1628.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705 [2024-06-10 18:11:18,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.57 | bwd_microstep: 1526.46 | bwd_inner_microstep: 1526.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181 [2024-06-10 18:11:19,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.02 | bwd_microstep: 952.41 | bwd_inner_microstep: 952.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3903 [2024-06-10 18:11:21,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.66 | bwd_microstep: 1558.02 | bwd_inner_microstep: 1557.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 18:11:23,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1244.13 | bwd_inner_microstep: 1244.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2130 [2024-06-10 18:11:24,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.82 | bwd_microstep: 928.95 | bwd_inner_microstep: 928.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 18:11:26,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.41 | bwd_microstep: 1444.84 | bwd_inner_microstep: 1444.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1896 [2024-06-10 
18:11:27,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.57 | bwd_microstep: 776.11 | bwd_inner_microstep: 776.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 18:11:29,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1482.28 | bwd_inner_microstep: 1482.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947 [2024-06-10 18:11:30,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 699.48 | bwd_inner_microstep: 699.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 18:11:32,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.78 | bwd_microstep: 1276.88 | bwd_inner_microstep: 1276.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287 [2024-06-10 18:11:33,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.71 | bwd_microstep: 877.81 | bwd_inner_microstep: 877.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 18:11:35,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.92 | bwd_microstep: 1258.11 | bwd_inner_microstep: 1258.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 18:11:37,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1295.76 | bwd_inner_microstep: 1295.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3659 [2024-06-10 18:11:39,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.41 | bwd_microstep: 1416.97 | bwd_inner_microstep: 1416.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 18:11:41,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.86 | bwd_microstep: 1383.01 | bwd_inner_microstep: 1382.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-10 18:11:42,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.88 | bwd_microstep: 708.74 | bwd_inner_microstep: 708.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 18:11:44,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.33 | bwd_microstep: 1256.13 | bwd_inner_microstep: 1256.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 18:11:46,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 1450.32 | bwd_inner_microstep: 1450.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749 [2024-06-10 18:11:48,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.56 | bwd_microstep: 1543.44 | bwd_inner_microstep: 1543.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3608 [2024-06-10 18:11:50,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.74 | bwd_microstep: 1534.35 | bwd_inner_microstep: 1534.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
52, images per sample: 13.0, dynamic token length: 3776 [2024-06-10 18:11:52,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.52 | bwd_microstep: 1847.75 | bwd_inner_microstep: 1847.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3850 [2024-06-10 18:11:54,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.18 | bwd_microstep: 1519.94 | bwd_inner_microstep: 1519.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 18:11:56,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1253.84 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 18:12:01,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.64 [2024-06-10 18:12:01,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.74 | bwd_microstep: 4255.21 | bwd_inner_microstep: 1868.70 | bwd_allreduce_microstep: 2386.46 | step_microstep: 38.97 [2024-06-10 18:12:01,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15413.41 | bwd: 43797.38 | bwd_inner: 41409.81 | bwd_allreduce: 2386.78 | step: 40.61 {'loss': 1.1992, 'learning_rate': 1.5205277172528438e-05, 'epoch': 0.59} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1950 [2024-06-10 18:12:02,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.96 | bwd_microstep: 884.31 | bwd_inner_microstep: 884.22 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3946 [2024-06-10 18:12:05,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 624.37 | bwd_microstep: 1695.54 | bwd_inner_microstep: 1695.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3978 [2024-06-10 18:12:07,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.01 | bwd_microstep: 1602.31 | bwd_inner_microstep: 1602.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 18:12:09,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1551.78 | bwd_inner_microstep: 1551.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3396 [2024-06-10 18:12:11,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.75 | bwd_microstep: 1306.41 | bwd_inner_microstep: 1306.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 18:12:13,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1383.06 | bwd_inner_microstep: 1383.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 18:12:15,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.64 | bwd_microstep: 1385.60 | bwd_inner_microstep: 1385.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 18:12:16,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1291.16 | bwd_inner_microstep: 1291.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-10 18:12:19,123] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.39 | bwd_microstep: 1641.91 | bwd_inner_microstep: 1641.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1950 [2024-06-10 18:12:20,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.92 | bwd_microstep: 854.06 | bwd_inner_microstep: 854.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3555 [2024-06-10 18:12:22,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1236.63 | bwd_inner_microstep: 1236.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1875 [2024-06-10 18:12:23,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.72 | bwd_microstep: 709.27 | bwd_inner_microstep: 709.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-10 18:12:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1484.01 | bwd_inner_microstep: 1483.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 18:12:26,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1395.02 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 18:12:28,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.20 | bwd_microstep: 1342.17 | bwd_inner_microstep: 1342.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 
3660 [2024-06-10 18:12:31,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.86 | bwd_microstep: 1718.87 | bwd_inner_microstep: 1718.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513 [2024-06-10 18:12:33,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.72 | bwd_microstep: 1436.73 | bwd_inner_microstep: 1436.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-10 18:12:34,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.81 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-10 18:12:36,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1420.09 | bwd_inner_microstep: 1420.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 18:12:38,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1292.36 | bwd_inner_microstep: 1292.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 18:12:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1399.24 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 18:12:42,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1257.52 | bwd_inner_microstep: 1257.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3493 [2024-06-10 18:12:44,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1286.42 | bwd_inner_microstep: 1286.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 18:12:46,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1408.18 | bwd_inner_microstep: 1408.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770 [2024-06-10 18:12:48,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.77 | bwd_microstep: 1347.05 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 18:12:50,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.24 | bwd_microstep: 1487.01 | bwd_inner_microstep: 1486.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 18:12:51,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.68 | bwd_microstep: 1305.58 | bwd_inner_microstep: 1305.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826 [2024-06-10 18:12:53,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.45 | bwd_microstep: 1360.32 | bwd_inner_microstep: 1360.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570 [2024-06-10 18:12:55,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1591.57 | bwd_inner_microstep: 1591.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637 [2024-06-10 18:12:57,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1320.17 | bwd_inner_microstep: 1320.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819 [2024-06-10 18:12:59,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.62 | bwd_microstep: 1366.55 | bwd_inner_microstep: 1366.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 18:13:01,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.07 | optimizer_step: 6.61 [2024-06-10 18:13:01,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.08 | bwd_microstep: 1539.52 | bwd_inner_microstep: 1531.81 | bwd_allreduce_microstep: 7.66 | step_microstep: 37.56 [2024-06-10 18:13:01,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16332.92 | bwd: 43608.71 | bwd_inner: 43600.09 | bwd_allreduce: 7.93 | step: 39.17 {'loss': 1.2296, 'learning_rate': 1.5168846237317814e-05, 'epoch': 0.59} dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3493 [2024-06-10 18:13:03,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.26 | bwd_microstep: 1521.83 | bwd_inner_microstep: 1521.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 18:13:05,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.96 | bwd_microstep: 1484.24 | bwd_inner_microstep: 1484.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-10 18:13:07,069] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 302.22 | bwd_microstep: 800.04 | bwd_inner_microstep: 800.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 18:13:09,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.69 | bwd_microstep: 1477.23 | bwd_inner_microstep: 1477.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 18:13:10,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 18:13:12,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.29 | bwd_microstep: 1390.19 | bwd_inner_microstep: 1390.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3517 [2024-06-10 18:13:14,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1435.71 | bwd_inner_microstep: 1435.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 18:13:16,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1285.63 | bwd_inner_microstep: 1285.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913 [2024-06-10 18:13:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.65 | bwd_microstep: 686.39 | bwd_inner_microstep: 686.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 
18:13:19,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1398.58 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951 [2024-06-10 18:13:20,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 700.24 | bwd_inner_microstep: 700.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498 [2024-06-10 18:13:22,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1316.29 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683 [2024-06-10 18:13:24,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.22 | bwd_microstep: 1621.66 | bwd_inner_microstep: 1621.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3690 [2024-06-10 18:13:26,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.57 | bwd_microstep: 1358.00 | bwd_inner_microstep: 1357.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2093 [2024-06-10 18:13:27,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.74 | bwd_microstep: 819.71 | bwd_inner_microstep: 819.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 18:13:29,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.68 | bwd_microstep: 1392.13 | bwd_inner_microstep: 1392.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 1975 [2024-06-10 18:13:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.82 | bwd_microstep: 797.43 | bwd_inner_microstep: 797.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622 [2024-06-10 18:13:32,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1340.91 | bwd_inner_microstep: 1340.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-10 18:13:33,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.60 | bwd_microstep: 1180.57 | bwd_inner_microstep: 1180.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2106 [2024-06-10 18:13:35,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.32 | bwd_microstep: 921.63 | bwd_inner_microstep: 921.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 18:13:37,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1468.80 | bwd_inner_microstep: 1468.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 18:13:39,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.19 | bwd_microstep: 1545.65 | bwd_inner_microstep: 1545.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-10 18:13:41,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.73 | bwd_microstep: 1169.54 | bwd_inner_microstep: 1169.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 18:13:43,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.28 | bwd_microstep: 1751.99 | bwd_inner_microstep: 1751.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 18:13:45,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.48 | bwd_microstep: 1280.70 | bwd_inner_microstep: 1280.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-10 18:13:47,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.68 | bwd_microstep: 1485.63 | bwd_inner_microstep: 1485.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 18:13:49,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1400.25 | bwd_inner_microstep: 1400.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 18:13:51,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.20 | bwd_microstep: 1453.34 | bwd_inner_microstep: 1453.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3953 [2024-06-10 18:13:53,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.26 | bwd_microstep: 1506.66 | bwd_inner_microstep: 1506.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 18:13:55,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.60 | bwd_microstep: 1280.31 | bwd_inner_microstep: 1280.28 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2567 [2024-06-10 18:13:56,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.45 | bwd_microstep: 1161.08 | bwd_inner_microstep: 1161.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 18:14:03,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 18:14:03,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.76 | bwd_microstep: 5822.09 | bwd_inner_microstep: 1691.25 | bwd_allreduce_microstep: 4130.79 | step_microstep: 37.88 [2024-06-10 18:14:03,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15461.56 | bwd: 45499.80 | bwd_inner: 41368.10 | bwd_allreduce: 4131.02 | step: 39.33 {'loss': 1.2185, 'learning_rate': 1.5132432317405626e-05, 'epoch': 0.59} dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2639 [2024-06-10 18:14:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.01 | bwd_microstep: 1041.81 | bwd_inner_microstep: 1041.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4348 [2024-06-10 18:14:06,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.14 | bwd_microstep: 1700.57 | bwd_inner_microstep: 1700.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282 [2024-06-10 18:14:08,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.61 | bwd_microstep: 874.10 | bwd_inner_microstep: 874.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 18:14:09,954] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.55 | bwd_microstep: 1341.18 | bwd_inner_microstep: 1341.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2679 [2024-06-10 18:14:11,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.37 | bwd_microstep: 1119.14 | bwd_inner_microstep: 1119.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 18:14:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.92 | bwd_microstep: 796.86 | bwd_inner_microstep: 796.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3752 [2024-06-10 18:14:14,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1371.48 | bwd_inner_microstep: 1371.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 777 [2024-06-10 18:14:14,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 121.07 | bwd_microstep: 308.27 | bwd_inner_microstep: 308.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 18:14:16,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.57 | bwd_microstep: 1354.91 | bwd_inner_microstep: 1354.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2157 [2024-06-10 18:14:17,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.95 | bwd_microstep: 851.07 | bwd_inner_microstep: 851.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1975 
[2024-06-10 18:14:19,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.19 | bwd_microstep: 736.19 | bwd_inner_microstep: 736.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 18:14:20,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.12 | bwd_microstep: 796.59 | bwd_inner_microstep: 796.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 18:14:22,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1488.82 | bwd_inner_microstep: 1488.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 18:14:24,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1351.35 | bwd_inner_microstep: 1351.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 18:14:25,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.29 | bwd_microstep: 1376.17 | bwd_inner_microstep: 1376.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 18:14:27,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.99 | bwd_microstep: 1372.52 | bwd_inner_microstep: 1372.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2090 [2024-06-10 18:14:28,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.37 | bwd_microstep: 820.69 | bwd_inner_microstep: 820.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 
9.0, dynamic token length: 3570 [2024-06-10 18:14:31,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.77 | bwd_microstep: 1523.86 | bwd_inner_microstep: 1523.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 18:14:33,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.64 | bwd_microstep: 1390.61 | bwd_inner_microstep: 1390.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818 [2024-06-10 18:14:35,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.76 | bwd_microstep: 1583.83 | bwd_inner_microstep: 1583.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 18:14:37,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-10 18:14:39,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.78 | bwd_microstep: 1441.44 | bwd_inner_microstep: 1441.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 18:14:41,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.76 | bwd_microstep: 1405.83 | bwd_inner_microstep: 1405.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556 [2024-06-10 18:14:42,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.75 | bwd_microstep: 1301.67 | bwd_inner_microstep: 1301.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 18:14:45,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.72 | bwd_microstep: 1656.72 | bwd_inner_microstep: 1656.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721 [2024-06-10 18:14:47,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.13 | bwd_microstep: 1635.34 | bwd_inner_microstep: 1635.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 18:14:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.27 | bwd_microstep: 1416.77 | bwd_inner_microstep: 1416.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631 [2024-06-10 18:14:51,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1407.52 | bwd_inner_microstep: 1407.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3431 [2024-06-10 18:14:53,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.18 | bwd_microstep: 1543.82 | bwd_inner_microstep: 1543.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-10 18:14:55,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1509.99 | bwd_inner_microstep: 1509.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604 [2024-06-10 18:14:57,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.79 | bwd_microstep: 1310.09 | bwd_inner_microstep: 1310.07 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-10 18:15:04,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.62 [2024-06-10 18:15:04,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.62 | bwd_microstep: 6709.89 | bwd_inner_microstep: 1883.33 | bwd_allreduce_microstep: 4826.51 | step_microstep: 38.05 [2024-06-10 18:15:04,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15306.29 | bwd: 45936.54 | bwd_inner: 41109.13 | bwd_allreduce: 4826.74 | step: 39.53 59%|█████▉ | 1016/1726 [17:32:36<12:17:24, 62.32s/it] 59%|█████▉ | 1017/1726 [17:33:38<12:17:02, 62.37s/it] 59%|█████▉ | 1018/1726 [17:34:38<12:06:03, 61.53s/it] 59%|█████▉ | 1019/1726 [17:35:38<12:00:36, 61.15s/it] 59%|█████▉ | 1020/1726 [17:36:39<12:00:03, 61.19s/it] 59%|█████▉ | 1021/1726 [17:37:41<12:00:21, 61.31s/it] {'loss': 1.222, 'learning_rate': 1.509603554104152e-05, 'epoch': 0.59} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3460 [2024-06-10 18:15:06,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1494.05 | bwd_inner_microstep: 1494.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3922 [2024-06-10 18:15:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.95 | bwd_microstep: 1390.40 | bwd_inner_microstep: 1390.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 18:15:10,563] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1377.16 | bwd_inner_microstep: 1377.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 18:15:12,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1340.37 | bwd_inner_microstep: 1340.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930 [2024-06-10 18:15:13,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 788.86 | bwd_inner_microstep: 788.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 18:15:15,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1279.69 | bwd_inner_microstep: 1279.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1947 [2024-06-10 18:15:16,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.69 | bwd_microstep: 821.02 | bwd_inner_microstep: 820.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 18:15:18,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 18:15:20,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1245.88 | bwd_inner_microstep: 1245.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 
[2024-06-10 18:15:21,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1249.55 | bwd_inner_microstep: 1249.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661 [2024-06-10 18:15:23,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.11 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 18:15:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1342.95 | bwd_inner_microstep: 1342.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438 [2024-06-10 18:15:27,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.25 | bwd_microstep: 1313.46 | bwd_inner_microstep: 1313.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 18:15:29,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1250.28 | bwd_inner_microstep: 1250.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-10 18:15:31,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1452.77 | bwd_inner_microstep: 1452.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 18:15:33,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1353.47 | bwd_inner_microstep: 1353.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 2098 [2024-06-10 18:15:34,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.06 | bwd_microstep: 852.06 | bwd_inner_microstep: 852.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 18:15:36,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.31 | bwd_microstep: 1389.45 | bwd_inner_microstep: 1389.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 18:15:37,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1281.92 | bwd_inner_microstep: 1281.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122 [2024-06-10 18:15:39,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.44 | bwd_microstep: 927.13 | bwd_inner_microstep: 927.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069 [2024-06-10 18:15:40,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.81 | bwd_microstep: 914.40 | bwd_inner_microstep: 914.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 18:15:42,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.47 | bwd_microstep: 1256.78 | bwd_inner_microstep: 1256.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 18:15:44,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1394.83 | bwd_inner_microstep: 1394.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3935 [2024-06-10 18:15:46,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.19 | bwd_microstep: 1620.78 | bwd_inner_microstep: 1620.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279 [2024-06-10 18:15:47,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.64 | bwd_microstep: 973.73 | bwd_inner_microstep: 973.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 18:15:49,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1252.29 | bwd_inner_microstep: 1252.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3570 [2024-06-10 18:15:51,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.53 | bwd_microstep: 1663.50 | bwd_inner_microstep: 1663.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-10 18:15:53,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1448.16 | bwd_inner_microstep: 1448.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807 [2024-06-10 18:15:56,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.61 | bwd_microstep: 1752.93 | bwd_inner_microstep: 1752.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3589 [2024-06-10 18:15:58,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1464.21 | bwd_inner_microstep: 1464.18 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801 [2024-06-10 18:16:00,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.30 | bwd_microstep: 1643.04 | bwd_inner_microstep: 1643.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039 [2024-06-10 18:16:07,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.22 | optimizer_step: 6.63 [2024-06-10 18:16:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.43 | bwd_microstep: 6565.57 | bwd_inner_microstep: 926.31 | bwd_allreduce_microstep: 5639.21 | step_microstep: 38.12 [2024-06-10 18:16:07,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15404.04 | bwd: 46901.59 | bwd_inner: 41261.47 | bwd_allreduce: 5639.44 | step: 39.61 {'loss': 1.2151, 'learning_rate': 1.5059656036414738e-05, 'epoch': 0.59} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 18:16:09,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1474.60 | bwd_inner_microstep: 1474.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 18:16:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.44 | bwd_microstep: 695.97 | bwd_inner_microstep: 695.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3907 [2024-06-10 18:16:12,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.70 | bwd_microstep: 1587.33 | bwd_inner_microstep: 1587.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262 
[2024-06-10 18:16:13,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.52 | bwd_microstep: 966.85 | bwd_inner_microstep: 966.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 18:16:15,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.92 | bwd_microstep: 1380.58 | bwd_inner_microstep: 1380.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 18:16:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.53 | bwd_microstep: 1245.63 | bwd_inner_microstep: 1245.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960 [2024-06-10 18:16:18,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.06 | bwd_microstep: 702.50 | bwd_inner_microstep: 702.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 18:16:20,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 1385.91 | bwd_inner_microstep: 1385.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1909 [2024-06-10 18:16:21,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.53 | bwd_microstep: 778.51 | bwd_inner_microstep: 778.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684 [2024-06-10 18:16:23,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.48 | bwd_microstep: 1624.11 | bwd_inner_microstep: 1624.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 4091 [2024-06-10 18:16:25,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.59 | bwd_microstep: 1525.46 | bwd_inner_microstep: 1525.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-10 18:16:27,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.85 | bwd_microstep: 1279.78 | bwd_inner_microstep: 1279.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3916 [2024-06-10 18:16:30,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.52 | bwd_microstep: 1790.23 | bwd_inner_microstep: 1790.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-10 18:16:31,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-10 18:16:33,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1439.18 | bwd_inner_microstep: 1439.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515 [2024-06-10 18:16:35,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.32 | bwd_microstep: 1445.89 | bwd_inner_microstep: 1445.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 18:16:37,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1485.60 | bwd_inner_microstep: 1485.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 40, images per sample: 10.0, dynamic token length: 3627 [2024-06-10 18:16:39,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.34 | bwd_microstep: 1613.02 | bwd_inner_microstep: 1613.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-10 18:16:41,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1416.09 | bwd_inner_microstep: 1416.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 18:16:43,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1278.74 | bwd_inner_microstep: 1278.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3529 [2024-06-10 18:16:44,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.99 | bwd_microstep: 1357.48 | bwd_inner_microstep: 1357.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 18:16:47,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.40 | bwd_microstep: 1500.64 | bwd_inner_microstep: 1500.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 18:16:48,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.38 | bwd_microstep: 1397.82 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-10 18:16:50,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1411.78 | bwd_inner_microstep: 1411.76 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991 [2024-06-10 18:16:52,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.73 | bwd_microstep: 802.29 | bwd_inner_microstep: 802.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3706 [2024-06-10 18:16:53,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.56 | bwd_microstep: 1361.16 | bwd_inner_microstep: 1361.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 18:16:55,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1253.94 | bwd_inner_microstep: 1253.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3728 [2024-06-10 18:16:57,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.00 | bwd_microstep: 1682.27 | bwd_inner_microstep: 1682.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602 [2024-06-10 18:17:00,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.95 | bwd_microstep: 1606.30 | bwd_inner_microstep: 1606.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3769 [2024-06-10 18:17:02,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.40 | bwd_microstep: 1744.03 | bwd_inner_microstep: 1744.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604 [2024-06-10 18:17:04,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 
1312.13 | bwd_inner_microstep: 1312.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3646 [2024-06-10 18:17:11,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.61 [2024-06-10 18:17:11,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 6719.02 | bwd_inner_microstep: 1544.39 | bwd_allreduce_microstep: 5174.58 | step_microstep: 38.21 [2024-06-10 18:17:11,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15946.72 | bwd: 48055.99 | bwd_inner: 42880.51 | bwd_allreduce: 5174.81 | step: 39.63 {'loss': 1.2306, 'learning_rate': 1.5023293931653714e-05, 'epoch': 0.59} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2906 [2024-06-10 18:17:13,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.65 | bwd_microstep: 1176.98 | bwd_inner_microstep: 1176.91 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 18:17:15,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1386.87 | bwd_inner_microstep: 1386.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907 [2024-06-10 18:17:17,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.49 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 [2024-06-10 18:17:19,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.10 | bwd_microstep: 1652.79 | bwd_inner_microstep: 1652.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3476 [2024-06-10 18:17:21,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1382.66 | bwd_inner_microstep: 1382.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 18:17:23,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.41 | bwd_microstep: 1146.94 | bwd_inner_microstep: 1146.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403 [2024-06-10 18:17:24,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.36 | bwd_microstep: 1207.54 | bwd_inner_microstep: 1207.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 18:17:26,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.22 | bwd_microstep: 1279.05 | bwd_inner_microstep: 1279.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2216 [2024-06-10 18:17:27,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.45 | bwd_microstep: 955.56 | bwd_inner_microstep: 955.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 18:17:29,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1242.44 | bwd_inner_microstep: 1242.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 18:17:31,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1286.61 | bwd_inner_microstep: 1286.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 18:17:33,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.87 | bwd_microstep: 1380.27 | bwd_inner_microstep: 1380.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3486 [2024-06-10 18:17:35,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1365.40 | bwd_inner_microstep: 1365.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497 [2024-06-10 18:17:37,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.88 | bwd_microstep: 1501.60 | bwd_inner_microstep: 1501.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443 [2024-06-10 18:17:39,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.74 | bwd_microstep: 1448.25 | bwd_inner_microstep: 1448.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 [2024-06-10 18:17:41,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1449.62 | bwd_inner_microstep: 1449.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 18:17:43,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1405.37 | bwd_inner_microstep: 1405.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3439 [2024-06-10 18:17:44,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.66 | bwd_microstep: 1378.94 | bwd_inner_microstep: 1378.91 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2245 [2024-06-10 18:17:46,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.42 | bwd_microstep: 968.74 | bwd_inner_microstep: 968.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548 [2024-06-10 18:17:48,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1427.43 | bwd_inner_microstep: 1427.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659 [2024-06-10 18:17:50,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.18 | bwd_microstep: 1421.26 | bwd_inner_microstep: 1421.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3672 [2024-06-10 18:17:52,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.79 | bwd_microstep: 1529.28 | bwd_inner_microstep: 1529.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 18:17:54,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785 [2024-06-10 18:17:56,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.48 | bwd_microstep: 1549.79 | bwd_inner_microstep: 1549.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 18:17:58,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.32 | bwd_microstep: 
1286.18 | bwd_inner_microstep: 1286.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484 [2024-06-10 18:18:00,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.64 | bwd_microstep: 1343.29 | bwd_inner_microstep: 1343.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2041 [2024-06-10 18:18:01,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.59 | bwd_microstep: 935.40 | bwd_inner_microstep: 935.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 18:18:03,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.51 | bwd_microstep: 1375.34 | bwd_inner_microstep: 1375.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584 [2024-06-10 18:18:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.26 | bwd_microstep: 1602.65 | bwd_inner_microstep: 1602.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717 [2024-06-10 18:18:07,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.94 | bwd_microstep: 1643.51 | bwd_inner_microstep: 1643.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3425 [2024-06-10 18:18:09,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.89 | bwd_microstep: 1379.32 | bwd_inner_microstep: 1379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079 [2024-06-10 18:18:12,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.72 | optimizer_gradients: 4.16 | optimizer_step: 6.58 [2024-06-10 18:18:12,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.63 | bwd_microstep: 2183.54 | bwd_inner_microstep: 1208.75 | bwd_allreduce_microstep: 974.74 | step_microstep: 37.50 [2024-06-10 18:18:12,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16048.59 | bwd: 44166.28 | bwd_inner: 43190.58 | bwd_allreduce: 975.00 | step: 39.00 {'loss': 1.2669, 'learning_rate': 1.498694935482559e-05, 'epoch': 0.59} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 18:18:14,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1375.56 | bwd_inner_microstep: 1375.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 18:18:15,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.23 | bwd_microstep: 1283.79 | bwd_inner_microstep: 1283.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2336 [2024-06-10 18:18:17,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.41 | bwd_microstep: 984.23 | bwd_inner_microstep: 984.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2304 [2024-06-10 18:18:18,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.31 | bwd_microstep: 908.29 | bwd_inner_microstep: 908.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4080 [2024-06-10 18:18:20,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.91 | bwd_microstep: 1719.52 | bwd_inner_microstep: 1719.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 52, images per sample: 13.0, dynamic token length: 4102 [2024-06-10 18:18:23,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 712.31 | bwd_microstep: 1939.58 | bwd_inner_microstep: 1939.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 18:18:25,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.98 | bwd_microstep: 1479.91 | bwd_inner_microstep: 1479.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 18:18:27,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.92 | bwd_microstep: 1245.92 | bwd_inner_microstep: 1245.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2105 [2024-06-10 18:18:28,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.53 | bwd_microstep: 760.40 | bwd_inner_microstep: 760.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717 [2024-06-10 18:18:30,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1427.68 | bwd_inner_microstep: 1427.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3592 [2024-06-10 18:18:32,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.86 | bwd_microstep: 1307.74 | bwd_inner_microstep: 1307.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 18:18:33,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1287.92 | bwd_inner_microstep: 1287.90 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2463 [2024-06-10 18:18:35,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.51 | bwd_microstep: 981.13 | bwd_inner_microstep: 981.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 18:18:37,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1349.48 | bwd_inner_microstep: 1349.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515 [2024-06-10 18:18:38,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.71 | bwd_microstep: 1348.01 | bwd_inner_microstep: 1347.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640 [2024-06-10 18:18:41,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.80 | bwd_microstep: 1610.36 | bwd_inner_microstep: 1610.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 18:18:43,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.22 | bwd_microstep: 1382.57 | bwd_inner_microstep: 1382.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3630 [2024-06-10 18:18:45,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.10 | bwd_microstep: 1776.63 | bwd_inner_microstep: 1776.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422 [2024-06-10 18:18:47,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.63 | bwd_microstep: 1444.99 | 
bwd_inner_microstep: 1444.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661 [2024-06-10 18:18:49,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.49 | bwd_microstep: 1521.21 | bwd_inner_microstep: 1521.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 18:18:51,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.65 | bwd_microstep: 1298.42 | bwd_inner_microstep: 1298.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593 [2024-06-10 18:18:53,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.71 | bwd_microstep: 1312.30 | bwd_inner_microstep: 1312.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 18:18:55,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.98 | bwd_microstep: 1295.33 | bwd_inner_microstep: 1295.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-10 18:18:56,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.39 | bwd_microstep: 1178.07 | bwd_inner_microstep: 1178.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3565 [2024-06-10 18:18:58,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.34 | bwd_microstep: 1444.47 | bwd_inner_microstep: 1444.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 18:19:00,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 607.75 | bwd_microstep: 1655.17 | bwd_inner_microstep: 1655.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596 [2024-06-10 18:19:03,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1508.24 | bwd_inner_microstep: 1508.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732 [2024-06-10 18:19:05,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.78 | bwd_microstep: 1529.33 | bwd_inner_microstep: 1529.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559 [2024-06-10 18:19:06,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.64 | bwd_microstep: 1333.58 | bwd_inner_microstep: 1333.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 18:19:09,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.52 | bwd_microstep: 1603.02 | bwd_inner_microstep: 1602.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3594 [2024-06-10 18:19:11,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1441.27 | bwd_inner_microstep: 1441.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573 [2024-06-10 18:19:15,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-10 18:19:15,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.90 | bwd_microstep: 3491.50 | bwd_inner_microstep: 1696.01 | 
bwd_allreduce_microstep: 1795.45 | step_microstep: 37.76 [2024-06-10 18:19:15,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16538.89 | bwd: 46225.62 | bwd_inner: 44429.28 | bwd_allreduce: 1795.67 | step: 39.25 {'loss': 1.2366, 'learning_rate': 1.4950622433935786e-05, 'epoch': 0.59} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 18:19:17,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.43 | bwd_microstep: 1468.66 | bwd_inner_microstep: 1468.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3935 [2024-06-10 18:19:19,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.17 | bwd_microstep: 1425.22 | bwd_inner_microstep: 1425.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 18:19:21,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.90 | bwd_microstep: 1480.12 | bwd_inner_microstep: 1480.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 18:19:23,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1248.20 | bwd_inner_microstep: 1248.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4170 [2024-06-10 18:19:25,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.85 | bwd_microstep: 1648.26 | bwd_inner_microstep: 1648.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 18:19:27,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.74 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738 [2024-06-10 18:19:29,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1438.96 | bwd_inner_microstep: 1438.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 18:19:30,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.58 | bwd_microstep: 1279.41 | bwd_inner_microstep: 1279.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 18:19:32,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.74 | bwd_microstep: 1278.12 | bwd_inner_microstep: 1278.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1901 [2024-06-10 18:19:33,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.63 | bwd_microstep: 685.72 | bwd_inner_microstep: 685.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 18:19:34,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 792.91 | bwd_inner_microstep: 792.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415 [2024-06-10 18:19:36,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.22 | bwd_microstep: 1442.00 | bwd_inner_microstep: 1441.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2387 [2024-06-10 18:19:37,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.67 | bwd_microstep: 934.68 
| bwd_inner_microstep: 934.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3640 [2024-06-10 18:19:39,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.23 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 18:19:41,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700 [2024-06-10 18:19:43,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.06 | bwd_microstep: 1425.88 | bwd_inner_microstep: 1425.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3631 [2024-06-10 18:19:45,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1348.07 | bwd_inner_microstep: 1348.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-10 18:19:47,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.85 | bwd_microstep: 1559.64 | bwd_inner_microstep: 1559.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2375 [2024-06-10 18:19:49,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.74 | bwd_microstep: 934.53 | bwd_inner_microstep: 934.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442 [2024-06-10 18:19:50,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 442.02 | bwd_microstep: 1157.22 | bwd_inner_microstep: 1157.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 18:19:52,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1511.31 | bwd_inner_microstep: 1511.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481 [2024-06-10 18:19:54,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.53 | bwd_microstep: 1313.68 | bwd_inner_microstep: 1313.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 18:19:56,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.70 | bwd_microstep: 1501.46 | bwd_inner_microstep: 1501.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 18:19:58,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.06 | bwd_microstep: 1158.60 | bwd_inner_microstep: 1158.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 18:20:00,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1251.09 | bwd_inner_microstep: 1251.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3467 [2024-06-10 18:20:01,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.13 | bwd_microstep: 1246.06 | bwd_inner_microstep: 1246.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434 [2024-06-10 18:20:03,638] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.77 | bwd_microstep: 1375.67 | bwd_inner_microstep: 1375.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601 [2024-06-10 18:20:05,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.08 | bwd_microstep: 1706.96 | bwd_inner_microstep: 1706.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229 [2024-06-10 18:20:07,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.07 | bwd_microstep: 959.55 | bwd_inner_microstep: 959.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271 [2024-06-10 18:20:08,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.72 | bwd_microstep: 968.79 | bwd_inner_microstep: 968.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3645 [2024-06-10 18:20:10,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.79 | bwd_microstep: 1680.35 | bwd_inner_microstep: 1680.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 18:20:16,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 18:20:16,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 5127.50 | bwd_inner_microstep: 1579.81 | bwd_allreduce_microstep: 3547.64 | step_microstep: 38.07 [2024-06-10 18:20:16,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15631.98 | bwd: 45394.43 | bwd_inner: 41845.89 | bwd_allreduce: 3547.87 | step: 39.57 {'loss': 1.185, 'learning_rate': 
1.491431329692751e-05, 'epoch': 0.59} 59%|█████▉ | 1021/1726 [17:37:41<12:00:21, 61.31s/it] 59%|█████▉ | 1022/1726 [17:38:44<12:03:59, 61.70s/it] 59%|█████▉ | 1023/1726 [17:39:48<12:12:11, 62.49s/it] 59%|█████▉ | 1024/1726 [17:40:48<12:04:21, 61.91s/it] 59%|█████▉ | 1025/1726 [17:41:52<12:07:30, 62.27s/it] 59%|█████▉ | 1026/1726 [17:42:53<12:03:18, 62.00s/it] dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411 [2024-06-10 18:20:18,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.28 | bwd_microstep: 1368.69 | bwd_inner_microstep: 1368.54 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3923 [2024-06-10 18:20:20,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1488.22 | bwd_inner_microstep: 1488.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-10 18:20:21,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.52 | bwd_microstep: 786.22 | bwd_inner_microstep: 786.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 18:20:23,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1287.44 | bwd_inner_microstep: 1287.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 18:20:25,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.22 | bwd_microstep: 1249.94 | bwd_inner_microstep: 1249.92 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-10 18:20:26,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1185.63 | bwd_inner_microstep: 1185.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2806 [2024-06-10 18:20:28,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.31 | bwd_microstep: 1203.32 | bwd_inner_microstep: 1203.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2246 [2024-06-10 18:20:29,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.44 | bwd_microstep: 965.83 | bwd_inner_microstep: 965.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394 [2024-06-10 18:20:31,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1243.07 | bwd_inner_microstep: 1243.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 18:20:33,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1385.74 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3692 [2024-06-10 18:20:35,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.73 | bwd_microstep: 1546.72 | bwd_inner_microstep: 1546.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3705 [2024-06-10 18:20:37,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 
1483.70 | bwd_inner_microstep: 1483.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3651 [2024-06-10 18:20:39,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.31 | bwd_microstep: 1541.43 | bwd_inner_microstep: 1541.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 18:20:41,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.77 | bwd_microstep: 1381.76 | bwd_inner_microstep: 1381.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 867 [2024-06-10 18:20:42,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.82 | bwd_microstep: 365.06 | bwd_inner_microstep: 365.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2435 [2024-06-10 18:20:43,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.51 | bwd_microstep: 944.92 | bwd_inner_microstep: 944.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 18:20:45,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.15 | bwd_microstep: 1286.67 | bwd_inner_microstep: 1286.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 18:20:47,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1350.65 | bwd_inner_microstep: 1350.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-10 18:20:48,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 302.36 | bwd_microstep: 800.24 | bwd_inner_microstep: 800.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2085 [2024-06-10 18:20:49,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.18 | bwd_microstep: 916.22 | bwd_inner_microstep: 916.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-10 18:20:51,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1402.39 | bwd_inner_microstep: 1402.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2057 [2024-06-10 18:20:52,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.44 | bwd_microstep: 940.97 | bwd_inner_microstep: 940.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2379 [2024-06-10 18:20:54,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.02 | bwd_microstep: 1027.82 | bwd_inner_microstep: 1027.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3516 [2024-06-10 18:20:56,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.66 | bwd_microstep: 1545.34 | bwd_inner_microstep: 1545.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 18:20:58,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.31 | bwd_microstep: 1488.54 | bwd_inner_microstep: 1488.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3594 [2024-06-10 18:21:00,291] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.53 | bwd_microstep: 1367.56 | bwd_inner_microstep: 1367.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830 [2024-06-10 18:21:02,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.50 | bwd_microstep: 1752.16 | bwd_inner_microstep: 1752.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3790 [2024-06-10 18:21:05,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.54 | bwd_microstep: 1848.86 | bwd_inner_microstep: 1848.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3777 [2024-06-10 18:21:07,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.76 | bwd_microstep: 1609.47 | bwd_inner_microstep: 1609.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1890 [2024-06-10 18:21:08,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.59 | bwd_microstep: 775.74 | bwd_inner_microstep: 775.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2193 [2024-06-10 18:21:09,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.78 | bwd_microstep: 954.10 | bwd_inner_microstep: 954.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581 [2024-06-10 18:21:22,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.59 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-10 18:21:22,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.01 | bwd_microstep: 11924.49 | 
bwd_inner_microstep: 1456.81 | bwd_allreduce_microstep: 10467.62 | step_microstep: 39.11 [2024-06-10 18:21:22,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14890.35 | bwd: 50418.93 | bwd_inner: 39950.28 | bwd_allreduce: 10467.91 | step: 40.64 {'loss': 1.1816, 'learning_rate': 1.4878022071681368e-05, 'epoch': 0.59} dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3398 [2024-06-10 18:21:24,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.13 | bwd_microstep: 1378.31 | bwd_inner_microstep: 1378.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3442 [2024-06-10 18:21:26,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1400.42 | bwd_inner_microstep: 1400.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 18:21:28,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1395.23 | bwd_inner_microstep: 1395.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2348 [2024-06-10 18:21:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.42 | bwd_microstep: 918.69 | bwd_inner_microstep: 918.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-10 18:21:31,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 1444.87 | bwd_inner_microstep: 1444.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 18:21:33,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.59 | bwd_microstep: 1374.41 | 
bwd_inner_microstep: 1374.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 18:21:35,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.88 | bwd_microstep: 1339.32 | bwd_inner_microstep: 1339.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-10 18:21:37,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1525.65 | bwd_inner_microstep: 1525.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 18:21:38,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.35 | bwd_microstep: 1146.26 | bwd_inner_microstep: 1146.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696 [2024-06-10 18:21:40,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.67 | bwd_microstep: 1427.30 | bwd_inner_microstep: 1427.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-10 18:21:41,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 793.26 | bwd_inner_microstep: 793.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 18:21:43,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1342.78 | bwd_inner_microstep: 1342.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1468 [2024-06-10 18:21:44,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
199.22 | bwd_microstep: 514.87 | bwd_inner_microstep: 514.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 18:21:46,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.38 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 18:21:48,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 18:21:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1491.35 | bwd_inner_microstep: 1491.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 18:21:52,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1242.72 | bwd_inner_microstep: 1242.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 18:21:54,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1489.80 | bwd_inner_microstep: 1489.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3475 [2024-06-10 18:21:56,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.77 | bwd_microstep: 1506.57 | bwd_inner_microstep: 1506.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1967 [2024-06-10 18:21:57,352] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 291.00 | bwd_microstep: 764.37 | bwd_inner_microstep: 764.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 18:21:59,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.04 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 527 [2024-06-10 18:21:59,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 96.71 | bwd_microstep: 241.94 | bwd_inner_microstep: 241.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 18:22:01,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.70 | bwd_microstep: 1490.27 | bwd_inner_microstep: 1490.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3550 [2024-06-10 18:22:03,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1262.96 | bwd_inner_microstep: 1262.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957 [2024-06-10 18:22:04,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 702.52 | bwd_inner_microstep: 702.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3579 [2024-06-10 18:22:06,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1426.99 | bwd_inner_microstep: 1426.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2292 [2024-06-10 18:22:07,910] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.20 | bwd_microstep: 1007.80 | bwd_inner_microstep: 1007.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3733 [2024-06-10 18:22:09,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1464.50 | bwd_inner_microstep: 1464.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-10 18:22:11,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.65 | bwd_microstep: 1481.93 | bwd_inner_microstep: 1481.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425 [2024-06-10 18:22:13,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.94 | bwd_microstep: 1042.35 | bwd_inner_microstep: 1042.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2585 [2024-06-10 18:22:15,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.52 | bwd_microstep: 1161.23 | bwd_inner_microstep: 1161.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3578 [2024-06-10 18:22:23,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.39 | optimizer_step: 6.62 [2024-06-10 18:22:23,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 7608.32 | bwd_inner_microstep: 1651.65 | bwd_allreduce_microstep: 5956.59 | step_microstep: 40.04 [2024-06-10 18:22:23,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14847.81 | bwd: 45744.91 | bwd_inner: 39787.38 | bwd_allreduce: 5956.83 | step: 41.61 {'loss': 1.2242, 
'learning_rate': 1.4841748886014866e-05, 'epoch': 0.6} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 18:22:25,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.68 | bwd_microstep: 1373.11 | bwd_inner_microstep: 1373.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882 [2024-06-10 18:22:26,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.64 | bwd_microstep: 679.80 | bwd_inner_microstep: 679.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2366 [2024-06-10 18:22:27,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.38 | bwd_microstep: 888.06 | bwd_inner_microstep: 888.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850 [2024-06-10 18:22:29,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1455.53 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 18:22:31,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.68 | bwd_microstep: 1276.19 | bwd_inner_microstep: 1276.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1760 [2024-06-10 18:22:31,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 243.11 | bwd_microstep: 624.43 | bwd_inner_microstep: 624.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 18:22:33,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1394.15 
| bwd_inner_microstep: 1394.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 18:22:35,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1244.62 | bwd_inner_microstep: 1244.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 18:22:37,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1385.45 | bwd_inner_microstep: 1385.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 18:22:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.67 | bwd_microstep: 1380.96 | bwd_inner_microstep: 1380.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3441 [2024-06-10 18:22:41,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.29 | bwd_microstep: 1319.37 | bwd_inner_microstep: 1319.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3673 [2024-06-10 18:22:43,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.47 | bwd_microstep: 1821.68 | bwd_inner_microstep: 1821.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3693 [2024-06-10 18:22:46,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.36 | bwd_microstep: 1658.59 | bwd_inner_microstep: 1658.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2125 [2024-06-10 18:22:47,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 325.38 | bwd_microstep: 858.60 | bwd_inner_microstep: 858.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433 [2024-06-10 18:22:48,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1153.89 | bwd_inner_microstep: 1153.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 18:22:50,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1282.68 | bwd_inner_microstep: 1282.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638 [2024-06-10 18:22:52,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.54 | bwd_microstep: 1311.74 | bwd_inner_microstep: 1311.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518 [2024-06-10 18:22:54,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.55 | bwd_microstep: 1191.61 | bwd_inner_microstep: 1191.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3818 [2024-06-10 18:22:55,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.69 | bwd_microstep: 1258.90 | bwd_inner_microstep: 1258.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 18:22:56,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.78 | bwd_microstep: 696.82 | bwd_inner_microstep: 696.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 18:22:58,865] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1497.21 | bwd_inner_microstep: 1497.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-10 18:23:00,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.88 | bwd_microstep: 818.65 | bwd_inner_microstep: 818.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452 [2024-06-10 18:23:01,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.75 | bwd_microstep: 1191.07 | bwd_inner_microstep: 1191.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 18:23:03,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.49 | bwd_microstep: 1657.10 | bwd_inner_microstep: 1657.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 18:23:05,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1459.30 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3608 [2024-06-10 18:23:07,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.73 | bwd_microstep: 1310.70 | bwd_inner_microstep: 1310.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 18:23:09,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1396.83 | bwd_inner_microstep: 1396.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3818 [2024-06-10 18:23:11,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.37 | bwd_microstep: 1452.46 | bwd_inner_microstep: 1452.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033 [2024-06-10 18:23:12,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.00 | bwd_microstep: 809.97 | bwd_inner_microstep: 809.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2068 [2024-06-10 18:23:14,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.78 | bwd_microstep: 867.47 | bwd_inner_microstep: 867.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 18:23:15,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1352.95 | bwd_inner_microstep: 1352.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3615 [2024-06-10 18:23:23,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.34 | optimizer_step: 6.61 [2024-06-10 18:23:23,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 6987.14 | bwd_inner_microstep: 1503.94 | bwd_allreduce_microstep: 5483.14 | step_microstep: 38.57 [2024-06-10 18:23:23,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14820.77 | bwd: 45057.04 | bwd_inner: 39572.99 | bwd_allreduce: 5483.37 | step: 40.03 {'loss': 1.2432, 'learning_rate': 1.4805493867681969e-05, 'epoch': 0.6} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 18:23:25,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1470.99 | 
bwd_inner_microstep: 1470.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2431 [2024-06-10 18:23:26,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 967.75 | bwd_inner_microstep: 967.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2433 [2024-06-10 18:23:28,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.40 | bwd_microstep: 974.20 | bwd_inner_microstep: 974.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 18:23:30,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.85 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2251 [2024-06-10 18:23:31,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.78 | bwd_microstep: 962.81 | bwd_inner_microstep: 962.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 18:23:33,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.48 | bwd_microstep: 1243.88 | bwd_inner_microstep: 1243.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724 [2024-06-10 18:23:35,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.88 | bwd_microstep: 1362.60 | bwd_inner_microstep: 1362.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725 [2024-06-10 18:23:36,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
535.16 | bwd_microstep: 1425.14 | bwd_inner_microstep: 1425.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1959 [2024-06-10 18:23:37,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.87 | bwd_microstep: 702.63 | bwd_inner_microstep: 702.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 18:23:39,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1296.80 | bwd_inner_microstep: 1296.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1984 [2024-06-10 18:23:40,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.55 | bwd_microstep: 827.32 | bwd_inner_microstep: 827.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2456 [2024-06-10 18:23:42,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.29 | bwd_microstep: 948.12 | bwd_inner_microstep: 948.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1968 [2024-06-10 18:23:43,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.76 | bwd_microstep: 822.30 | bwd_inner_microstep: 822.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508 [2024-06-10 18:23:45,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1578.76 | bwd_inner_microstep: 1578.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 18:23:47,605] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1508.85 | bwd_inner_microstep: 1508.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-10 18:23:48,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.00 | bwd_microstep: 698.92 | bwd_inner_microstep: 698.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632 [2024-06-10 18:23:50,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 18:23:52,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.04 | bwd_microstep: 1255.57 | bwd_inner_microstep: 1255.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2104 [2024-06-10 18:23:53,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.30 | bwd_microstep: 826.26 | bwd_inner_microstep: 826.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 18:23:55,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.61 | bwd_microstep: 1408.54 | bwd_inner_microstep: 1408.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3705 [2024-06-10 18:23:57,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.31 | bwd_microstep: 1267.10 | bwd_inner_microstep: 1267.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533 [2024-06-10 18:23:59,115] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.83 | bwd_microstep: 1417.05 | bwd_inner_microstep: 1417.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 18:24:00,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.05 | bwd_microstep: 1295.59 | bwd_inner_microstep: 1295.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797 [2024-06-10 18:24:03,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1549.80 | bwd_inner_microstep: 1549.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800 [2024-06-10 18:24:05,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1557.61 | bwd_inner_microstep: 1557.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1931 [2024-06-10 18:24:06,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.05 | bwd_microstep: 727.58 | bwd_inner_microstep: 727.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068 [2024-06-10 18:24:07,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.29 | bwd_microstep: 913.47 | bwd_inner_microstep: 913.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3634 [2024-06-10 18:24:09,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.66 | bwd_microstep: 1639.07 | bwd_inner_microstep: 1639.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token 
length: 3604 [2024-06-10 18:24:11,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1463.63 | bwd_inner_microstep: 1463.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807 [2024-06-10 18:24:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.43 | bwd_microstep: 1614.07 | bwd_inner_microstep: 1614.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781 [2024-06-10 18:24:15,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.49 | bwd_microstep: 1449.58 | bwd_inner_microstep: 1449.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3578 [2024-06-10 18:24:23,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 18:24:23,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.67 | bwd_microstep: 6493.27 | bwd_inner_microstep: 1770.40 | bwd_allreduce_microstep: 4722.81 | step_microstep: 37.95 [2024-06-10 18:24:23,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14841.57 | bwd: 44495.22 | bwd_inner: 39771.50 | bwd_allreduce: 4723.04 | step: 39.46 {'loss': 1.1959, 'learning_rate': 1.4769257144372668e-05, 'epoch': 0.6} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473 [2024-06-10 18:24:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1402.31 | bwd_inner_microstep: 1402.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473 [2024-06-10 18:24:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.69 | bwd_microstep: 1196.08 | 
bwd_inner_microstep: 1196.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3882 [2024-06-10 18:24:29,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.91 | bwd_microstep: 1678.81 | bwd_inner_microstep: 1678.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782 [2024-06-10 18:24:30,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.80 | bwd_microstep: 1346.40 | bwd_inner_microstep: 1346.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785 [2024-06-10 18:24:33,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.49 | bwd_microstep: 1543.21 | bwd_inner_microstep: 1543.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 18:24:34,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1380.29 | bwd_inner_microstep: 1380.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434 [2024-06-10 18:24:36,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.94 | bwd_microstep: 1153.52 | bwd_inner_microstep: 1153.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4067 [2024-06-10 18:24:38,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.84 | bwd_microstep: 1625.31 | bwd_inner_microstep: 1625.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3701 [2024-06-10 18:24:40,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 512.29 | bwd_microstep: 1356.21 | bwd_inner_microstep: 1356.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 18:24:42,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.61 | bwd_microstep: 1388.23 | bwd_inner_microstep: 1388.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-10 18:24:44,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1527.22 | bwd_inner_microstep: 1527.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-10 18:24:46,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.01 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 18:24:48,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1477.81 | bwd_inner_microstep: 1477.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526 [2024-06-10 18:24:50,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1325.70 | bwd_inner_microstep: 1325.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421 [2024-06-10 18:24:52,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1443.23 | bwd_inner_microstep: 1443.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3701 [2024-06-10 18:24:54,321] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.25 | bwd_microstep: 1471.85 | bwd_inner_microstep: 1471.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2575 [2024-06-10 18:24:55,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.85 | bwd_microstep: 972.59 | bwd_inner_microstep: 972.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-10 18:24:57,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1415.96 | bwd_inner_microstep: 1415.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 18:24:59,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.85 | bwd_microstep: 1251.71 | bwd_inner_microstep: 1251.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540 [2024-06-10 18:25:01,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.31 | bwd_microstep: 1198.31 | bwd_inner_microstep: 1198.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-10 18:25:02,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.45 | bwd_microstep: 797.76 | bwd_inner_microstep: 797.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 18:25:03,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.13 | bwd_microstep: 1277.80 | bwd_inner_microstep: 1277.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 
[2024-06-10 18:25:06,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.99 | bwd_microstep: 1555.62 | bwd_inner_microstep: 1555.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 18:25:07,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1297.65 | bwd_inner_microstep: 1297.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3675 [2024-06-10 18:25:09,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1374.71 | bwd_inner_microstep: 1374.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3429 [2024-06-10 18:25:11,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.84 | bwd_microstep: 1511.99 | bwd_inner_microstep: 1511.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 18:25:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.30 | bwd_microstep: 1544.36 | bwd_inner_microstep: 1544.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3558 [2024-06-10 18:25:15,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.75 | bwd_microstep: 1440.86 | bwd_inner_microstep: 1440.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2282 [2024-06-10 18:25:17,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.82 | bwd_microstep: 1066.64 | bwd_inner_microstep: 1066.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3822 [2024-06-10 18:25:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.83 | bwd_microstep: 1484.49 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2198 [2024-06-10 18:25:20,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.17 | bwd_microstep: 985.22 | bwd_inner_microstep: 985.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743 [2024-06-10 18:25:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.20 | optimizer_step: 6.58 [2024-06-10 18:25:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.12 | bwd_microstep: 4053.22 | bwd_inner_microstep: 2151.31 | bwd_allreduce_microstep: 1901.86 | step_microstep: 37.91 [2024-06-10 18:25:25,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16276.10 | bwd: 45826.63 | bwd_inner: 43923.82 | bwd_allreduce: 1902.11 | step: 39.48 {'loss': 1.2071, 'learning_rate': 1.4733038843712515e-05, 'epoch': 0.6} 1026/1726 [17:42:53<12:03:18, 62.00s/it] 60%|█████▉ | 1027/1726 [17:43:59<12:15:03, 63.09s/it] 60%|█████▉ | 1028/1726 [17:44:59<12:06:25, 62.44s/it] 60%|█████▉ | 1029/1726 [17:46:00<11:57:33, 61.77s/it] 60%|█████▉ | 1030/1726 [17:46:59<11:49:11, 61.14s/it] 60%|█████▉ | 1031/1726 [17:48:02<11:52:43, 61.53s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 18:25:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep:
1335.51 | bwd_inner_microstep: 1335.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3947 [2024-06-10 18:25:29,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.15 | bwd_microstep: 1689.53 | bwd_inner_microstep: 1689.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 18:25:31,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1395.84 | bwd_inner_microstep: 1395.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 18:25:33,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.51 | bwd_microstep: 1246.44 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 18:25:35,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1387.03 | bwd_inner_microstep: 1387.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 18:25:37,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.13 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-10 18:25:38,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.82 | bwd_microstep: 1151.03 | bwd_inner_microstep: 1151.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 18:25:40,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 518.02 | bwd_microstep: 1388.14 | bwd_inner_microstep: 1388.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768 [2024-06-10 18:25:42,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.08 | bwd_microstep: 1440.25 | bwd_inner_microstep: 1440.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439 [2024-06-10 18:25:44,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.93 | bwd_microstep: 1282.53 | bwd_inner_microstep: 1282.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3523 [2024-06-10 18:25:46,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1328.92 | bwd_inner_microstep: 1328.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522 [2024-06-10 18:25:48,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.83 | bwd_microstep: 1590.92 | bwd_inner_microstep: 1590.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491 [2024-06-10 18:25:50,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.58 | bwd_microstep: 1580.03 | bwd_inner_microstep: 1580.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3694 [2024-06-10 18:25:52,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.60 | bwd_microstep: 1390.40 | bwd_inner_microstep: 1390.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-10 18:25:54,191] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.17 | bwd_microstep: 1178.65 | bwd_inner_microstep: 1178.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2082 [2024-06-10 18:25:55,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.44 | bwd_microstep: 820.42 | bwd_inner_microstep: 820.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2032 [2024-06-10 18:25:56,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.99 | bwd_microstep: 715.24 | bwd_inner_microstep: 715.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1987 [2024-06-10 18:25:57,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.47 | bwd_microstep: 862.53 | bwd_inner_microstep: 862.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2289 [2024-06-10 18:25:58,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.04 | bwd_microstep: 911.05 | bwd_inner_microstep: 911.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2311 [2024-06-10 18:26:00,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 885.86 | bwd_inner_microstep: 885.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3588 [2024-06-10 18:26:02,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.73 | bwd_microstep: 1607.06 | bwd_inner_microstep: 1607.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3484 
[2024-06-10 18:26:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.08 | bwd_microstep: 1347.88 | bwd_inner_microstep: 1347.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 18:26:06,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.13 | bwd_microstep: 1403.47 | bwd_inner_microstep: 1403.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-10 18:26:08,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.47 | bwd_microstep: 1544.10 | bwd_inner_microstep: 1544.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 18:26:09,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.27 | bwd_microstep: 1289.75 | bwd_inner_microstep: 1289.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199 [2024-06-10 18:26:11,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.77 | bwd_microstep: 957.12 | bwd_inner_microstep: 957.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819 [2024-06-10 18:26:13,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.61 | bwd_microstep: 1488.16 | bwd_inner_microstep: 1488.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 18:26:15,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1561.13 | bwd_inner_microstep: 1561.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3470 [2024-06-10 18:26:17,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1375.32 | bwd_inner_microstep: 1375.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3458 [2024-06-10 18:26:19,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.81 | bwd_microstep: 1408.25 | bwd_inner_microstep: 1408.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 18:26:21,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.10 | bwd_microstep: 1346.96 | bwd_inner_microstep: 1346.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 18:26:26,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.10 | optimizer_step: 6.61 [2024-06-10 18:26:26,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.54 | bwd_microstep: 4366.92 | bwd_inner_microstep: 1813.57 | bwd_allreduce_microstep: 2553.30 | step_microstep: 37.68 [2024-06-10 18:26:26,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15697.91 | bwd: 44619.25 | bwd_inner: 42065.05 | bwd_allreduce: 2553.54 | step: 39.24 {'loss': 1.1826, 'learning_rate': 1.469683909326217e-05, 'epoch': 0.6} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 18:26:28,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1330.63 | bwd_inner_microstep: 1330.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4074 [2024-06-10 18:26:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.54 
| bwd_microstep: 1721.81 | bwd_inner_microstep: 1721.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3885 [2024-06-10 18:26:32,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1416.16 | bwd_inner_microstep: 1416.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867 [2024-06-10 18:26:35,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.89 | bwd_microstep: 2436.50 | bwd_inner_microstep: 2436.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744 [2024-06-10 18:26:37,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.16 | bwd_microstep: 1461.93 | bwd_inner_microstep: 1461.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 18:26:39,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.31 | bwd_microstep: 1246.31 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 18:26:40,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.94 | bwd_microstep: 1281.66 | bwd_inner_microstep: 1281.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467 [2024-06-10 18:26:42,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.19 | bwd_microstep: 1182.41 | bwd_inner_microstep: 1182.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 18:26:44,465] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1382.61 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 18:26:46,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1345.91 | bwd_inner_microstep: 1345.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668 [2024-06-10 18:26:48,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.97 | bwd_microstep: 1448.67 | bwd_inner_microstep: 1448.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443 [2024-06-10 18:26:50,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.37 | bwd_microstep: 1281.04 | bwd_inner_microstep: 1281.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659 [2024-06-10 18:26:52,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.05 | bwd_microstep: 1716.61 | bwd_inner_microstep: 1716.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 18:26:54,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.11 | bwd_microstep: 1346.16 | bwd_inner_microstep: 1346.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 18:26:56,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1473.78 | bwd_inner_microstep: 1473.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2095 [2024-06-10 
18:26:57,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.94 | bwd_microstep: 927.74 | bwd_inner_microstep: 927.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105 [2024-06-10 18:26:58,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.00 | bwd_microstep: 920.71 | bwd_inner_microstep: 920.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687 [2024-06-10 18:27:00,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.33 | bwd_microstep: 1525.88 | bwd_inner_microstep: 1525.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624 [2024-06-10 18:27:03,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.96 | bwd_microstep: 1535.07 | bwd_inner_microstep: 1535.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809 [2024-06-10 18:27:04,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.40 | bwd_microstep: 1352.64 | bwd_inner_microstep: 1352.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686 [2024-06-10 18:27:06,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.86 | bwd_microstep: 1429.17 | bwd_inner_microstep: 1429.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 18:27:08,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.43 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1449.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3535 [2024-06-10 18:27:10,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.22 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 18:27:12,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.97 | bwd_microstep: 1277.66 | bwd_inner_microstep: 1277.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-10 18:27:13,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.86 | bwd_microstep: 802.48 | bwd_inner_microstep: 802.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3543 [2024-06-10 18:27:15,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.91 | bwd_microstep: 1442.84 | bwd_inner_microstep: 1442.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2221 [2024-06-10 18:27:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.85 | bwd_microstep: 960.17 | bwd_inner_microstep: 960.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 18:27:18,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1376.47 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 18:27:21,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1549.96 | bwd_inner_microstep: 1549.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 43, images per sample: 10.75, dynamic token length: 3765 [2024-06-10 18:27:23,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.97 | bwd_microstep: 1681.94 | bwd_inner_microstep: 1681.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 18:27:25,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.04 | bwd_microstep: 1644.46 | bwd_inner_microstep: 1644.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 18:27:27,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.16 | optimizer_step: 6.63 [2024-06-10 18:27:27,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1325.19 | bwd_inner_microstep: 1316.62 | bwd_allreduce_microstep: 8.53 | step_microstep: 37.64 [2024-06-10 18:27:27,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16365.48 | bwd: 44665.08 | bwd_inner: 44655.65 | bwd_allreduce: 8.76 | step: 39.08 {'loss': 1.2214, 'learning_rate': 1.4660658020516966e-05, 'epoch': 0.6} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2462 [2024-06-10 18:27:28,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.31 | bwd_microstep: 1034.88 | bwd_inner_microstep: 1034.81 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 18:27:30,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.37 | bwd_microstep: 1274.35 | bwd_inner_microstep: 1274.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 18:27:32,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 502.45 | bwd_microstep: 1342.73 | bwd_inner_microstep: 1342.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2231 [2024-06-10 18:27:33,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.33 | bwd_microstep: 769.01 | bwd_inner_microstep: 768.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-10 18:27:35,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.18 | bwd_microstep: 1632.62 | bwd_inner_microstep: 1632.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499 [2024-06-10 18:27:37,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.98 | bwd_microstep: 1187.59 | bwd_inner_microstep: 1187.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 18:27:39,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1393.70 | bwd_inner_microstep: 1393.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494 [2024-06-10 18:27:41,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.92 | bwd_microstep: 1189.46 | bwd_inner_microstep: 1189.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 18:27:42,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1250.43 | bwd_inner_microstep: 1250.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-10 18:27:44,935] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1483.82 | bwd_inner_microstep: 1483.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 18:27:46,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1278.37 | bwd_inner_microstep: 1278.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3428 [2024-06-10 18:27:48,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.67 | bwd_microstep: 1301.24 | bwd_inner_microstep: 1301.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 18:27:50,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1253.35 | bwd_inner_microstep: 1253.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3637 [2024-06-10 18:27:52,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1539.86 | bwd_inner_microstep: 1539.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972 [2024-06-10 18:27:53,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.14 | bwd_microstep: 795.42 | bwd_inner_microstep: 795.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2490 [2024-06-10 18:27:54,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.06 | bwd_microstep: 954.16 | bwd_inner_microstep: 954.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 
3820 [2024-06-10 18:27:56,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.14 | bwd_microstep: 1514.05 | bwd_inner_microstep: 1514.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-10 18:27:59,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.08 | bwd_microstep: 1658.21 | bwd_inner_microstep: 1658.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 18:28:01,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.56 | bwd_microstep: 1487.23 | bwd_inner_microstep: 1487.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 18:28:02,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.35 | bwd_microstep: 1284.60 | bwd_inner_microstep: 1284.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 18:28:04,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1395.38 | bwd_inner_microstep: 1395.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275 [2024-06-10 18:28:06,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.79 | bwd_microstep: 877.11 | bwd_inner_microstep: 877.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 18:28:07,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 803.56 | bwd_inner_microstep: 803.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3479 [2024-06-10 18:28:09,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.75 | bwd_microstep: 1313.55 | bwd_inner_microstep: 1313.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2687 [2024-06-10 18:28:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.01 | bwd_microstep: 1222.09 | bwd_inner_microstep: 1222.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3559 [2024-06-10 18:28:12,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.76 | bwd_microstep: 1250.44 | bwd_inner_microstep: 1250.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1934 [2024-06-10 18:28:13,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.14 | bwd_microstep: 761.24 | bwd_inner_microstep: 761.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 18:28:15,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1493.54 | bwd_inner_microstep: 1493.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3425 [2024-06-10 18:28:17,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.98 | bwd_microstep: 1541.44 | bwd_inner_microstep: 1541.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572 [2024-06-10 18:28:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.99 | bwd_microstep: 1594.27 | bwd_inner_microstep: 1594.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594 [2024-06-10 18:28:21,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.93 | bwd_microstep: 1506.27 | bwd_inner_microstep: 1506.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3562 [2024-06-10 18:28:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.36 | optimizer_step: 6.59 [2024-06-10 18:28:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 6697.75 | bwd_inner_microstep: 1713.46 | bwd_allreduce_microstep: 4984.22 | step_microstep: 38.86 [2024-06-10 18:28:29,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15354.46 | bwd: 46081.76 | bwd_inner: 41096.56 | bwd_allreduce: 4984.49 | step: 40.36 {'loss': 1.2168, 'learning_rate': 1.4624495752906472e-05, 'epoch': 0.6} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2013 [2024-06-10 18:28:30,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.52 | bwd_microstep: 890.37 | bwd_inner_microstep: 890.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3973 [2024-06-10 18:28:32,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.63 | bwd_microstep: 1306.55 | bwd_inner_microstep: 1306.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3856 [2024-06-10 18:28:34,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.67 | bwd_microstep: 1553.70 | bwd_inner_microstep: 1553.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 18:28:36,406] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1376.02 | bwd_inner_microstep: 1376.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 18:28:38,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1480.30 | bwd_inner_microstep: 1480.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 18:28:40,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1244.06 | bwd_inner_microstep: 1244.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 18:28:41,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.54 | bwd_microstep: 1247.29 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 18:28:43,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.88 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 18:28:45,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1290.08 | bwd_inner_microstep: 1290.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 18:28:47,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1288.79 | bwd_inner_microstep: 1288.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-10 18:28:49,108] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1344.16 | bwd_inner_microstep: 1344.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3448 [2024-06-10 18:28:51,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1410.91 | bwd_inner_microstep: 1410.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3569 [2024-06-10 18:28:53,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 1457.52 | bwd_inner_microstep: 1457.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 18:28:54,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.92 | bwd_microstep: 1341.03 | bwd_inner_microstep: 1341.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2350 [2024-06-10 18:28:56,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.51 | bwd_microstep: 990.36 | bwd_inner_microstep: 990.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103 [2024-06-10 18:28:57,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.26 | bwd_microstep: 921.54 | bwd_inner_microstep: 921.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 18:28:59,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.52 | bwd_microstep: 1339.38 | bwd_inner_microstep: 1339.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token 
length: 3640 [2024-06-10 18:29:01,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1472.91 | bwd_inner_microstep: 1472.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 18:29:03,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1292.51 | bwd_inner_microstep: 1292.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 18:29:05,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1459.01 | bwd_inner_microstep: 1458.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 643 [2024-06-10 18:29:05,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 108.95 | bwd_microstep: 274.14 | bwd_inner_microstep: 274.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834 [2024-06-10 18:29:07,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.68 | bwd_microstep: 1660.00 | bwd_inner_microstep: 1659.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 18:29:10,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1555.94 | bwd_inner_microstep: 1555.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 18:29:12,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.64 | bwd_microstep: 1657.47 | bwd_inner_microstep: 1657.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3820 [2024-06-10 18:29:14,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1557.30 | bwd_inner_microstep: 1557.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 18:29:16,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-10 18:29:17,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.43 | bwd_microstep: 809.15 | bwd_inner_microstep: 809.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-10 18:29:19,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1423.65 | bwd_inner_microstep: 1423.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 18:29:21,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1494.94 | bwd_inner_microstep: 1494.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 18:29:23,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.34 | bwd_microstep: 1550.70 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583 [2024-06-10 18:29:25,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.10 | bwd_microstep: 1524.41 | bwd_inner_microstep: 1524.38 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 18:29:30,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-10 18:29:30,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.90 | bwd_microstep: 3627.24 | bwd_inner_microstep: 1750.24 | bwd_allreduce_microstep: 1876.93 | step_microstep: 38.15 [2024-06-10 18:29:30,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15895.33 | bwd: 44520.06 | bwd_inner: 42642.18 | bwd_allreduce: 1877.17 | step: 39.74 {'loss': 1.1991, 'learning_rate': 1.4588352417793976e-05, 'epoch': 0.6} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 18:29:32,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.08 | bwd_microstep: 1488.81 | bwd_inner_microstep: 1488.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3984 [2024-06-10 18:29:34,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.39 | bwd_microstep: 1703.21 | bwd_inner_microstep: 1703.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3476 [2024-06-10 18:29:36,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.98 | bwd_microstep: 1212.79 | bwd_inner_microstep: 1212.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-10 18:29:37,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.44 | bwd_microstep: 1313.08 | bwd_inner_microstep: 1313.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 18:29:39,885] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.24 | bwd_microstep: 1380.38 | bwd_inner_microstep: 1380.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 18:29:41,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1389.99 | bwd_inner_microstep: 1389.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 18:29:43,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.44 | bwd_microstep: 1149.75 | bwd_inner_microstep: 1149.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 18:29:45,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1377.35 | bwd_inner_microstep: 1377.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418 [2024-06-10 18:29:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.62 | bwd_microstep: 1153.56 | bwd_inner_microstep: 1153.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1900 [2024-06-10 18:29:47,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.31 | bwd_microstep: 748.49 | bwd_inner_microstep: 748.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3507 [2024-06-10 18:29:49,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1437.50 | bwd_inner_microstep: 1437.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 
1950 [2024-06-10 18:29:50,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.43 | bwd_microstep: 728.42 | bwd_inner_microstep: 728.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2493 [2024-06-10 18:29:52,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.93 | bwd_microstep: 1119.14 | bwd_inner_microstep: 1119.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 18:29:54,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-10 18:29:56,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.65 | bwd_microstep: 1276.60 | bwd_inner_microstep: 1276.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-10 18:29:58,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1524.38 | bwd_inner_microstep: 1524.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3643 [2024-06-10 18:30:00,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.68 | bwd_microstep: 1680.91 | bwd_inner_microstep: 1680.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 18:30:02,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.67 | bwd_microstep: 1396.08 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3671 [2024-06-10 18:30:04,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.93 | bwd_microstep: 1525.91 | bwd_inner_microstep: 1525.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 18:30:06,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.47 | bwd_microstep: 1490.57 | bwd_inner_microstep: 1490.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 18:30:08,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3036 [2024-06-10 18:30:10,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.79 | bwd_microstep: 1230.31 | bwd_inner_microstep: 1230.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3642 [2024-06-10 18:30:12,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.58 | bwd_microstep: 1437.86 | bwd_inner_microstep: 1437.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3593 [2024-06-10 18:30:13,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.82 | bwd_microstep: 1355.04 | bwd_inner_microstep: 1355.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 18:30:16,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1497.86 | bwd_inner_microstep: 1497.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3032 [2024-06-10 18:30:17,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.61 | bwd_microstep: 1170.48 | bwd_inner_microstep: 1170.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 18:30:19,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.81 | bwd_microstep: 1470.94 | bwd_inner_microstep: 1470.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805 [2024-06-10 18:30:22,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.58 | bwd_microstep: 1752.40 | bwd_inner_microstep: 1752.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 18:30:23,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1345.86 | bwd_inner_microstep: 1345.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384 [2024-06-10 18:30:25,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.45 | bwd_microstep: 1430.93 | bwd_inner_microstep: 1430.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438 [2024-06-10 18:30:27,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1454.96 | bwd_inner_microstep: 1454.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 18:30:32,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 
18:30:32,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 4132.50 | bwd_inner_microstep: 1699.53 | bwd_allreduce_microstep: 2432.92 | step_microstep: 37.95 [2024-06-10 18:30:32,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16250.98 | bwd: 46020.35 | bwd_inner: 43586.52 | bwd_allreduce: 2433.15 | step: 39.44 {'loss': 1.2325, 'learning_rate': 1.4552228142476138e-05, 'epoch': 0.6} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 18:30:34,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.22 | bwd_microstep: 1379.09 | bwd_inner_microstep: 1379.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3883 [2024-06-10 18:30:36,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.34 | bwd_microstep: 1578.19 | bwd_inner_microstep: 1578.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 18:30:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.30 | bwd_microstep: 1478.93 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 18:30:40,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1287.90 | bwd_inner_microstep: 1287.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-10 18:30:42,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.28 | bwd_microstep: 1436.46 | bwd_inner_microstep: 1436.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4075 [2024-06-10 
18:30:44,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.36 | bwd_microstep: 1724.27 | bwd_inner_microstep: 1724.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 18:30:46,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.91 | bwd_microstep: 793.35 | bwd_inner_microstep: 793.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 18:30:47,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.51 | bwd_microstep: 1381.99 | bwd_inner_microstep: 1381.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3425 [2024-06-10 18:30:49,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.42 | bwd_microstep: 1201.97 | bwd_inner_microstep: 1201.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2897 [2024-06-10 18:30:51,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.56 | bwd_microstep: 1133.91 | bwd_inner_microstep: 1133.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963 [2024-06-10 18:30:52,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.93 | bwd_microstep: 891.47 | bwd_inner_microstep: 891.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3383 [2024-06-10 18:30:54,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.69 | bwd_microstep: 1271.90 | bwd_inner_microstep: 1271.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3687 [2024-06-10 18:30:56,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.66 | bwd_microstep: 1624.72 | bwd_inner_microstep: 1624.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700 [2024-06-10 18:30:58,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.08 | bwd_microstep: 1618.24 | bwd_inner_microstep: 1618.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3512 [2024-06-10 18:31:00,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.02 | bwd_microstep: 1549.56 | bwd_inner_microstep: 1549.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3631 [2024-06-10 18:31:03,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.44 | bwd_microstep: 1710.38 | bwd_inner_microstep: 1710.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 18:31:04,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.05 | bwd_microstep: 795.37 | bwd_inner_microstep: 795.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444 [2024-06-10 18:31:05,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.39 | bwd_microstep: 1158.34 | bwd_inner_microstep: 1158.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-10 18:31:06,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 696.01 | bwd_inner_microstep: 695.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 18:31:08,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1378.43 | bwd_inner_microstep: 1378.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 18:31:10,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.98 | bwd_microstep: 1487.27 | bwd_inner_microstep: 1487.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 18:31:12,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1391.82 | bwd_inner_microstep: 1391.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3527 [2024-06-10 18:31:14,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1258.65 | bwd_inner_microstep: 1258.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-10 18:31:15,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.89 | bwd_microstep: 878.66 | bwd_inner_microstep: 878.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-10 18:31:17,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.21 | bwd_microstep: 1286.82 | bwd_inner_microstep: 1286.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603 [2024-06-10 18:31:19,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1311.23 | bwd_inner_microstep: 1311.20 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064 [2024-06-10 18:31:20,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.83 | bwd_microstep: 818.03 | bwd_inner_microstep: 818.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3838 [2024-06-10 18:31:22,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.39 | bwd_microstep: 1484.37 | bwd_inner_microstep: 1484.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-10 18:31:24,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.74 | bwd_microstep: 1300.32 | bwd_inner_microstep: 1300.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707 [2024-06-10 18:31:26,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.04 | bwd_microstep: 1622.94 | bwd_inner_microstep: 1622.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 18:31:28,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1372.95 | bwd_inner_microstep: 1372.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583 [2024-06-10 18:31:35,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 18:31:35,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 6270.41 | bwd_inner_microstep: 1646.12 | bwd_allreduce_microstep: 4624.24 | step_microstep: 37.78 [2024-06-10 18:31:35,244] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 15658.50 | bwd: 46573.98 | bwd_inner: 41948.79 | bwd_allreduce: 4624.48 | step: 39.37 8:02<11:52:43, 61.53s/it] 60%|█████▉ | 1032/1726 [17:49:02<11:48:38, 61.27s/it] 60%|█████▉ | 1033/1726 [17:50:04<11:47:58, 61.30s/it] 60%|█████▉ | 1034/1726 [17:51:06<11:48:33, 61.44s/it] 60%|█████▉ | 1035/1726 [17:52:06<11:45:10, 61.23s/it] 60%|██████ | 1036/1726 [17:53:09<11:48:53, 61.64s/it] {'loss': 1.2181, 'learning_rate': 1.4516123054182457e-05, 'epoch': 0.6} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1939 [2024-06-10 18:31:36,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.14 | bwd_microstep: 836.43 | bwd_inner_microstep: 836.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 18:31:38,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1491.89 | bwd_inner_microstep: 1491.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-10 18:31:40,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.36 | bwd_microstep: 1551.20 | bwd_inner_microstep: 1551.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509 [2024-06-10 18:31:42,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.08 | bwd_microstep: 1429.10 | bwd_inner_microstep: 1429.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10
18:31:44,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 18:31:46,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1380.57 | bwd_inner_microstep: 1380.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181 [2024-06-10 18:31:47,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.29 | bwd_microstep: 949.42 | bwd_inner_microstep: 949.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 18:31:49,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1242.39 | bwd_inner_microstep: 1242.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-10 18:31:51,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1481.36 | bwd_inner_microstep: 1481.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 18:31:52,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.52 | bwd_microstep: 794.37 | bwd_inner_microstep: 794.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 18:31:54,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1251.20 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, 
dynamic token length: 3501 [2024-06-10 18:31:56,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.56 | bwd_microstep: 1405.96 | bwd_inner_microstep: 1405.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3433 [2024-06-10 18:31:57,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.12 | bwd_microstep: 1295.44 | bwd_inner_microstep: 1295.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2042 [2024-06-10 18:31:59,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.13 | bwd_microstep: 829.56 | bwd_inner_microstep: 829.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2955 [2024-06-10 18:32:00,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.87 | bwd_microstep: 1196.66 | bwd_inner_microstep: 1196.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3702 [2024-06-10 18:32:02,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1557.62 | bwd_inner_microstep: 1557.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 18:32:04,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1381.97 | bwd_inner_microstep: 1381.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3538 [2024-06-10 18:32:06,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.63 | bwd_microstep: 1360.88 | bwd_inner_microstep: 1360.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 18, images per sample: 4.5, dynamic token length: 3697 [2024-06-10 18:32:08,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1261.80 | bwd_inner_microstep: 1261.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 18:32:10,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.86 | bwd_microstep: 1495.75 | bwd_inner_microstep: 1495.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1049 [2024-06-10 18:32:11,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 155.53 | bwd_microstep: 401.85 | bwd_inner_microstep: 401.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 18:32:13,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1399.36 | bwd_inner_microstep: 1399.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 18:32:14,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1392.82 | bwd_inner_microstep: 1392.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3523 [2024-06-10 18:32:16,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1227.44 | bwd_inner_microstep: 1227.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2133 [2024-06-10 18:32:17,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.58 | bwd_microstep: 831.58 | bwd_inner_microstep: 831.55 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-10 18:32:18,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.43 | bwd_microstep: 805.14 | bwd_inner_microstep: 805.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 18:32:20,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.49 | bwd_microstep: 1402.31 | bwd_inner_microstep: 1402.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601 [2024-06-10 18:32:22,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1508.65 | bwd_inner_microstep: 1508.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 18:32:24,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1406.38 | bwd_inner_microstep: 1406.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 18:32:26,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.67 | bwd_microstep: 1511.76 | bwd_inner_microstep: 1511.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3818 [2024-06-10 18:32:29,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.09 | bwd_microstep: 1691.20 | bwd_inner_microstep: 1691.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3557 [2024-06-10 18:32:38,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.09 | optimizer_step: 
6.61 [2024-06-10 18:32:38,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.78 | bwd_microstep: 8639.44 | bwd_inner_microstep: 1841.57 | bwd_allreduce_microstep: 6797.81 | step_microstep: 37.80 [2024-06-10 18:32:38,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15264.14 | bwd: 47753.02 | bwd_inner: 40954.29 | bwd_allreduce: 6798.04 | step: 39.28 {'loss': 1.2485, 'learning_rate': 1.4480037280074876e-05, 'epoch': 0.6} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 18:32:40,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1377.94 | bwd_inner_microstep: 1377.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 18:32:42,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.80 | bwd_microstep: 1376.15 | bwd_inner_microstep: 1376.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352 [2024-06-10 18:32:43,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.81 | bwd_microstep: 984.11 | bwd_inner_microstep: 984.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4263 [2024-06-10 18:32:46,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.31 | bwd_microstep: 1663.87 | bwd_inner_microstep: 1663.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3401 [2024-06-10 18:32:47,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1303.61 | bwd_inner_microstep: 1303.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 
[2024-06-10 18:32:48,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.91 | bwd_microstep: 791.20 | bwd_inner_microstep: 791.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 18:32:50,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1245.99 | bwd_inner_microstep: 1245.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3488 [2024-06-10 18:32:52,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.18 | bwd_microstep: 1215.59 | bwd_inner_microstep: 1215.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 18:32:54,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.56 | bwd_microstep: 1242.33 | bwd_inner_microstep: 1242.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426 [2024-06-10 18:32:55,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1250.55 | bwd_inner_microstep: 1250.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3500 [2024-06-10 18:32:57,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1441.59 | bwd_inner_microstep: 1441.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447 [2024-06-10 18:32:59,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.94 | bwd_microstep: 1374.99 | bwd_inner_microstep: 1374.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3461 [2024-06-10 18:33:01,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.74 | bwd_microstep: 1308.12 | bwd_inner_microstep: 1308.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-10 18:33:02,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.91 | bwd_microstep: 890.62 | bwd_inner_microstep: 890.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-10 18:33:04,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707 [2024-06-10 18:33:07,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.56 | bwd_microstep: 1623.05 | bwd_inner_microstep: 1623.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 18:33:08,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1347.10 | bwd_inner_microstep: 1347.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440 [2024-06-10 18:33:10,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.62 | bwd_microstep: 1443.11 | bwd_inner_microstep: 1443.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-10 18:33:11,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.04 | bwd_microstep: 798.67 | bwd_inner_microstep: 798.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 18:33:13,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1409.49 | bwd_inner_microstep: 1409.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-10 18:33:15,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.09 | bwd_microstep: 1474.23 | bwd_inner_microstep: 1474.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-10 18:33:17,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.71 | bwd_microstep: 1159.66 | bwd_inner_microstep: 1159.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-10 18:33:19,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.99 | bwd_microstep: 1439.12 | bwd_inner_microstep: 1439.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 18:33:21,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1552.67 | bwd_inner_microstep: 1552.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2293 [2024-06-10 18:33:22,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.16 | bwd_microstep: 911.21 | bwd_inner_microstep: 911.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3802 [2024-06-10 18:33:25,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.69 | bwd_microstep: 1581.50 | bwd_inner_microstep: 1581.48 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025 [2024-06-10 18:33:26,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.69 | bwd_microstep: 901.53 | bwd_inner_microstep: 901.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3567 [2024-06-10 18:33:28,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1361.86 | bwd_inner_microstep: 1361.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537 [2024-06-10 18:33:30,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.46 | bwd_microstep: 1325.83 | bwd_inner_microstep: 1325.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2950 [2024-06-10 18:33:31,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.18 | bwd_microstep: 1198.21 | bwd_inner_microstep: 1198.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 18:33:33,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.23 | bwd_microstep: 1339.79 | bwd_inner_microstep: 1339.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446 [2024-06-10 18:33:39,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 18:33:39,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.25 | bwd_microstep: 5713.60 | bwd_inner_microstep: 1379.00 | bwd_allreduce_microstep: 4334.55 | step_microstep: 37.95 [2024-06-10 18:33:39,841] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15398.69 | bwd: 45529.43 | bwd_inner: 41193.97 | bwd_allreduce: 4334.78 | step: 39.38 {'loss': 1.2828, 'learning_rate': 1.4443970947247308e-05, 'epoch': 0.6} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 18:33:41,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.37 | bwd_microstep: 1374.36 | bwd_inner_microstep: 1374.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3996 [2024-06-10 18:33:43,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.12 | bwd_microstep: 1600.50 | bwd_inner_microstep: 1600.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 18:33:45,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1372.81 | bwd_inner_microstep: 1372.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933 [2024-06-10 18:33:46,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.85 | bwd_microstep: 819.32 | bwd_inner_microstep: 819.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3554 [2024-06-10 18:33:48,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.32 | bwd_microstep: 1199.94 | bwd_inner_microstep: 1199.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 18:33:50,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.96 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 
4.0, dynamic token length: 1965 [2024-06-10 18:33:51,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 791.93 | bwd_inner_microstep: 791.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 18:33:53,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.33 | bwd_microstep: 1244.52 | bwd_inner_microstep: 1244.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4030 [2024-06-10 18:33:55,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.67 | bwd_microstep: 1609.24 | bwd_inner_microstep: 1609.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-10 18:33:57,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.85 | bwd_microstep: 1514.80 | bwd_inner_microstep: 1514.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3487 [2024-06-10 18:33:59,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.86 | bwd_microstep: 1423.46 | bwd_inner_microstep: 1423.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659 [2024-06-10 18:34:01,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.37 | bwd_microstep: 1440.12 | bwd_inner_microstep: 1440.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488 [2024-06-10 18:34:03,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.73 | bwd_microstep: 1309.70 | bwd_inner_microstep: 1309.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 30, images per sample: 7.5, dynamic token length: 3645 [2024-06-10 18:34:05,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.44 | bwd_microstep: 1432.78 | bwd_inner_microstep: 1432.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 18:34:06,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 698.29 | bwd_inner_microstep: 698.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 18:34:08,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.39 | bwd_microstep: 1339.95 | bwd_inner_microstep: 1339.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2458 [2024-06-10 18:34:09,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.53 | bwd_microstep: 949.70 | bwd_inner_microstep: 949.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 18:34:11,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1552.92 | bwd_inner_microstep: 1552.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 18:34:13,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.87 | bwd_microstep: 1385.69 | bwd_inner_microstep: 1385.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 18:34:15,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1276.20 | bwd_inner_microstep: 1276.17 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3510 [2024-06-10 18:34:17,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1190.77 | bwd_inner_microstep: 1190.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434 [2024-06-10 18:34:18,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.60 | bwd_microstep: 1153.12 | bwd_inner_microstep: 1153.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2177 [2024-06-10 18:34:19,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.40 | bwd_microstep: 854.80 | bwd_inner_microstep: 854.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547 [2024-06-10 18:34:21,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.63 | bwd_microstep: 1295.75 | bwd_inner_microstep: 1295.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3662 [2024-06-10 18:34:23,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.84 | bwd_microstep: 1324.44 | bwd_inner_microstep: 1324.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279 [2024-06-10 18:34:24,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.19 | bwd_microstep: 906.64 | bwd_inner_microstep: 906.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2074 [2024-06-10 18:34:26,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.16 | bwd_microstep: 974.20 | bwd_inner_microstep: 
974.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 18:34:28,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.17 | bwd_microstep: 1495.81 | bwd_inner_microstep: 1495.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3757 [2024-06-10 18:34:30,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.45 | bwd_microstep: 1608.98 | bwd_inner_microstep: 1608.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-10 18:34:32,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1498.46 | bwd_inner_microstep: 1498.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419 [2024-06-10 18:34:34,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.19 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-10 18:34:41,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.26 | optimizer_step: 6.60 [2024-06-10 18:34:41,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 6643.52 | bwd_inner_microstep: 1641.36 | bwd_allreduce_microstep: 5002.10 | step_microstep: 38.21 [2024-06-10 18:34:41,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15322.72 | bwd: 45948.93 | bwd_inner: 40945.92 | bwd_allreduce: 5002.33 | step: 39.64 {'loss': 1.1659, 'learning_rate': 1.4407924182725168e-05, 'epoch': 0.6} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474 
[2024-06-10 18:34:43,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1568.07 | bwd_inner_microstep: 1568.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840 [2024-06-10 18:34:45,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.90 | bwd_microstep: 1652.06 | bwd_inner_microstep: 1652.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3945 [2024-06-10 18:34:48,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.93 | bwd_microstep: 1593.24 | bwd_inner_microstep: 1593.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4400 [2024-06-10 18:34:50,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.51 | bwd_microstep: 1712.10 | bwd_inner_microstep: 1712.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3406 [2024-06-10 18:34:52,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.73 | bwd_microstep: 1179.20 | bwd_inner_microstep: 1179.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774 [2024-06-10 18:34:54,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1540.26 | bwd_inner_microstep: 1540.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 18:34:56,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.58 | bwd_microstep: 1646.94 | bwd_inner_microstep: 1646.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3411 [2024-06-10 18:34:58,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.71 | bwd_microstep: 1151.95 | bwd_inner_microstep: 1151.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 18:34:59,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.53 | bwd_microstep: 1402.76 | bwd_inner_microstep: 1402.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4093 [2024-06-10 18:35:02,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.67 | bwd_microstep: 1526.16 | bwd_inner_microstep: 1526.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 18:35:04,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.47 | bwd_microstep: 1486.35 | bwd_inner_microstep: 1486.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 18:35:06,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1450.41 | bwd_inner_microstep: 1450.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030 [2024-06-10 18:35:07,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.26 | bwd_microstep: 906.88 | bwd_inner_microstep: 906.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 18:35:09,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1399.82 | bwd_inner_microstep: 1399.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3560 [2024-06-10 18:35:11,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.86 | bwd_microstep: 1590.81 | bwd_inner_microstep: 1590.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 18:35:13,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.56 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645 [2024-06-10 18:35:15,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 18:35:17,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1245.44 | bwd_inner_microstep: 1245.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3498 [2024-06-10 18:35:19,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.27 | bwd_microstep: 1508.19 | bwd_inner_microstep: 1508.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3730 [2024-06-10 18:35:21,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1367.13 | bwd_inner_microstep: 1367.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2927 [2024-06-10 18:35:22,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.22 | bwd_microstep: 1228.87 | bwd_inner_microstep: 1228.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 18:35:24,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1488.81 | bwd_inner_microstep: 1488.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3824 [2024-06-10 18:35:27,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.32 | bwd_microstep: 1690.68 | bwd_inner_microstep: 1690.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3631 [2024-06-10 18:35:29,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1475.09 | bwd_inner_microstep: 1475.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-10 18:35:30,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.51 | bwd_microstep: 1183.22 | bwd_inner_microstep: 1183.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448 [2024-06-10 18:35:32,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.00 | bwd_microstep: 1157.53 | bwd_inner_microstep: 1157.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 18:35:34,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 18:35:36,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.94 | bwd_microstep: 
1451.90 | bwd_inner_microstep: 1451.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3574 [2024-06-10 18:35:38,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.59 | bwd_microstep: 1331.72 | bwd_inner_microstep: 1331.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 18:35:40,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1350.32 | bwd_inner_microstep: 1350.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564 [2024-06-10 18:35:41,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1299.21 | bwd_inner_microstep: 1299.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 18:35:43,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.03 | optimizer_step: 6.66 [2024-06-10 18:35:43,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1326.58 | bwd_inner_microstep: 1317.69 | bwd_allreduce_microstep: 8.85 | step_microstep: 37.64 [2024-06-10 18:35:43,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16866.51 | bwd: 45089.46 | bwd_inner: 45079.71 | bwd_allreduce: 9.07 | step: 39.07 {'loss': 1.2037, 'learning_rate': 1.4371897113464992e-05, 'epoch': 0.6} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3458 [2024-06-10 18:35:45,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.51 | bwd_microstep: 1568.06 | bwd_inner_microstep: 1568.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3956 [2024-06-10 18:35:47,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1494.29 | bwd_inner_microstep: 1494.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1935 [2024-06-10 18:35:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.38 | bwd_microstep: 851.75 | bwd_inner_microstep: 851.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 18:35:51,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.24 | bwd_microstep: 1538.48 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 18:35:53,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.60 | bwd_microstep: 1378.28 | bwd_inner_microstep: 1378.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744 [2024-06-10 18:35:55,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.02 | bwd_microstep: 1632.64 | bwd_inner_microstep: 1632.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 18:35:57,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1246.62 | bwd_inner_microstep: 1246.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3508 [2024-06-10 18:35:58,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.25 | bwd_microstep: 1223.15 | bwd_inner_microstep: 1223.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3740 [2024-06-10 18:36:01,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.11 | bwd_microstep: 1637.35 | bwd_inner_microstep: 1637.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 18:36:02,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1251.69 | bwd_inner_microstep: 1251.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488 [2024-06-10 18:36:04,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.79 | bwd_microstep: 1187.43 | bwd_inner_microstep: 1187.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507 [2024-06-10 18:36:06,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1446.31 | bwd_inner_microstep: 1446.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2202 [2024-06-10 18:36:07,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.96 | bwd_microstep: 960.59 | bwd_inner_microstep: 960.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 18:36:09,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1389.53 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-10 18:36:11,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.83 | bwd_microstep: 1614.18 | bwd_inner_microstep: 1614.15 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 18:36:13,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1493.10 | bwd_inner_microstep: 1493.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651 [2024-06-10 18:36:16,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.87 | bwd_microstep: 1625.47 | bwd_inner_microstep: 1625.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431 [2024-06-10 18:36:18,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.78 | bwd_microstep: 1314.90 | bwd_inner_microstep: 1314.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 18:36:19,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.09 | bwd_microstep: 1352.43 | bwd_inner_microstep: 1352.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1937 [2024-06-10 18:36:21,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.90 | bwd_microstep: 819.22 | bwd_inner_microstep: 819.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3672 [2024-06-10 18:36:23,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.66 | bwd_microstep: 1785.38 | bwd_inner_microstep: 1785.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720 [2024-06-10 18:36:25,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.99 | bwd_microstep: 1539.77 | 
bwd_inner_microstep: 1539.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2268 [2024-06-10 18:36:26,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.37 | bwd_microstep: 972.11 | bwd_inner_microstep: 972.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-10 18:36:29,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1506.83 | bwd_inner_microstep: 1506.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 18:36:30,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1402.55 | bwd_inner_microstep: 1402.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 18:36:32,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1286.84 | bwd_inner_microstep: 1286.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 18:36:34,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1258.15 | bwd_inner_microstep: 1258.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479 [2024-06-10 18:36:36,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.94 | bwd_microstep: 1187.76 | bwd_inner_microstep: 1187.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3749 [2024-06-10 18:36:38,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 516.08 | bwd_microstep: 1373.34 | bwd_inner_microstep: 1373.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093 [2024-06-10 18:36:39,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.80 | bwd_microstep: 921.48 | bwd_inner_microstep: 921.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3768 [2024-06-10 18:36:41,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.91 | bwd_microstep: 1676.79 | bwd_inner_microstep: 1676.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048 [2024-06-10 18:36:45,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-10 18:36:45,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.08 | bwd_microstep: 3710.82 | bwd_inner_microstep: 929.19 | bwd_allreduce_microstep: 2781.58 | step_microstep: 38.08 [2024-06-10 18:36:45,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15965.27 | bwd: 45647.29 | bwd_inner: 42864.81 | bwd_allreduce: 2781.81 | step: 39.53 %|██████ | 1037/1726 [17:54:11<11:51:04, 61.92s/it] 60%|██████ | 1037/1726 [17:54:11<11:51:04, 61.92s/it] 60%|██████ | 1038/1726 [17:55:15<11:54:56, 62.35s/it] 60%|██████ | 1038/1726 [17:55:15<11:54:56, 62.35s/it] 60%|██████ | 1039/1726 [17:56:16<11:50:07, 62.02s/it] 60%|██████ | 1039/1726 [17:56:16<11:50:07, 62.02s/it] 60%|██████ | 1040/1726 [17:57:18<11:47:37, 61.89s/it] 60%|██████ | 1040/1726 [17:57:18<11:47:37, 61.89s/it] 60%|██████ | 1041/1726 [17:58:20<11:47:57, 62.01s/it] 60%|██████ | 1041/1726 [17:58:20<11:47:57, 62.01s/it] 60%|██████ | 104{'loss': 1.1925, 'learning_rate': 1.433588986635392e-05, 'epoch': 0.6} dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3386 [2024-06-10 18:36:47,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.65 | bwd_microstep: 1330.66 | bwd_inner_microstep: 1330.47 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024 [2024-06-10 18:36:48,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.66 | bwd_microstep: 804.26 | bwd_inner_microstep: 804.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 18:36:50,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.92 | bwd_microstep: 1245.23 | bwd_inner_microstep: 1245.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788 [2024-06-10 18:36:52,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.51 | bwd_microstep: 1543.21 | bwd_inner_microstep: 1543.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 18:36:54,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 18:36:56,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1280.66 | bwd_inner_microstep: 1280.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 18:36:57,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.36 | bwd_microstep: 1273.07 | bwd_inner_microstep: 1273.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 18:36:59,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1414.41 | bwd_inner_microstep: 1414.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3689 [2024-06-10 18:37:02,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.71 | bwd_microstep: 1551.26 | bwd_inner_microstep: 1551.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 886 [2024-06-10 18:37:02,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.81 | bwd_microstep: 369.59 | bwd_inner_microstep: 369.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414 [2024-06-10 18:37:04,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.70 | bwd_microstep: 1307.63 | bwd_inner_microstep: 1307.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3539 [2024-06-10 18:37:06,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1416.10 | bwd_inner_microstep: 1416.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2656 [2024-06-10 18:37:07,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.60 | bwd_microstep: 923.76 | bwd_inner_microstep: 923.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-10 18:37:09,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.12 | bwd_microstep: 1489.97 | bwd_inner_microstep: 1489.94 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 18:37:11,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1353.21 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049 [2024-06-10 18:37:12,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.49 | bwd_microstep: 817.70 | bwd_inner_microstep: 817.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 18:37:14,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1390.35 | bwd_inner_microstep: 1390.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522 [2024-06-10 18:37:16,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.51 | bwd_microstep: 1584.00 | bwd_inner_microstep: 1583.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819 [2024-06-10 18:37:19,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.64 | bwd_microstep: 1684.73 | bwd_inner_microstep: 1684.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 18:37:21,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.84 | bwd_microstep: 1455.76 | bwd_inner_microstep: 1455.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1940 [2024-06-10 18:37:22,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.81 | bwd_microstep: 
729.35 | bwd_inner_microstep: 729.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 18:37:24,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.66 | bwd_microstep: 1449.85 | bwd_inner_microstep: 1449.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 18:37:26,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.18 | bwd_microstep: 1553.54 | bwd_inner_microstep: 1553.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 18:37:28,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1398.66 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3602 [2024-06-10 18:37:30,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.51 | bwd_microstep: 1555.16 | bwd_inner_microstep: 1555.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3824 [2024-06-10 18:37:32,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.71 | bwd_microstep: 1723.05 | bwd_inner_microstep: 1723.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817 [2024-06-10 18:37:34,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.27 | bwd_microstep: 1384.52 | bwd_inner_microstep: 1384.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654 [2024-06-10 18:37:36,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 533.88 | bwd_microstep: 1426.38 | bwd_inner_microstep: 1426.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3595 [2024-06-10 18:37:38,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.69 | bwd_microstep: 1457.06 | bwd_inner_microstep: 1457.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595 [2024-06-10 18:37:40,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.82 | bwd_microstep: 1607.01 | bwd_inner_microstep: 1606.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 18:37:42,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1523.82 | bwd_inner_microstep: 1523.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 18:37:46,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.55 | optimizer_step: 6.62 [2024-06-10 18:37:46,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.17 | bwd_microstep: 3188.91 | bwd_inner_microstep: 1690.95 | bwd_allreduce_microstep: 1497.88 | step_microstep: 44.91 [2024-06-10 18:37:46,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16064.90 | bwd: 44618.66 | bwd_inner: 43119.71 | bwd_allreduce: 1498.20 | step: 46.48 {'loss': 1.2195, 'learning_rate': 1.4299902568209297e-05, 'epoch': 0.6} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1962 [2024-06-10 18:37:47,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.15 | bwd_microstep: 882.51 | bwd_inner_microstep: 882.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 18:37:49,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1252.64 | bwd_inner_microstep: 1252.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3872 [2024-06-10 18:37:51,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.75 | bwd_microstep: 1466.43 | bwd_inner_microstep: 1466.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2310 [2024-06-10 18:37:52,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.78 | bwd_microstep: 790.04 | bwd_inner_microstep: 790.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 18:37:53,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.37 | bwd_microstep: 788.21 | bwd_inner_microstep: 788.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3787 [2024-06-10 18:37:55,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1379.42 | bwd_inner_microstep: 1379.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 18:37:57,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 18:37:59,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1280.89 | bwd_inner_microstep: 1280.86 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 18:38:01,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1279.71 | bwd_inner_microstep: 1279.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 18:38:02,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1371 [2024-06-10 18:38:03,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 199.91 | bwd_microstep: 521.08 | bwd_inner_microstep: 521.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683 [2024-06-10 18:38:05,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.11 | bwd_microstep: 1625.76 | bwd_inner_microstep: 1625.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3499 [2024-06-10 18:38:07,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.33 | bwd_microstep: 1221.73 | bwd_inner_microstep: 1221.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-10 18:38:08,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.09 | bwd_microstep: 789.97 | bwd_inner_microstep: 789.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1965 [2024-06-10 18:38:09,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 823.88 | bwd_inner_microstep: 
823.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666 [2024-06-10 18:38:11,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.26 | bwd_microstep: 1418.13 | bwd_inner_microstep: 1418.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513 [2024-06-10 18:38:13,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.30 | bwd_microstep: 1430.77 | bwd_inner_microstep: 1430.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3493 [2024-06-10 18:38:15,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1313.31 | bwd_inner_microstep: 1313.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401 [2024-06-10 18:38:17,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.73 | bwd_microstep: 1271.69 | bwd_inner_microstep: 1271.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747 [2024-06-10 18:38:19,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.98 | bwd_microstep: 1539.08 | bwd_inner_microstep: 1539.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399 [2024-06-10 18:38:21,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 1367.09 | bwd_inner_microstep: 1367.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 18:38:23,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.98 | 
bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 18:38:25,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.38 | bwd_microstep: 1314.73 | bwd_inner_microstep: 1314.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2545 [2024-06-10 18:38:26,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.45 | bwd_microstep: 993.98 | bwd_inner_microstep: 993.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3449 [2024-06-10 18:38:28,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1409.94 | bwd_inner_microstep: 1409.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-10 18:38:30,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1648.75 | bwd_inner_microstep: 1648.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791 [2024-06-10 18:38:32,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.78 | bwd_microstep: 1544.27 | bwd_inner_microstep: 1544.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3466 [2024-06-10 18:38:34,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.50 | bwd_microstep: 1501.36 | bwd_inner_microstep: 1501.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3760 [2024-06-10 18:38:36,758] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1372.49 | bwd_inner_microstep: 1372.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3771 [2024-06-10 18:38:38,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.96 | bwd_microstep: 1572.37 | bwd_inner_microstep: 1572.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 18:38:40,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1395.90 | bwd_inner_microstep: 1395.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178 [2024-06-10 18:38:49,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.50 | optimizer_step: 6.60 [2024-06-10 18:38:49,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.82 | bwd_microstep: 8388.72 | bwd_inner_microstep: 974.32 | bwd_allreduce_microstep: 7414.33 | step_microstep: 40.78 [2024-06-10 18:38:49,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15026.58 | bwd: 47561.11 | bwd_inner: 40145.85 | bwd_allreduce: 7414.58 | step: 42.21 {'loss': 1.2184, 'learning_rate': 1.4263935345778202e-05, 'epoch': 0.6} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454 [2024-06-10 18:38:51,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.07 | bwd_microstep: 1378.37 | bwd_inner_microstep: 1378.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4159 [2024-06-10 18:38:53,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.09 | bwd_microstep: 1734.23 | bwd_inner_microstep: 1734.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-10 18:38:55,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.16 | bwd_microstep: 787.15 | bwd_inner_microstep: 787.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838 [2024-06-10 18:38:57,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.05 | bwd_microstep: 1456.18 | bwd_inner_microstep: 1456.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 18:38:58,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1241.76 | bwd_inner_microstep: 1241.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491 [2024-06-10 18:39:00,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.15 | bwd_microstep: 1412.35 | bwd_inner_microstep: 1412.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 18:39:02,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.87 | bwd_microstep: 1246.53 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 18:39:04,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1384.93 | bwd_inner_microstep: 1384.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 18:39:06,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.39 | bwd_microstep: 1386.13 | bwd_inner_microstep: 1386.10 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 18:39:08,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.98 | bwd_microstep: 1543.91 | bwd_inner_microstep: 1543.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 18:39:10,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.61 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 4133 [2024-06-10 18:39:12,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.49 | bwd_microstep: 1773.59 | bwd_inner_microstep: 1773.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 18:39:14,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1379.63 | bwd_inner_microstep: 1379.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143 [2024-06-10 18:39:15,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.77 | bwd_microstep: 834.13 | bwd_inner_microstep: 834.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 655 [2024-06-10 18:39:16,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.64 | bwd_microstep: 275.81 | bwd_inner_microstep: 275.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103 [2024-06-10 18:39:17,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.94 | bwd_microstep: 921.46 | 
bwd_inner_microstep: 921.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 18:39:19,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 1395.79 | bwd_inner_microstep: 1395.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 18:39:21,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1295.38 | bwd_inner_microstep: 1295.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 18:39:23,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.71 | bwd_microstep: 1514.02 | bwd_inner_microstep: 1513.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 18:39:25,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.28 | bwd_microstep: 1490.66 | bwd_inner_microstep: 1490.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3556 [2024-06-10 18:39:27,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1364.65 | bwd_inner_microstep: 1364.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3834 [2024-06-10 18:39:28,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.22 | bwd_microstep: 1390.63 | bwd_inner_microstep: 1390.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 18:39:30,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 498.27 | bwd_microstep: 1303.02 | bwd_inner_microstep: 1303.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 18:39:32,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 1401.31 | bwd_inner_microstep: 1401.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3829 [2024-06-10 18:39:34,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.63 | bwd_microstep: 1620.45 | bwd_inner_microstep: 1620.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290 [2024-06-10 18:39:36,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.62 | bwd_microstep: 908.18 | bwd_inner_microstep: 908.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3567 [2024-06-10 18:39:38,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.92 | bwd_microstep: 1423.93 | bwd_inner_microstep: 1423.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 18:39:40,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1499.85 | bwd_inner_microstep: 1499.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 18:39:42,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1478.27 | bwd_inner_microstep: 1478.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045 [2024-06-10 18:39:43,553] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.95 | bwd_microstep: 907.24 | bwd_inner_microstep: 907.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-10 18:39:45,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1486.80 | bwd_inner_microstep: 1486.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3633 [2024-06-10 18:39:51,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.25 | optimizer_gradients: 4.33 | optimizer_step: 6.58 [2024-06-10 18:39:51,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.70 | bwd_microstep: 5020.89 | bwd_inner_microstep: 1778.00 | bwd_allreduce_microstep: 3242.83 | step_microstep: 40.00 [2024-06-10 18:39:51,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15761.37 | bwd: 45542.64 | bwd_inner: 42298.89 | bwd_allreduce: 3243.07 | step: 41.50 {'loss': 1.2385, 'learning_rate': 1.4227988325736991e-05, 'epoch': 0.61} dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4828 [2024-06-10 18:39:53,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 708.22 | bwd_microstep: 1874.07 | bwd_inner_microstep: 1873.89 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3953 [2024-06-10 18:39:55,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1398.45 | bwd_inner_microstep: 1398.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856 [2024-06-10 18:39:58,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.56 | bwd_microstep: 1662.53 | bwd_inner_microstep: 1662.50 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 18:40:00,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1396.26 | bwd_inner_microstep: 1396.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 18:40:02,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1552.85 | bwd_inner_microstep: 1552.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4194 [2024-06-10 18:40:04,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.41 | bwd_microstep: 1563.60 | bwd_inner_microstep: 1563.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 18:40:05,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.13 | bwd_microstep: 790.74 | bwd_inner_microstep: 790.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503 [2024-06-10 18:40:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.66 | bwd_microstep: 1191.39 | bwd_inner_microstep: 1191.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 18:40:08,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1249.91 | bwd_inner_microstep: 1249.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137 [2024-06-10 18:40:09,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.36 | bwd_microstep: 
832.86 | bwd_inner_microstep: 832.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3536 [2024-06-10 18:40:11,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1451.66 | bwd_inner_microstep: 1451.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 18:40:14,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.44 | bwd_microstep: 1483.57 | bwd_inner_microstep: 1483.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487 [2024-06-10 18:40:16,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.07 | bwd_microstep: 1440.96 | bwd_inner_microstep: 1440.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636 [2024-06-10 18:40:18,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.83 | bwd_microstep: 1612.29 | bwd_inner_microstep: 1612.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3695 [2024-06-10 18:40:20,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.40 | bwd_microstep: 1487.14 | bwd_inner_microstep: 1487.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 18:40:22,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.02 | bwd_microstep: 1479.97 | bwd_inner_microstep: 1479.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3508 [2024-06-10 18:40:24,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 504.99 | bwd_microstep: 1352.72 | bwd_inner_microstep: 1352.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 640 [2024-06-10 18:40:24,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.41 | bwd_microstep: 264.71 | bwd_inner_microstep: 264.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 18:40:26,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-10 18:40:28,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1409.17 | bwd_inner_microstep: 1409.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 18:40:29,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.91 | bwd_microstep: 698.10 | bwd_inner_microstep: 698.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3619 [2024-06-10 18:40:31,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1343.47 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924 [2024-06-10 18:40:32,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.47 | bwd_microstep: 696.88 | bwd_inner_microstep: 696.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 18:40:34,144] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1382.85 | bwd_inner_microstep: 1382.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 18:40:36,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1492.36 | bwd_inner_microstep: 1492.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181 [2024-06-10 18:40:37,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 958.31 | bwd_inner_microstep: 958.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3814 [2024-06-10 18:40:39,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.12 | bwd_microstep: 1263.03 | bwd_inner_microstep: 1263.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2186 [2024-06-10 18:40:40,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.88 | bwd_microstep: 796.44 | bwd_inner_microstep: 796.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 18:40:42,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1372.49 | bwd_inner_microstep: 1372.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3779 [2024-06-10 18:40:44,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1476.49 | bwd_inner_microstep: 1476.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3574 [2024-06-10 18:40:46,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.36 | bwd_microstep: 1593.97 | bwd_inner_microstep: 1593.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2037 [2024-06-10 18:40:50,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.36 | optimizer_step: 6.59 [2024-06-10 18:40:50,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.19 | bwd_microstep: 3465.65 | bwd_inner_microstep: 1073.20 | bwd_allreduce_microstep: 2392.38 | step_microstep: 38.92 [2024-06-10 18:40:50,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15362.88 | bwd: 43433.64 | bwd_inner: 41040.20 | bwd_allreduce: 2392.70 | step: 40.49 {'loss': 1.1524, 'learning_rate': 1.4192061634690892e-05, 'epoch': 0.61} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 18:40:52,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1470.03 | bwd_inner_microstep: 1470.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949 [2024-06-10 18:40:53,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.53 | bwd_microstep: 727.41 | bwd_inner_microstep: 727.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3808 [2024-06-10 18:40:55,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.09 | bwd_microstep: 1301.44 | bwd_inner_microstep: 1301.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 18:40:57,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.09 | bwd_microstep: 1345.46 | 
bwd_inner_microstep: 1345.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 18:40:58,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 18:41:00,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1252.77 | bwd_inner_microstep: 1252.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 18:41:02,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1255.49 | bwd_inner_microstep: 1255.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3699 [2024-06-10 18:41:04,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.94 | bwd_microstep: 1433.48 | bwd_inner_microstep: 1433.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 18:41:06,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1349.47 | bwd_inner_microstep: 1349.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967 [2024-06-10 18:41:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.99 | bwd_microstep: 889.09 | bwd_inner_microstep: 889.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 18:41:09,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 550.09 | bwd_microstep: 1487.95 | bwd_inner_microstep: 1487.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506 [2024-06-10 18:41:11,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.97 | bwd_microstep: 1552.37 | bwd_inner_microstep: 1552.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 18:41:13,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1344.68 | bwd_inner_microstep: 1344.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517 [2024-06-10 18:41:15,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.97 | bwd_microstep: 1584.23 | bwd_inner_microstep: 1584.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3884 [2024-06-10 18:41:17,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.45 | bwd_microstep: 1680.85 | bwd_inner_microstep: 1680.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 18:41:19,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1383.53 | bwd_inner_microstep: 1383.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710 [2024-06-10 18:41:21,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.15 | bwd_microstep: 1427.36 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2685 [2024-06-10 18:41:23,377] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.53 | bwd_microstep: 1125.22 | bwd_inner_microstep: 1125.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-10 18:41:24,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.57 | bwd_microstep: 800.32 | bwd_inner_microstep: 800.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 18:41:26,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.78 | bwd_microstep: 1402.24 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 18:41:28,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1350.61 | bwd_inner_microstep: 1350.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836 [2024-06-10 18:41:30,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.36 | bwd_microstep: 1455.16 | bwd_inner_microstep: 1455.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-10 18:41:32,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.84 | bwd_microstep: 1295.93 | bwd_inner_microstep: 1295.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-10 18:41:34,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.76 | bwd_microstep: 1503.73 | bwd_inner_microstep: 1503.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
2434 [2024-06-10 18:41:35,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.61 | bwd_microstep: 1041.57 | bwd_inner_microstep: 1041.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 18:41:37,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 18:41:39,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1396.75 | bwd_inner_microstep: 1396.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-10 18:41:41,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.99 | bwd_microstep: 1538.89 | bwd_inner_microstep: 1538.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 18:41:43,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1351.69 | bwd_inner_microstep: 1351.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2618 [2024-06-10 18:41:44,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.87 | bwd_microstep: 1013.63 | bwd_inner_microstep: 1013.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3599 [2024-06-10 18:41:46,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1567.98 | bwd_inner_microstep: 1567.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 2054 [2024-06-10 18:41:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.26 | optimizer_step: 6.60 [2024-06-10 18:41:50,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.14 | bwd_microstep: 2967.69 | bwd_inner_microstep: 1040.40 | bwd_allreduce_microstep: 1927.23 | step_microstep: 39.10 [2024-06-10 18:41:50,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.76 | bwd: 43861.85 | bwd_inner: 41933.70 | bwd_allreduce: 1927.46 | step: 40.60 60%|██████ | 1042/1726 [17:59:22<11:46:43, 61.99s/it] 60%|██████ | 1043/1726 [18:00:23<11:42:24, 61.71s/it] 60%|██████ | 1044/1726 [18:01:26<11:45:30, 62.07s/it] 61%|██████ | 1045/1726 [18:02:28<11:43:00, 61.94s/it] 61%|██████ | 1046/1726 [18:03:27<11:32:26, 61.10s/it] 61%|██████ | 1047/1726 [18:04:26<11:27:09, 60 {'loss': 1.1369, 'learning_rate': 1.4156155399173526e-05, 'epoch': 0.61} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 18:41:52,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.48 | bwd_microstep: 1378.28 | bwd_inner_microstep: 1378.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2416 [2024-06-10 18:41:53,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.51 | bwd_microstep: 906.68 | bwd_inner_microstep: 906.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3827 [2024-06-10 18:41:55,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | 
bwd_microstep: 1403.76 | bwd_inner_microstep: 1403.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3877 [2024-06-10 18:41:57,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.74 | bwd_microstep: 1681.07 | bwd_inner_microstep: 1681.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3736 [2024-06-10 18:41:59,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.85 | bwd_microstep: 1428.75 | bwd_inner_microstep: 1428.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931 [2024-06-10 18:42:00,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.83 | bwd_microstep: 698.10 | bwd_inner_microstep: 698.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 18:42:02,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.03 | bwd_inner_microstep: 1479.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2637 [2024-06-10 18:42:04,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.80 | bwd_microstep: 1114.96 | bwd_inner_microstep: 1114.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 18:42:06,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.55 | bwd_microstep: 1383.82 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 18:42:07,829] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1249.50 | bwd_inner_microstep: 1249.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660 [2024-06-10 18:42:10,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.50 | bwd_microstep: 1620.85 | bwd_inner_microstep: 1620.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569 [2024-06-10 18:42:11,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.91 | bwd_microstep: 1362.28 | bwd_inner_microstep: 1362.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 18:42:13,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.83 | bwd_microstep: 1342.12 | bwd_inner_microstep: 1342.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 18:42:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 1494.65 | bwd_inner_microstep: 1494.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 18:42:17,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.37 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522 [2024-06-10 18:42:19,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.09 | bwd_microstep: 1589.38 | bwd_inner_microstep: 1589.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-10 
18:42:22,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.09 | bwd_microstep: 1521.38 | bwd_inner_microstep: 1521.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3532 [2024-06-10 18:42:24,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.89 | bwd_microstep: 1434.33 | bwd_inner_microstep: 1434.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3999 [2024-06-10 18:42:26,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.70 | bwd_microstep: 1814.51 | bwd_inner_microstep: 1814.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2105 [2024-06-10 18:42:27,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.40 | bwd_microstep: 730.59 | bwd_inner_microstep: 730.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292 [2024-06-10 18:42:28,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.02 | bwd_microstep: 880.53 | bwd_inner_microstep: 880.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 18:42:30,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1395.14 | bwd_inner_microstep: 1395.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-10 18:42:32,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.30 | bwd_microstep: 1560.29 | bwd_inner_microstep: 1560.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3483 [2024-06-10 18:42:34,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.62 | bwd_microstep: 1286.59 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3558 [2024-06-10 18:42:36,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.08 | bwd_microstep: 1583.76 | bwd_inner_microstep: 1583.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 18:42:38,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.29 | bwd_microstep: 1508.78 | bwd_inner_microstep: 1508.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235 [2024-06-10 18:42:40,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.12 | bwd_microstep: 966.65 | bwd_inner_microstep: 966.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553 [2024-06-10 18:42:42,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 1592.46 | bwd_inner_microstep: 1592.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461 [2024-06-10 18:42:44,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.14 | bwd_microstep: 1576.33 | bwd_inner_microstep: 1576.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 18:42:46,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.92 | bwd_microstep: 1476.37 | bwd_inner_microstep: 1476.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-10 18:42:48,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.19 | bwd_microstep: 1447.54 | bwd_inner_microstep: 1447.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-10 18:42:51,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.26 | optimizer_step: 6.62 [2024-06-10 18:42:51,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.18 | bwd_microstep: 2686.08 | bwd_inner_microstep: 1569.24 | bwd_allreduce_microstep: 1116.78 | step_microstep: 38.99 [2024-06-10 18:42:51,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16321.74 | bwd: 44977.39 | bwd_inner: 43859.70 | bwd_allreduce: 1117.01 | step: 41.51 {'loss': 1.2202, 'learning_rate': 1.4120269745646469e-05, 'epoch': 0.61} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 18:42:53,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.20 | bwd_microstep: 1471.84 | bwd_inner_microstep: 1471.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 18:42:55,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1376.55 | bwd_inner_microstep: 1376.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 18:42:57,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1347.48 | bwd_inner_microstep: 1347.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 18:42:59,697] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 543.20 | bwd_microstep: 1451.49 | bwd_inner_microstep: 1451.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 18:43:01,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1389.51 | bwd_inner_microstep: 1389.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 18:43:03,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411 [2024-06-10 18:43:04,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.28 | bwd_microstep: 1149.24 | bwd_inner_microstep: 1149.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3912 [2024-06-10 18:43:07,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.55 | bwd_microstep: 1695.07 | bwd_inner_microstep: 1695.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 18:43:09,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.16 | bwd_microstep: 1290.94 | bwd_inner_microstep: 1290.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 18:43:10,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.46 | bwd_microstep: 1403.02 | bwd_inner_microstep: 1403.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-10 18:43:13,231] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.41 | bwd_microstep: 1630.98 | bwd_inner_microstep: 1630.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3687 [2024-06-10 18:43:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1479.69 | bwd_inner_microstep: 1479.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3383 [2024-06-10 18:43:17,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1310.93 | bwd_inner_microstep: 1310.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507 [2024-06-10 18:43:19,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1408.94 | bwd_inner_microstep: 1408.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2351 [2024-06-10 18:43:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.33 | bwd_microstep: 991.20 | bwd_inner_microstep: 991.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 18:43:22,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.93 | bwd_microstep: 1395.72 | bwd_inner_microstep: 1395.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525 [2024-06-10 18:43:24,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1323.72 | bwd_inner_microstep: 1323.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 
3005 [2024-06-10 18:43:25,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.00 | bwd_microstep: 1110.15 | bwd_inner_microstep: 1110.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-10 18:43:27,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1295.97 | bwd_inner_microstep: 1295.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618 [2024-06-10 18:43:29,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.90 | bwd_microstep: 1615.95 | bwd_inner_microstep: 1615.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3810 [2024-06-10 18:43:31,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.00 | bwd_microstep: 1620.75 | bwd_inner_microstep: 1620.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815 [2024-06-10 18:43:34,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.87 | bwd_microstep: 1582.95 | bwd_inner_microstep: 1582.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720 [2024-06-10 18:43:36,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1537.35 | bwd_inner_microstep: 1537.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3820 [2024-06-10 18:43:38,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.71 | bwd_microstep: 1690.74 | bwd_inner_microstep: 1690.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images 
per sample: 7.75, dynamic token length: 3436 [2024-06-10 18:43:40,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1398.59 | bwd_inner_microstep: 1398.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 18:43:41,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.11 | bwd_microstep: 792.34 | bwd_inner_microstep: 792.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 18:43:43,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1292.64 | bwd_inner_microstep: 1292.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-10 18:43:45,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.93 | bwd_microstep: 1453.07 | bwd_inner_microstep: 1453.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-10 18:43:47,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.05 | bwd_microstep: 1496.57 | bwd_inner_microstep: 1496.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 18:43:49,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.79 | bwd_microstep: 1441.11 | bwd_inner_microstep: 1441.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-10 18:43:51,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.07 | bwd_microstep: 1638.18 | bwd_inner_microstep: 1638.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-10 18:43:53,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.68 [2024-06-10 18:43:53,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.85 | bwd_microstep: 1537.39 | bwd_inner_microstep: 1529.42 | bwd_allreduce_microstep: 7.92 | step_microstep: 37.85 [2024-06-10 18:43:53,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16734.55 | bwd: 44866.39 | bwd_inner: 44857.56 | bwd_allreduce: 8.14 | step: 39.41 {'loss': 1.214, 'learning_rate': 1.4084404800498796e-05, 'epoch': 0.61} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3388 [2024-06-10 18:43:55,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1300.77 | bwd_inner_microstep: 1300.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396 [2024-06-10 18:43:57,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.72 | bwd_microstep: 1243.11 | bwd_inner_microstep: 1243.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 18:43:59,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1378.24 | bwd_inner_microstep: 1378.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3789 [2024-06-10 18:44:01,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1647.34 | bwd_inner_microstep: 1647.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878 [2024-06-10 18:44:02,489] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 261.58 | bwd_microstep: 681.91 | bwd_inner_microstep: 681.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 18:44:04,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.46 | bwd_microstep: 1152.28 | bwd_inner_microstep: 1152.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707 [2024-06-10 18:44:06,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.22 | bwd_microstep: 1626.48 | bwd_inner_microstep: 1626.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3407 [2024-06-10 18:44:07,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.07 | bwd_microstep: 1180.76 | bwd_inner_microstep: 1180.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 18:44:09,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 18:44:11,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.93 | bwd_microstep: 1289.33 | bwd_inner_microstep: 1289.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 18:44:13,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.70 | bwd_microstep: 1255.24 | bwd_inner_microstep: 1255.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3413 [2024-06-10 
18:44:15,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1439.05 | bwd_inner_microstep: 1439.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3615 [2024-06-10 18:44:17,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1607.06 | bwd_inner_microstep: 1607.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 18:44:19,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1349.74 | bwd_inner_microstep: 1349.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3498 [2024-06-10 18:44:21,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.32 | bwd_microstep: 1547.86 | bwd_inner_microstep: 1547.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3428 [2024-06-10 18:44:23,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.31 | bwd_microstep: 1396.30 | bwd_inner_microstep: 1396.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 18:44:25,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1396.57 | bwd_inner_microstep: 1396.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 18:44:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.21 | bwd_microstep: 1283.29 | bwd_inner_microstep: 1283.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3837 [2024-06-10 18:44:29,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352 [2024-06-10 18:44:30,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.05 | bwd_microstep: 988.76 | bwd_inner_microstep: 988.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 18:44:32,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1283.83 | bwd_inner_microstep: 1283.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-10 18:44:34,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.56 | bwd_microstep: 1296.96 | bwd_inner_microstep: 1296.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 754 [2024-06-10 18:44:34,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.95 | bwd_microstep: 302.80 | bwd_inner_microstep: 302.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 18:44:36,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1397.02 | bwd_inner_microstep: 1396.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3847 [2024-06-10 18:44:38,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.39 | bwd_microstep: 1696.84 | bwd_inner_microstep: 1696.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 30, images per sample: 7.5, dynamic token length: 3659 [2024-06-10 18:44:40,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1455.00 | bwd_inner_microstep: 1454.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3598 [2024-06-10 18:44:42,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1476.43 | bwd_inner_microstep: 1476.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2078 [2024-06-10 18:44:44,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.55 | bwd_microstep: 786.68 | bwd_inner_microstep: 786.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-10 18:44:46,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.12 | bwd_microstep: 1500.11 | bwd_inner_microstep: 1500.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-10 18:44:48,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.85 | bwd_microstep: 1494.84 | bwd_inner_microstep: 1494.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-10 18:44:50,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.79 | bwd_microstep: 1504.06 | bwd_inner_microstep: 1504.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3574 [2024-06-10 18:44:56,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.30 | optimizer_step: 6.62 [2024-06-10 18:44:56,494] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 5714.86 | bwd_inner_microstep: 1533.92 | bwd_allreduce_microstep: 4180.88 | step_microstep: 39.12 [2024-06-10 18:44:56,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15803.46 | bwd: 46514.47 | bwd_inner: 42332.69 | bwd_allreduce: 4181.10 | step: 40.68 {'loss': 1.2603, 'learning_rate': 1.4048560690046661e-05, 'epoch': 0.61} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533 [2024-06-10 18:44:58,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.24 | bwd_microstep: 1317.74 | bwd_inner_microstep: 1317.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4000 [2024-06-10 18:45:00,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.95 | bwd_microstep: 1531.44 | bwd_inner_microstep: 1531.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433 [2024-06-10 18:45:02,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1447.48 | bwd_inner_microstep: 1447.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 18:45:04,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1378.44 | bwd_inner_microstep: 1378.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485 [2024-06-10 18:45:06,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.44 | bwd_microstep: 1346.74 | bwd_inner_microstep: 1346.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 18:45:07,987] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.27 | bwd_microstep: 1279.55 | bwd_inner_microstep: 1279.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 842 [2024-06-10 18:45:08,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.39 | bwd_microstep: 344.83 | bwd_inner_microstep: 344.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916 [2024-06-10 18:45:09,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.75 | bwd_microstep: 689.05 | bwd_inner_microstep: 689.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 18:45:10,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.23 | bwd_microstep: 795.59 | bwd_inner_microstep: 795.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 18:45:12,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1256.53 | bwd_inner_microstep: 1256.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3720 [2024-06-10 18:45:14,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.86 | bwd_microstep: 1681.62 | bwd_inner_microstep: 1681.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3671 [2024-06-10 18:45:16,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.03 | bwd_microstep: 1658.29 | bwd_inner_microstep: 1658.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 
[2024-06-10 18:45:18,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1390.70 | bwd_inner_microstep: 1390.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3485 [2024-06-10 18:45:20,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.40 | bwd_microstep: 1581.17 | bwd_inner_microstep: 1581.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1932 [2024-06-10 18:45:22,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.97 | bwd_microstep: 886.40 | bwd_inner_microstep: 886.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 18:45:24,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1491.54 | bwd_inner_microstep: 1491.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3529 [2024-06-10 18:45:26,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1589.76 | bwd_inner_microstep: 1589.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3375 [2024-06-10 18:45:28,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.73 | bwd_microstep: 1431.22 | bwd_inner_microstep: 1431.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434 [2024-06-10 18:45:30,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.92 | bwd_microstep: 1185.19 | bwd_inner_microstep: 1185.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3441 [2024-06-10 18:45:31,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.67 | bwd_microstep: 1159.53 | bwd_inner_microstep: 1159.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 18:45:33,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1255.15 | bwd_inner_microstep: 1255.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564 [2024-06-10 18:45:35,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.83 | bwd_microstep: 1203.07 | bwd_inner_microstep: 1203.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-10 18:45:36,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.05 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 18:45:38,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.51 | bwd_microstep: 1458.24 | bwd_inner_microstep: 1458.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925 [2024-06-10 18:45:39,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 696.13 | bwd_inner_microstep: 696.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051 [2024-06-10 18:45:40,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.50 | bwd_microstep: 910.91 | bwd_inner_microstep: 910.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 18:45:42,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.62 | bwd_microstep: 1382.56 | bwd_inner_microstep: 1382.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 18:45:44,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1277.62 | bwd_inner_microstep: 1277.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 18:45:46,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.86 | bwd_microstep: 1645.08 | bwd_inner_microstep: 1645.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-10 18:45:48,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1399.55 | bwd_inner_microstep: 1399.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2036 [2024-06-10 18:45:50,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.77 | bwd_microstep: 903.38 | bwd_inner_microstep: 903.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587 [2024-06-10 18:46:00,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.41 | optimizer_step: 6.60 [2024-06-10 18:46:00,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 10328.34 | bwd_inner_microstep: 1722.17 | bwd_allreduce_microstep: 8606.10 | step_microstep: 39.99 [2024-06-10 18:46:00,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15064.65 
| bwd: 49090.67 | bwd_inner: 40483.64 | bwd_allreduce: 8606.34 | step: 41.59 {'loss': 1.1862, 'learning_rate': 1.4012737540532842e-05, 'epoch': 0.61} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 18:46:03,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1465.82 | bwd_inner_microstep: 1465.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4005 [2024-06-10 18:46:05,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.07 | bwd_microstep: 1536.63 | bwd_inner_microstep: 1536.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3487 [2024-06-10 18:46:06,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1329.40 | bwd_inner_microstep: 1329.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4232 [2024-06-10 18:46:09,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.40 | bwd_microstep: 1659.52 | bwd_inner_microstep: 1659.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 18:46:11,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1476.98 | bwd_inner_microstep: 1476.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 18:46:12,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.26 | bwd_microstep: 678.71 | bwd_inner_microstep: 678.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897 [2024-06-10 18:46:13,203] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.66 | bwd_microstep: 681.61 | bwd_inner_microstep: 681.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 18:46:15,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.93 | bwd_microstep: 1382.38 | bwd_inner_microstep: 1382.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922 [2024-06-10 18:46:16,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.03 | bwd_microstep: 697.11 | bwd_inner_microstep: 697.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 18:46:17,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.80 | bwd_microstep: 1150.53 | bwd_inner_microstep: 1150.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-10 18:46:19,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.28 | bwd_microstep: 1284.17 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 18:46:20,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.40 | bwd_microstep: 791.93 | bwd_inner_microstep: 791.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683 [2024-06-10 18:46:22,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.01 | bwd_microstep: 1422.91 | bwd_inner_microstep: 1422.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token 
length: 3517 [2024-06-10 18:46:24,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.35 | bwd_microstep: 1647.52 | bwd_inner_microstep: 1647.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 18:46:26,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1383.66 | bwd_inner_microstep: 1383.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3381 [2024-06-10 18:46:28,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1499.01 | bwd_inner_microstep: 1498.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 18:46:30,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.43 | bwd_microstep: 1340.63 | bwd_inner_microstep: 1340.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2103 [2024-06-10 18:46:31,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.71 | bwd_microstep: 922.10 | bwd_inner_microstep: 922.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556 [2024-06-10 18:46:33,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.81 | bwd_microstep: 1446.21 | bwd_inner_microstep: 1446.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 18:46:35,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1392.42 | bwd_inner_microstep: 1392.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3450 [2024-06-10 18:46:37,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.52 | bwd_microstep: 1354.49 | bwd_inner_microstep: 1354.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 18:46:39,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1494.23 | bwd_inner_microstep: 1494.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975 [2024-06-10 18:46:40,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.04 | bwd_microstep: 834.10 | bwd_inner_microstep: 834.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 18:46:42,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.33 | bwd_microstep: 1284.55 | bwd_inner_microstep: 1284.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3822 [2024-06-10 18:46:44,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1360.21 | bwd_inner_microstep: 1360.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 18:46:46,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.19 | bwd_microstep: 1548.98 | bwd_inner_microstep: 1548.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3706 [2024-06-10 18:46:48,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.70 | bwd_microstep: 1439.54 | bwd_inner_microstep: 1439.51 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-10 18:46:50,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1509.49 | bwd_inner_microstep: 1509.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 18:46:52,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1507.80 | bwd_inner_microstep: 1507.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-10 18:46:54,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1493.50 | bwd_inner_microstep: 1493.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193 [2024-06-10 18:46:56,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.89 | bwd_microstep: 862.49 | bwd_inner_microstep: 862.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 18:47:02,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.61 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 18:47:02,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.94 | bwd_microstep: 6228.84 | bwd_inner_microstep: 1816.13 | bwd_allreduce_microstep: 4412.66 | step_microstep: 39.25 [2024-06-10 18:47:02,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15474.39 | bwd: 46107.48 | bwd_inner: 41693.92 | bwd_allreduce: 4412.89 | step: 40.86 61%|██████ | 1047/1726 [18:04:26<11:27:09, 60.72s/it] 61%|██████ | 1048/1726 [18:05:28<11:29:17, 61.00s/it] 61%|██████ | 1049/1726 [18:06:30<11:31:30, 61.29s/it] 61%|██████ | 1050/1726 [18:07:33<11:35:07, 61.70s/it] 61%|██████ | 1051/1726 [18:08:37<11:43:30, 62.53s/it] 61%|██████ | 1052/1726 [18:09:39<11:40:23, 62.35s/it]
{'loss': 1.2246, 'learning_rate': 1.3976935478126281e-05, 'epoch': 0.61} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 18:47:04,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1370.91 | bwd_inner_microstep: 1370.64 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3934 [2024-06-10 18:47:07,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1591.94 | bwd_inner_microstep: 1591.80 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.19 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3865 [2024-06-10 18:47:09,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.37 | bwd_microstep: 1465.31 | bwd_inner_microstep: 1465.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 18:47:10,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.65 | bwd_microstep: 1373.61 | bwd_inner_microstep: 1373.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480 [2024-06-10 18:47:12,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.74 | bwd_microstep: 1343.69 | bwd_inner_microstep: 1343.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 18:47:14,523] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-10 18:47:16,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.11 | bwd_microstep: 1532.82 | bwd_inner_microstep: 1532.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717 [2024-06-10 18:47:18,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.07 | bwd_microstep: 1527.68 | bwd_inner_microstep: 1527.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706 [2024-06-10 18:47:20,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1525.02 | bwd_inner_microstep: 1524.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 18:47:22,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1342.04 | bwd_inner_microstep: 1342.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716 [2024-06-10 18:47:24,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.69 | bwd_microstep: 1630.28 | bwd_inner_microstep: 1630.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064 [2024-06-10 18:47:26,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.37 | bwd_microstep: 817.55 | bwd_inner_microstep: 817.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3448 [2024-06-10 18:47:27,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.44 | bwd_microstep: 1285.28 | bwd_inner_microstep: 1285.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663 [2024-06-10 18:47:30,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.15 | bwd_microstep: 1714.44 | bwd_inner_microstep: 1714.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-10 18:47:32,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1475.83 | bwd_inner_microstep: 1475.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3464 [2024-06-10 18:47:34,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1403.37 | bwd_inner_microstep: 1403.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3658 [2024-06-10 18:47:35,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.45 | bwd_microstep: 1226.60 | bwd_inner_microstep: 1226.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3503 [2024-06-10 18:47:37,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.86 | bwd_microstep: 1192.72 | bwd_inner_microstep: 1192.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-10 18:47:39,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.00 | bwd_microstep: 1403.38 | bwd_inner_microstep: 1403.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3608 [2024-06-10 18:47:41,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.24 | bwd_microstep: 1405.21 | bwd_inner_microstep: 1405.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2653 [2024-06-10 18:47:42,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.64 | bwd_microstep: 1022.78 | bwd_inner_microstep: 1022.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3536 [2024-06-10 18:47:44,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.95 | bwd_microstep: 1326.12 | bwd_inner_microstep: 1326.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.15 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456 [2024-06-10 18:47:46,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.46 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457 [2024-06-10 18:47:48,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.82 | bwd_microstep: 1344.78 | bwd_inner_microstep: 1344.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-10 18:47:50,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.40 | bwd_microstep: 1401.28 | bwd_inner_microstep: 1401.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3551 [2024-06-10 18:47:52,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.71 | bwd_microstep: 1326.86 | bwd_inner_microstep: 1326.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 18:47:54,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1659.04 | bwd_inner_microstep: 1659.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 18:47:56,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.80 | bwd_microstep: 1296.74 | bwd_inner_microstep: 1296.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3822 [2024-06-10 18:47:58,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.73 | bwd_microstep: 1825.90 | bwd_inner_microstep: 1825.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812 [2024-06-10 18:48:00,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.76 | bwd_microstep: 1515.39 | bwd_inner_microstep: 1515.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3568 [2024-06-10 18:48:02,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1445.36 | bwd_inner_microstep: 1445.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584 [2024-06-10 18:48:05,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.64 [2024-06-10 18:48:05,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.21 | bwd_microstep: 1635.38 | bwd_inner_microstep: 1627.36 | bwd_allreduce_microstep: 7.96 | step_microstep: 38.75 [2024-06-10 18:48:05,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
16864.44 | bwd: 45055.34 | bwd_inner: 45046.10 | bwd_allreduce: 8.35 | step: 41.68 {'loss': 1.1998, 'learning_rate': 1.3941154628921654e-05, 'epoch': 0.61} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3415 [2024-06-10 18:48:07,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.69 | bwd_microstep: 1470.19 | bwd_inner_microstep: 1470.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 18:48:09,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1484.50 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2354 [2024-06-10 18:48:10,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.44 | bwd_microstep: 891.41 | bwd_inner_microstep: 891.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 18:48:12,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.19 | bwd_microstep: 1482.90 | bwd_inner_microstep: 1482.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472 [2024-06-10 18:48:14,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.81 | bwd_microstep: 1315.33 | bwd_inner_microstep: 1315.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2208 [2024-06-10 18:48:15,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.65 | bwd_microstep: 955.55 | bwd_inner_microstep: 955.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 
18:48:17,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 18:48:19,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1510.22 | bwd_inner_microstep: 1510.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 18:48:21,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1343.91 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 18:48:23,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1391.14 | bwd_inner_microstep: 1391.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3720 [2024-06-10 18:48:25,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1395.97 | bwd_inner_microstep: 1395.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492 [2024-06-10 18:48:27,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.36 | bwd_microstep: 1435.50 | bwd_inner_microstep: 1435.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 18:48:29,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1382.24 | bwd_inner_microstep: 1382.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3504 [2024-06-10 18:48:31,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.21 | bwd_microstep: 1488.06 | bwd_inner_microstep: 1488.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3674 [2024-06-10 18:48:33,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.88 | bwd_microstep: 1786.51 | bwd_inner_microstep: 1786.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686 [2024-06-10 18:48:36,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.63 | bwd_microstep: 1719.37 | bwd_inner_microstep: 1719.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446 [2024-06-10 18:48:37,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.55 | bwd_microstep: 1156.68 | bwd_inner_microstep: 1156.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 18:48:39,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1287.90 | bwd_inner_microstep: 1287.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628 [2024-06-10 18:48:41,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1436.13 | bwd_inner_microstep: 1436.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 18:48:43,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1460.49 | bwd_inner_microstep: 1460.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-10 18:48:44,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.15 | bwd_microstep: 1160.86 | bwd_inner_microstep: 1160.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3520 [2024-06-10 18:48:46,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.09 | bwd_microstep: 1192.75 | bwd_inner_microstep: 1192.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-10 18:48:48,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.58 | bwd_microstep: 974.69 | bwd_inner_microstep: 974.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 18:48:49,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1393.88 | bwd_inner_microstep: 1393.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705 [2024-06-10 18:48:52,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.81 | bwd_microstep: 1624.40 | bwd_inner_microstep: 1624.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826 [2024-06-10 18:48:54,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1514.70 | bwd_inner_microstep: 1514.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2094 [2024-06-10 18:48:55,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 885.56 | bwd_inner_microstep: 885.53 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697 [2024-06-10 18:48:57,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.99 | bwd_microstep: 1530.93 | bwd_inner_microstep: 1530.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2042 [2024-06-10 18:48:58,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.47 | bwd_microstep: 1003.27 | bwd_inner_microstep: 1003.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3588 [2024-06-10 18:49:00,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.66 | bwd_microstep: 1464.74 | bwd_inner_microstep: 1464.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-10 18:49:03,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1496.82 | bwd_inner_microstep: 1496.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3781 [2024-06-10 18:49:06,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.32 | optimizer_step: 6.60 [2024-06-10 18:49:06,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.17 | bwd_microstep: 2482.88 | bwd_inner_microstep: 1686.59 | bwd_allreduce_microstep: 796.23 | step_microstep: 39.48 [2024-06-10 18:49:06,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16225.28 | bwd: 44402.57 | bwd_inner: 43605.43 | bwd_allreduce: 796.47 | step: 41.07 {'loss': 1.2779, 'learning_rate': 1.3905395118938929e-05, 'epoch': 0.61} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456 [2024-06-10 18:49:08,287] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1546.69 | bwd_inner_microstep: 1546.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392 [2024-06-10 18:49:10,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.91 | bwd_microstep: 1244.14 | bwd_inner_microstep: 1244.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-10 18:49:11,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.44 | bwd_microstep: 788.11 | bwd_inner_microstep: 788.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867 [2024-06-10 18:49:13,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.52 | bwd_microstep: 1661.14 | bwd_inner_microstep: 1661.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 18:49:15,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 1351.10 | bwd_inner_microstep: 1351.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414 [2024-06-10 18:49:17,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.23 | bwd_microstep: 1311.20 | bwd_inner_microstep: 1311.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 18:49:18,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.62 | bwd_microstep: 1341.47 | bwd_inner_microstep: 1341.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3487 [2024-06-10 18:49:20,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1282.10 | bwd_inner_microstep: 1282.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433 [2024-06-10 18:49:22,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.04 | bwd_microstep: 1155.30 | bwd_inner_microstep: 1155.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531 [2024-06-10 18:49:24,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1355.19 | bwd_inner_microstep: 1355.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2891 [2024-06-10 18:49:25,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.41 | bwd_microstep: 1025.99 | bwd_inner_microstep: 1025.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 18:49:27,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.64 | bwd_microstep: 1483.57 | bwd_inner_microstep: 1483.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500 [2024-06-10 18:49:29,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.74 | bwd_microstep: 1584.90 | bwd_inner_microstep: 1584.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 18:49:31,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3545 [2024-06-10 18:49:33,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1398.50 | bwd_inner_microstep: 1398.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 18:49:35,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.52 | bwd_microstep: 1606.00 | bwd_inner_microstep: 1605.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294 [2024-06-10 18:49:37,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.13 | bwd_microstep: 878.27 | bwd_inner_microstep: 878.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-10 18:49:38,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.55 | bwd_microstep: 796.82 | bwd_inner_microstep: 796.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113 [2024-06-10 18:49:39,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.98 | bwd_microstep: 827.48 | bwd_inner_microstep: 827.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-10 18:49:40,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.05 | bwd_microstep: 799.52 | bwd_inner_microstep: 799.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475 [2024-06-10 18:49:42,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1316.60 | bwd_inner_microstep: 1316.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 18:49:44,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1252.23 | bwd_inner_microstep: 1252.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 18:49:45,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1389.83 | bwd_inner_microstep: 1389.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2173 [2024-06-10 18:49:47,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.97 | bwd_microstep: 824.45 | bwd_inner_microstep: 824.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452 [2024-06-10 18:49:48,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.23 | bwd_microstep: 1287.79 | bwd_inner_microstep: 1287.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 18:49:51,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.92 | bwd_microstep: 1644.20 | bwd_inner_microstep: 1644.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2239 [2024-06-10 18:49:52,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.24 | bwd_microstep: 995.84 | bwd_inner_microstep: 995.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3803 [2024-06-10 18:49:54,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1474.35 | bwd_inner_microstep: 1474.32 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3770 [2024-06-10 18:49:56,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.13 | bwd_microstep: 1581.04 | bwd_inner_microstep: 1581.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 18:49:58,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.53 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3444 [2024-06-10 18:50:00,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.62 | bwd_microstep: 1411.24 | bwd_inner_microstep: 1411.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3628 [2024-06-10 18:50:07,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.20 | optimizer_step: 6.63 [2024-06-10 18:50:07,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.81 | bwd_microstep: 5907.03 | bwd_inner_microstep: 1734.02 | bwd_allreduce_microstep: 4172.96 | step_microstep: 38.05 [2024-06-10 18:50:07,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15355.64 | bwd: 45397.74 | bwd_inner: 41223.86 | bwd_allreduce: 4173.19 | step: 39.64 {'loss': 1.2234, 'learning_rate': 1.3869657074122906e-05, 'epoch': 0.61} dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 3967 [2024-06-10 18:50:08,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.15 | bwd_microstep: 1196.89 | bwd_inner_microstep: 1196.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3971 
[2024-06-10 18:50:11,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.61 | bwd_microstep: 1703.69 | bwd_inner_microstep: 1703.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 18:50:12,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.97 | bwd_microstep: 789.48 | bwd_inner_microstep: 789.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461 [2024-06-10 18:50:14,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.96 | bwd_microstep: 1337.03 | bwd_inner_microstep: 1337.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 18:50:16,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.94 | bwd_microstep: 1341.12 | bwd_inner_microstep: 1341.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1966 [2024-06-10 18:50:17,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.70 | bwd_microstep: 764.42 | bwd_inner_microstep: 764.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794 [2024-06-10 18:50:19,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.73 | bwd_microstep: 1648.74 | bwd_inner_microstep: 1648.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 18:50:21,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1246.72 | bwd_inner_microstep: 1246.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3403 [2024-06-10 18:50:22,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 1343.46 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3504 [2024-06-10 18:50:25,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.82 | bwd_microstep: 1580.95 | bwd_inner_microstep: 1580.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109 [2024-06-10 18:50:26,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.10 | bwd_microstep: 923.15 | bwd_inner_microstep: 923.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3419 [2024-06-10 18:50:28,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1410.63 | bwd_inner_microstep: 1410.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3709 [2024-06-10 18:50:30,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.09 | bwd_microstep: 1671.23 | bwd_inner_microstep: 1671.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 18:50:32,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.37 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 18:50:34,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1246.82 | bwd_inner_microstep: 1246.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 18:50:36,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 18:50:38,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1342.82 | bwd_inner_microstep: 1342.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3646 [2024-06-10 18:50:40,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.44 | bwd_microstep: 1444.14 | bwd_inner_microstep: 1444.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 18:50:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1411.19 | bwd_inner_microstep: 1411.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3643 [2024-06-10 18:50:43,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1283.02 | bwd_inner_microstep: 1283.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-10 18:50:45,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.52 | bwd_microstep: 802.77 | bwd_inner_microstep: 802.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 18:50:46,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1399.13 | bwd_inner_microstep: 1399.10 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815 [2024-06-10 18:50:48,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.76 | bwd_microstep: 1387.11 | bwd_inner_microstep: 1387.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 18:50:50,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.94 | bwd_microstep: 1393.50 | bwd_inner_microstep: 1393.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548 [2024-06-10 18:50:52,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1297.60 | bwd_inner_microstep: 1297.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 18:50:54,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1412.41 | bwd_inner_microstep: 1412.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769 [2024-06-10 18:50:56,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.51 | bwd_microstep: 1350.06 | bwd_inner_microstep: 1350.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-10 18:50:58,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.49 | bwd_microstep: 1425.36 | bwd_inner_microstep: 1425.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3430 [2024-06-10 18:51:00,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.07 | bwd_microstep: 
1373.03 | bwd_inner_microstep: 1373.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777 [2024-06-10 18:51:02,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.57 | bwd_microstep: 1640.37 | bwd_inner_microstep: 1640.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3593 [2024-06-10 18:51:04,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1433.39 | bwd_inner_microstep: 1433.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 18:51:08,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-10 18:51:08,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.67 | bwd_microstep: 3182.56 | bwd_inner_microstep: 901.62 | bwd_allreduce_microstep: 2280.88 | step_microstep: 39.09 [2024-06-10 18:51:08,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15818.38 | bwd: 44657.89 | bwd_inner: 42376.10 | bwd_allreduce: 2281.11 | step: 40.69 {'loss': 1.1897, 'learning_rate': 1.3833940620342803e-05, 'epoch': 0.61} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3467 [2024-06-10 18:51:09,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1331.10 | bwd_inner_microstep: 1331.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2384 [2024-06-10 18:51:11,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.12 | bwd_microstep: 994.65 | bwd_inner_microstep: 994.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3488 [2024-06-10 18:51:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.08 | bwd_microstep: 1480.03 | bwd_inner_microstep: 1480.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.63 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1928 [2024-06-10 18:51:14,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.58 | bwd_microstep: 819.77 | bwd_inner_microstep: 819.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3863 [2024-06-10 18:51:16,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.06 | bwd_microstep: 1666.24 | bwd_inner_microstep: 1666.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 18:51:18,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1247.42 | bwd_inner_microstep: 1247.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864 [2024-06-10 18:51:19,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.15 | bwd_microstep: 674.83 | bwd_inner_microstep: 674.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3680 [2024-06-10 18:51:21,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.05 | bwd_microstep: 1721.71 | bwd_inner_microstep: 1721.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 18:51:22,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.27 | bwd_microstep: 791.61 | bwd_inner_microstep: 791.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-10 18:51:24,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.89 | bwd_microstep: 794.29 | bwd_inner_microstep: 794.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-10 18:51:25,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.48 | bwd_microstep: 1314.34 | bwd_inner_microstep: 1314.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702 [2024-06-10 18:51:28,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1628.44 | bwd_inner_microstep: 1628.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3628 [2024-06-10 18:51:29,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1281.77 | bwd_inner_microstep: 1281.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 18:51:31,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.14 | bwd_microstep: 1289.00 | bwd_inner_microstep: 1288.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 18:51:33,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976 [2024-06-10 18:51:34,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.87 | bwd_microstep: 887.73 | bwd_inner_microstep: 887.70 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 18:51:36,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1391.69 | bwd_inner_microstep: 1391.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 18:51:37,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.44 | bwd_microstep: 796.12 | bwd_inner_microstep: 796.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-10 18:51:39,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1511.89 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837 [2024-06-10 18:51:41,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.90 | bwd_microstep: 1455.12 | bwd_inner_microstep: 1455.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2039 [2024-06-10 18:51:43,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.91 | bwd_microstep: 875.05 | bwd_inner_microstep: 875.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-10 18:51:45,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.03 | bwd_microstep: 1529.43 | bwd_inner_microstep: 1529.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3713 [2024-06-10 18:51:47,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.80 | bwd_microstep: 1562.83 | bwd_inner_microstep: 
1562.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2268 [2024-06-10 18:51:48,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.49 | bwd_microstep: 974.28 | bwd_inner_microstep: 974.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830 [2024-06-10 18:51:50,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.54 | bwd_microstep: 1363.74 | bwd_inner_microstep: 1363.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3428 [2024-06-10 18:51:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1376.11 | bwd_inner_microstep: 1376.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 18:51:54,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1380.91 | bwd_inner_microstep: 1380.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 18:51:56,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.21 | bwd_microstep: 1310.19 | bwd_inner_microstep: 1310.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 18:51:58,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.31 | bwd_microstep: 1542.73 | bwd_inner_microstep: 1542.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 18:52:00,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | 
bwd_microstep: 1377.88 | bwd_inner_microstep: 1377.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3597 [2024-06-10 18:52:02,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.47 | bwd_microstep: 1703.38 | bwd_inner_microstep: 1703.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-10 18:52:10,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 18:52:10,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.69 | bwd_microstep: 6845.80 | bwd_inner_microstep: 1678.71 | bwd_allreduce_microstep: 5167.03 | step_microstep: 37.93 [2024-06-10 18:52:10,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15315.33 | bwd: 46303.31 | bwd_inner: 41135.37 | bwd_allreduce: 5167.26 | step: 40.07 61%|██████ | 1052/1726 [18:09:39<11:40:23, 62.35s/it] 61%|██████ | 1053/1726 [18:10:41<11:39:05, 62.33s/it] 61%|██████ | 1054/1726 [18:11:42<11:33:32, 61.92s/it] 61%|██████ | 1055/1726 [18:12:44<11:29:45, 61.68s/it] 61%|██████ | 1056/1726 [18:13:44<11:25:51, 61.42s/it] 61%|██████ | 1057/1726 [18:14:46<11:26:39, 61.58s/it] {'loss': 1.19, 'learning_rate': 1.3798245883391788e-05, 'epoch': 0.61} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-10 18:52:12,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.50 | bwd_microstep: 1478.16 | bwd_inner_microstep: 1478.00 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length:
1932 [2024-06-10 18:52:13,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.16 | bwd_microstep: 788.33 | bwd_inner_microstep: 788.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475 [2024-06-10 18:52:15,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.43 | bwd_microstep: 1406.35 | bwd_inner_microstep: 1406.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 18:52:17,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1358.02 | bwd_inner_microstep: 1357.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 18:52:18,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1383.58 | bwd_inner_microstep: 1383.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3839 [2024-06-10 18:52:20,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1486.90 | bwd_inner_microstep: 1486.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458 [2024-06-10 18:52:22,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.35 | bwd_microstep: 1308.01 | bwd_inner_microstep: 1307.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 18:52:24,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1246.75 | bwd_inner_microstep: 1246.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3478 [2024-06-10 18:52:26,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.82 | bwd_microstep: 1282.39 | bwd_inner_microstep: 1282.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3496 [2024-06-10 18:52:27,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.18 | bwd_microstep: 1221.48 | bwd_inner_microstep: 1221.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 18:52:29,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1376.91 | bwd_inner_microstep: 1376.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713 [2024-06-10 18:52:32,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.66 | bwd_microstep: 1527.75 | bwd_inner_microstep: 1527.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3654 [2024-06-10 18:52:34,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.04 | bwd_microstep: 1582.58 | bwd_inner_microstep: 1582.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401 [2024-06-10 18:52:35,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.76 | bwd_microstep: 1179.43 | bwd_inner_microstep: 1179.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 18:52:37,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1386.55 | bwd_inner_microstep: 1386.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 18:52:39,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128 [2024-06-10 18:52:40,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.14 | bwd_microstep: 926.81 | bwd_inner_microstep: 926.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633 [2024-06-10 18:52:43,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.22 | bwd_microstep: 1708.34 | bwd_inner_microstep: 1708.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 18:52:45,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1493.42 | bwd_inner_microstep: 1493.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-10 18:52:47,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1513.00 | bwd_inner_microstep: 1512.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3506 [2024-06-10 18:52:48,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.57 | bwd_microstep: 1221.45 | bwd_inner_microstep: 1221.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-10 18:52:51,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.38 | bwd_microstep: 1657.43 | bwd_inner_microstep: 1657.40 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 18:52:53,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1301.00 | bwd_inner_microstep: 1300.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 18:52:55,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.13 | bwd_microstep: 1457.20 | bwd_inner_microstep: 1457.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193 [2024-06-10 18:52:56,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.95 | bwd_microstep: 864.37 | bwd_inner_microstep: 864.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3452 [2024-06-10 18:52:58,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.39 | bwd_microstep: 1381.52 | bwd_inner_microstep: 1381.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 18:53:00,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.52 | bwd_microstep: 1458.84 | bwd_inner_microstep: 1458.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2073 [2024-06-10 18:53:01,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.50 | bwd_microstep: 851.56 | bwd_inner_microstep: 851.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 18:53:03,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.12 | bwd_microstep: 1546.68 
| bwd_inner_microstep: 1546.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 18:53:05,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.05 | bwd_microstep: 1651.36 | bwd_inner_microstep: 1651.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 18:53:07,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.60 | bwd_microstep: 1450.93 | bwd_inner_microstep: 1450.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 18:53:11,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.09 | optimizer_step: 6.61 [2024-06-10 18:53:11,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.95 | bwd_microstep: 3451.88 | bwd_inner_microstep: 1689.14 | bwd_allreduce_microstep: 1762.69 | step_microstep: 38.33 [2024-06-10 18:53:11,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16198.80 | bwd: 45199.59 | bwd_inner: 43435.87 | bwd_allreduce: 1762.98 | step: 39.88 {'loss': 1.1974, 'learning_rate': 1.3762572988986522e-05, 'epoch': 0.61} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 18:53:13,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.54 | bwd_microstep: 1391.07 | bwd_inner_microstep: 1390.88 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 18:53:15,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.05 | bwd_microstep: 1390.88 | bwd_inner_microstep: 1390.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3877 [2024-06-10 18:53:17,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.82 | bwd_microstep: 1582.39 | bwd_inner_microstep: 1582.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-10 18:53:18,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 794.48 | bwd_inner_microstep: 794.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3848 [2024-06-10 18:53:21,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.06 | bwd_microstep: 1592.59 | bwd_inner_microstep: 1592.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 18:53:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1346.99 | bwd_inner_microstep: 1346.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3427 [2024-06-10 18:53:24,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.58 | bwd_microstep: 1154.64 | bwd_inner_microstep: 1154.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3783 [2024-06-10 18:53:26,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 1510.65 | bwd_inner_microstep: 1510.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 18:53:28,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.28 | bwd_microstep: 1631.82 | bwd_inner_microstep: 1631.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 18:53:30,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1254.23 | bwd_inner_microstep: 1254.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 18:53:32,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1391.93 | bwd_inner_microstep: 1391.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448 [2024-06-10 18:53:34,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.25 | bwd_microstep: 1161.78 | bwd_inner_microstep: 1161.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2510 [2024-06-10 18:53:35,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.36 | bwd_microstep: 989.25 | bwd_inner_microstep: 989.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913 [2024-06-10 18:53:36,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.41 | bwd_microstep: 694.78 | bwd_inner_microstep: 694.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495 [2024-06-10 18:53:38,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1578.28 | bwd_inner_microstep: 1578.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 18:53:40,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.89 | bwd_microstep: 1400.63 | bwd_inner_microstep: 1400.60 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-10 18:53:42,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.52 | bwd_microstep: 1622.16 | bwd_inner_microstep: 1622.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 18:53:44,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1354.20 | bwd_inner_microstep: 1354.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3842 [2024-06-10 18:53:47,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.62 | bwd_microstep: 1832.36 | bwd_inner_microstep: 1832.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 18:53:49,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1349.48 | bwd_inner_microstep: 1349.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3753 [2024-06-10 18:53:51,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.85 | bwd_microstep: 1392.47 | bwd_inner_microstep: 1392.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3701 [2024-06-10 18:53:53,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.85 | bwd_microstep: 1435.59 | bwd_inner_microstep: 1435.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3564 [2024-06-10 18:53:54,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.10 | bwd_microstep: 1203.38 | bwd_inner_microstep: 
1203.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 18:53:56,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.37 | bwd_microstep: 1497.34 | bwd_inner_microstep: 1497.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3821 [2024-06-10 18:53:58,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.76 | bwd_microstep: 1626.22 | bwd_inner_microstep: 1626.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279 [2024-06-10 18:54:00,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.72 | bwd_microstep: 909.11 | bwd_inner_microstep: 909.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3823 [2024-06-10 18:54:02,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.85 | bwd_microstep: 1419.55 | bwd_inner_microstep: 1419.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-10 18:54:04,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.79 | bwd_microstep: 1551.05 | bwd_inner_microstep: 1551.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773 [2024-06-10 18:54:06,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.32 | bwd_microstep: 1739.21 | bwd_inner_microstep: 1739.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816 [2024-06-10 18:54:09,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.02 | 
bwd_microstep: 1751.58 | bwd_inner_microstep: 1751.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-10 18:54:11,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.61 | bwd_microstep: 1502.48 | bwd_inner_microstep: 1502.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-10 18:54:13,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.31 | optimizer_step: 6.65 [2024-06-10 18:54:13,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1436.28 | bwd_inner_microstep: 1428.33 | bwd_allreduce_microstep: 7.90 | step_microstep: 37.84 [2024-06-10 18:54:13,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16594.72 | bwd: 44488.89 | bwd_inner: 44479.94 | bwd_allreduce: 8.21 | step: 39.35 {'loss': 1.2312, 'learning_rate': 1.3726922062766765e-05, 'epoch': 0.61} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 18:54:15,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.61 | bwd_microstep: 1336.57 | bwd_inner_microstep: 1336.44 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940 [2024-06-10 18:54:17,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.55 | bwd_microstep: 1598.15 | bwd_inner_microstep: 1598.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-10 18:54:19,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1398.65 | bwd_inner_microstep: 1398.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3422 [2024-06-10 18:54:21,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.11 | bwd_microstep: 1349.60 | bwd_inner_microstep: 1349.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1995 [2024-06-10 18:54:22,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.64 | bwd_microstep: 798.26 | bwd_inner_microstep: 798.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3822 [2024-06-10 18:54:24,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1355.00 | bwd_inner_microstep: 1354.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4037 [2024-06-10 18:54:26,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1516.05 | bwd_inner_microstep: 1516.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945 [2024-06-10 18:54:27,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.74 | bwd_microstep: 728.50 | bwd_inner_microstep: 728.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 18:54:29,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1424.87 | bwd_inner_microstep: 1424.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 18:54:30,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1281.50 | bwd_inner_microstep: 1281.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1473 [2024-06-10 18:54:31,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 209.80 | bwd_microstep: 543.65 | bwd_inner_microstep: 543.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3501 [2024-06-10 18:54:33,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911 [2024-06-10 18:54:34,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.09 | bwd_microstep: 779.26 | bwd_inner_microstep: 779.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954 [2024-06-10 18:54:35,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.11 | bwd_microstep: 884.64 | bwd_inner_microstep: 884.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471 [2024-06-10 18:54:37,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.56 | bwd_microstep: 1342.51 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-10 18:54:38,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 797.65 | bwd_inner_microstep: 797.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 18:54:40,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1383.83 | bwd_inner_microstep: 1383.81 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832 [2024-06-10 18:54:42,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1388.60 | bwd_inner_microstep: 1388.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 18:54:44,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1373.16 | bwd_inner_microstep: 1373.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3568 [2024-06-10 18:54:46,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.84 | bwd_microstep: 1236.47 | bwd_inner_microstep: 1236.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2401 [2024-06-10 18:54:47,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.94 | bwd_microstep: 937.61 | bwd_inner_microstep: 937.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3472 [2024-06-10 18:54:49,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1360.07 | bwd_inner_microstep: 1360.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3820 [2024-06-10 18:54:51,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.25 | bwd_microstep: 1705.08 | bwd_inner_microstep: 1705.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003 [2024-06-10 18:54:53,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.90 | bwd_microstep: 806.03 | 
bwd_inner_microstep: 806.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3654 [2024-06-10 18:54:55,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.46 | bwd_microstep: 1448.23 | bwd_inner_microstep: 1448.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587 [2024-06-10 18:54:56,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1339.11 | bwd_inner_microstep: 1339.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3706 [2024-06-10 18:54:58,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.67 | bwd_microstep: 1460.69 | bwd_inner_microstep: 1460.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 18:55:00,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1474.84 | bwd_inner_microstep: 1474.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3791 [2024-06-10 18:55:02,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.31 | bwd_microstep: 1414.49 | bwd_inner_microstep: 1414.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 18:55:05,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.42 | bwd_microstep: 1655.52 | bwd_inner_microstep: 1655.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3421 [2024-06-10 18:55:07,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 523.94 | bwd_microstep: 1407.09 | bwd_inner_microstep: 1407.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3773 [2024-06-10 18:55:14,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-10 18:55:14,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.23 | bwd_microstep: 6287.43 | bwd_inner_microstep: 1912.64 | bwd_allreduce_microstep: 4374.73 | step_microstep: 37.95 [2024-06-10 18:55:14,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15251.51 | bwd: 45255.39 | bwd_inner: 40879.64 | bwd_allreduce: 4375.02 | step: 39.49 {'loss': 1.1915, 'learning_rate': 1.369129323029489e-05, 'epoch': 0.61} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 18:55:15,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.13 | bwd_microstep: 1329.25 | bwd_inner_microstep: 1329.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2427 [2024-06-10 18:55:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.38 | bwd_microstep: 905.76 | bwd_inner_microstep: 905.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 18:55:18,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.32 | bwd_microstep: 1241.86 | bwd_inner_microstep: 1241.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800 [2024-06-10 18:55:21,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.87 | bwd_microstep: 1648.43 | bwd_inner_microstep: 1648.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 18:55:23,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.93 | bwd_microstep: 1553.17 | bwd_inner_microstep: 1553.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 18:55:25,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.46 | bwd_microstep: 1285.45 | bwd_inner_microstep: 1285.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416 [2024-06-10 18:55:26,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.48 | bwd_microstep: 1152.30 | bwd_inner_microstep: 1152.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 18:55:27,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.75 | bwd_microstep: 792.53 | bwd_inner_microstep: 792.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 18:55:29,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.97 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 18:55:31,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.58 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 18:55:33,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1413.02 | bwd_inner_microstep: 1412.99 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 18:55:34,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1250.61 | bwd_inner_microstep: 1250.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484 [2024-06-10 18:55:36,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.29 | bwd_microstep: 1411.71 | bwd_inner_microstep: 1411.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3679 [2024-06-10 18:55:39,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.72 | bwd_microstep: 1657.85 | bwd_inner_microstep: 1657.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2605 [2024-06-10 18:55:40,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.58 | bwd_microstep: 964.05 | bwd_inner_microstep: 964.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3519 [2024-06-10 18:55:42,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1650.88 | bwd_inner_microstep: 1650.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 18:55:44,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1340.31 | bwd_inner_microstep: 1340.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2947 [2024-06-10 18:55:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.84 | bwd_microstep: 1199.93 | 
bwd_inner_microstep: 1199.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3647 [2024-06-10 18:55:48,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.62 | bwd_microstep: 1578.76 | bwd_inner_microstep: 1578.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744 [2024-06-10 18:55:50,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.96 | bwd_microstep: 1471.51 | bwd_inner_microstep: 1471.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2279 [2024-06-10 18:55:51,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.34 | bwd_microstep: 924.66 | bwd_inner_microstep: 924.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827 [2024-06-10 18:55:53,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.47 | bwd_microstep: 1358.01 | bwd_inner_microstep: 1357.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-10 18:55:55,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1309.53 | bwd_inner_microstep: 1309.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3466 [2024-06-10 18:55:57,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.37 | bwd_microstep: 1308.87 | bwd_inner_microstep: 1308.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511 [2024-06-10 18:55:59,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 506.15 | bwd_microstep: 1355.36 | bwd_inner_microstep: 1355.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3636 [2024-06-10 18:56:01,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1437.60 | bwd_inner_microstep: 1437.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816 [2024-06-10 18:56:03,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.14 | bwd_microstep: 1621.45 | bwd_inner_microstep: 1621.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3825 [2024-06-10 18:56:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.56 | bwd_microstep: 1751.57 | bwd_inner_microstep: 1751.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 18:56:07,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1393.70 | bwd_inner_microstep: 1393.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939 [2024-06-10 18:56:08,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.52 | bwd_microstep: 819.66 | bwd_inner_microstep: 819.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3720 [2024-06-10 18:56:11,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.52 | bwd_microstep: 1629.87 | bwd_inner_microstep: 1629.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2889 [2024-06-10 18:56:15,029] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 18:56:15,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.70 | bwd_microstep: 3424.28 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 2086.49 | step_microstep: 37.97 [2024-06-10 18:56:15,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15897.82 | bwd: 44753.34 | bwd_inner: 42665.96 | bwd_allreduce: 2086.72 | step: 39.53 {'loss': 1.2001, 'learning_rate': 1.3655686617055466e-05, 'epoch': 0.61} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-10 18:56:16,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.12 | bwd_microstep: 1276.49 | bwd_inner_microstep: 1276.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532 [2024-06-10 18:56:18,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.70 | bwd_microstep: 1452.28 | bwd_inner_microstep: 1452.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848 [2024-06-10 18:56:20,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.85 | bwd_microstep: 1558.17 | bwd_inner_microstep: 1558.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476 [2024-06-10 18:56:22,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.08 | bwd_microstep: 1341.75 | bwd_inner_microstep: 1341.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4126 [2024-06-10 18:56:24,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1438.41 | bwd_inner_microstep: 1438.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750 [2024-06-10 18:56:26,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.86 | bwd_microstep: 1533.74 | bwd_inner_microstep: 1533.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 18:56:28,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1244.07 | bwd_inner_microstep: 1244.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 18:56:30,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416 [2024-06-10 18:56:32,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.16 | bwd_microstep: 1277.91 | bwd_inner_microstep: 1277.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498 [2024-06-10 18:56:33,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.94 | bwd_microstep: 1189.69 | bwd_inner_microstep: 1189.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1968 [2024-06-10 18:56:35,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.18 | bwd_microstep: 839.23 | bwd_inner_microstep: 839.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2124 [2024-06-10 18:56:36,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.01 | bwd_microstep: 
921.54 | bwd_inner_microstep: 921.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3491 [2024-06-10 18:56:38,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1439.75 | bwd_inner_microstep: 1439.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1934 [2024-06-10 18:56:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.53 | bwd_microstep: 884.69 | bwd_inner_microstep: 884.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3536 [2024-06-10 18:56:41,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1452.47 | bwd_inner_microstep: 1452.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2468 [2024-06-10 18:56:42,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.58 | bwd_microstep: 980.26 | bwd_inner_microstep: 980.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 18:56:44,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.45 | bwd_microstep: 1285.17 | bwd_inner_microstep: 1285.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 18:56:46,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.42 | bwd_microstep: 1289.80 | bwd_inner_microstep: 1289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640 [2024-06-10 18:56:48,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 589.00 | bwd_microstep: 1611.22 | bwd_inner_microstep: 1611.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3821 [2024-06-10 18:56:50,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1485.15 | bwd_inner_microstep: 1485.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-10 18:56:51,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.67 | bwd_microstep: 799.61 | bwd_inner_microstep: 799.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 18:56:54,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.58 | bwd_microstep: 1654.97 | bwd_inner_microstep: 1654.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 18:56:55,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1287.95 | bwd_inner_microstep: 1287.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 18:56:57,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.69 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2236 [2024-06-10 18:56:59,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.95 | bwd_microstep: 962.48 | bwd_inner_microstep: 962.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1512 [2024-06-10 18:56:59,916] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 226.38 | bwd_microstep: 589.68 | bwd_inner_microstep: 589.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808 [2024-06-10 18:57:02,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.59 | bwd_microstep: 1580.56 | bwd_inner_microstep: 1580.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-10 18:57:04,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1392.96 | bwd_inner_microstep: 1392.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2107 [2024-06-10 18:57:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.87 | bwd_microstep: 855.55 | bwd_inner_microstep: 855.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3605 [2024-06-10 18:57:07,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.72 | bwd_microstep: 1754.09 | bwd_inner_microstep: 1754.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 18:57:09,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.18 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 18:57:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.61 [2024-06-10 18:57:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.41 | bwd_microstep: 6437.13 | 
bwd_inner_microstep: 1574.82 | bwd_allreduce_microstep: 4862.25 | step_microstep: 37.85 [2024-06-10 18:57:16,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15362.57 | bwd: 46018.64 | bwd_inner: 41155.47 | bwd_allreduce: 4862.49 | step: 39.31 {'loss': 1.2175, 'learning_rate': 1.3620102348454802e-05, 'epoch': 0.62} 61%|██████ | 1057/1726 [18:14:46<11:26:39, 61.58s/it] 61%|██████▏ | 1058/1726 [18:15:48<11:26:10, 61.63s/it] 61%|██████▏ | 1059/1726 [18:16:49<11:24:26, 61.57s/it] 61%|██████▏ | 1060/1726 [18:17:50<11:20:57, 61.35s/it] 61%|██████▏ | 1061/1726 [18:18:51<11:18:44, 61.24s/it] 62%|██████▏ | 1062/1726 [18:19:53<11:19:16, 61.38s/it] dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3962 [2024-06-10 18:57:19,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.91 | bwd_microstep: 1776.49 | bwd_inner_microstep: 1776.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 18:57:21,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.67 | bwd_microstep: 1381.57 | bwd_inner_microstep: 1381.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 18:57:22,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.61 | bwd_microstep: 1242.07 | bwd_inner_microstep: 1242.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3760 [2024-06-10 18:57:24,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1338.07 | bwd_inner_microstep: 1338.04 |
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 18:57:26,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1243.31 | bwd_inner_microstep: 1243.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 18:57:28,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1281.65 | bwd_inner_microstep: 1281.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 18:57:29,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.68 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-10 18:57:31,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1398.01 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 18:57:33,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.76 | bwd_microstep: 1284.41 | bwd_inner_microstep: 1284.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3418 [2024-06-10 18:57:35,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.32 | bwd_microstep: 1307.84 | bwd_inner_microstep: 1307.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 18:57:37,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.23 | bwd_microstep: 
1483.98 | bwd_inner_microstep: 1483.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2398 [2024-06-10 18:57:39,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.98 | bwd_microstep: 1097.16 | bwd_inner_microstep: 1097.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645 [2024-06-10 18:57:41,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.54 | bwd_microstep: 1604.30 | bwd_inner_microstep: 1604.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529 [2024-06-10 18:57:43,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.79 | bwd_microstep: 1413.70 | bwd_inner_microstep: 1413.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 18:57:45,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.54 | bwd_microstep: 1510.93 | bwd_inner_microstep: 1510.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-10 18:57:46,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.46 | bwd_microstep: 803.42 | bwd_inner_microstep: 803.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2291 [2024-06-10 18:57:47,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.47 | bwd_microstep: 878.79 | bwd_inner_microstep: 878.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473 [2024-06-10 18:57:49,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 509.83 | bwd_microstep: 1341.71 | bwd_inner_microstep: 1341.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 18:57:51,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 1284.24 | bwd_inner_microstep: 1284.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3464 [2024-06-10 18:57:52,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.19 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 18:57:55,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.74 | bwd_microstep: 1493.91 | bwd_inner_microstep: 1493.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-10 18:57:57,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1497.19 | bwd_inner_microstep: 1497.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612 [2024-06-10 18:57:59,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.66 | bwd_microstep: 1609.04 | bwd_inner_microstep: 1609.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1990 [2024-06-10 18:58:00,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.29 | bwd_microstep: 861.83 | bwd_inner_microstep: 861.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3569 [2024-06-10 18:58:02,636] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.82 | bwd_microstep: 1544.27 | bwd_inner_microstep: 1544.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715 [2024-06-10 18:58:04,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.96 | bwd_microstep: 1538.58 | bwd_inner_microstep: 1538.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 18:58:07,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.11 | bwd_microstep: 1647.66 | bwd_inner_microstep: 1647.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3381 [2024-06-10 18:58:08,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.83 | bwd_microstep: 1273.62 | bwd_inner_microstep: 1273.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2281 [2024-06-10 18:58:09,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.39 | bwd_microstep: 782.86 | bwd_inner_microstep: 782.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3575 [2024-06-10 18:58:12,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1698.07 | bwd_inner_microstep: 1698.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3810 [2024-06-10 18:58:14,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.42 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
2229 [2024-06-10 18:58:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.20 | optimizer_step: 6.56 [2024-06-10 18:58:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.44 | bwd_microstep: 4184.16 | bwd_inner_microstep: 981.22 | bwd_allreduce_microstep: 3202.89 | step_microstep: 37.94 [2024-06-10 18:58:18,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15955.16 | bwd: 45922.17 | bwd_inner: 42718.37 | bwd_allreduce: 3203.12 | step: 39.42 {'loss': 1.1969, 'learning_rate': 1.3584540549820493e-05, 'epoch': 0.62} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 18:58:20,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.85 | bwd_microstep: 1375.73 | bwd_inner_microstep: 1375.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 18:58:22,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.11 | bwd_microstep: 1375.75 | bwd_inner_microstep: 1375.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 18:58:24,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.79 | bwd_microstep: 1338.20 | bwd_inner_microstep: 1338.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3044 [2024-06-10 18:58:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.82 | bwd_microstep: 1131.82 | bwd_inner_microstep: 1131.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 18:58:28,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.55 | bwd_microstep: 1383.49 | 
bwd_inner_microstep: 1383.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4017 [2024-06-10 18:58:30,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.10 | bwd_microstep: 1709.50 | bwd_inner_microstep: 1709.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 18:58:32,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1251.39 | bwd_inner_microstep: 1251.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1956 [2024-06-10 18:58:33,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.05 | bwd_microstep: 824.84 | bwd_inner_microstep: 824.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 18:58:35,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.01 | bwd_microstep: 1381.98 | bwd_inner_microstep: 1381.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 18:58:37,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1377.63 | bwd_inner_microstep: 1377.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3463 [2024-06-10 18:58:39,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.02 | bwd_microstep: 1362.16 | bwd_inner_microstep: 1362.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2432 [2024-06-10 18:58:40,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 386.02 | bwd_microstep: 1035.95 | bwd_inner_microstep: 1035.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967 [2024-06-10 18:58:41,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.46 | bwd_microstep: 892.09 | bwd_inner_microstep: 892.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659 [2024-06-10 18:58:43,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.04 | bwd_microstep: 1611.05 | bwd_inner_microstep: 1611.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3649 [2024-06-10 18:58:46,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.89 | bwd_microstep: 1652.84 | bwd_inner_microstep: 1652.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3642 [2024-06-10 18:58:48,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1512.04 | bwd_inner_microstep: 1512.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3705 [2024-06-10 18:58:50,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.44 | bwd_microstep: 1297.27 | bwd_inner_microstep: 1297.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537 [2024-06-10 18:58:52,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1414.66 | bwd_inner_microstep: 1414.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 18:58:54,060] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.07 | bwd_microstep: 1489.08 | bwd_inner_microstep: 1489.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2289 [2024-06-10 18:58:55,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.38 | bwd_microstep: 942.36 | bwd_inner_microstep: 942.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3916 [2024-06-10 18:58:57,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.38 | bwd_microstep: 1792.75 | bwd_inner_microstep: 1792.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 18:59:00,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.92 | bwd_microstep: 1645.72 | bwd_inner_microstep: 1645.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 18:59:01,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.47 | bwd_microstep: 1350.43 | bwd_inner_microstep: 1350.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-10 18:59:03,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.03 | bwd_microstep: 975.94 | bwd_inner_microstep: 975.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3722 [2024-06-10 18:59:05,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.05 | bwd_microstep: 1781.16 | bwd_inner_microstep: 1781.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3827 [2024-06-10 18:59:07,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.53 | bwd_microstep: 1359.80 | bwd_inner_microstep: 1359.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-10 18:59:09,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.62 | bwd_microstep: 1522.43 | bwd_inner_microstep: 1522.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3603 [2024-06-10 18:59:11,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.99 | bwd_microstep: 1641.63 | bwd_inner_microstep: 1641.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 18:59:14,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1556.72 | bwd_inner_microstep: 1556.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 18:59:16,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1556.74 | bwd_inner_microstep: 1556.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 18:59:18,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1376.86 | bwd_inner_microstep: 1376.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 18:59:47,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 18:59:47,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 561.32 | bwd_microstep: 29101.22 | bwd_inner_microstep: 1951.10 | bwd_allreduce_microstep: 27150.05 | step_microstep: 38.86 [2024-06-10 18:59:47,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16565.85 | bwd: 72021.23 | bwd_inner: 44870.25 | bwd_allreduce: 27150.29 | step: 40.33 {'loss': 1.2115, 'learning_rate': 1.3549001346401017e-05, 'epoch': 0.62} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 18:59:49,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.19 | bwd_microstep: 1366.78 | bwd_inner_microstep: 1366.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3906 [2024-06-10 18:59:51,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.91 | bwd_microstep: 1479.69 | bwd_inner_microstep: 1479.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837 [2024-06-10 18:59:53,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.78 | bwd_microstep: 1547.42 | bwd_inner_microstep: 1547.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 18:59:55,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.19 | bwd_microstep: 1273.27 | bwd_inner_microstep: 1273.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 18:59:57,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1474.44 | bwd_inner_microstep: 1474.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 18:59:59,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 473.57 | bwd_microstep: 1256.69 | bwd_inner_microstep: 1256.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735 [2024-06-10 19:00:01,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.64 | bwd_microstep: 1626.15 | bwd_inner_microstep: 1626.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2087 [2024-06-10 19:00:02,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.60 | bwd_microstep: 787.79 | bwd_inner_microstep: 787.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 19:00:46,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 696.55 | bwd_inner_microstep: 696.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3626 [2024-06-10 19:00:48,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1428.12 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062 [2024-06-10 19:00:53,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.07 | bwd_microstep: 810.25 | bwd_inner_microstep: 810.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-10 19:00:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.72 | bwd_microstep: 1437.96 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3503 [2024-06-10 19:00:58,014] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.50 | bwd_microstep: 1666.28 | bwd_inner_microstep: 1666.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 19:00:59,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1335.23 | bwd_inner_microstep: 1335.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 19:01:01,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1371.11 | bwd_inner_microstep: 1371.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2133 [2024-06-10 19:01:02,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.19 | bwd_microstep: 857.02 | bwd_inner_microstep: 856.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-10 19:01:04,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.00 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974 [2024-06-10 19:01:05,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.68 | bwd_microstep: 764.22 | bwd_inner_microstep: 764.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 19:01:07,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1388.71 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001 
[2024-06-10 19:01:08,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.92 | bwd_microstep: 798.49 | bwd_inner_microstep: 798.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3829 [2024-06-10 19:01:09,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.21 | bwd_microstep: 1257.58 | bwd_inner_microstep: 1257.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545 [2024-06-10 19:01:11,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1318.43 | bwd_inner_microstep: 1318.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 19:01:13,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1296.47 | bwd_inner_microstep: 1296.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3088 [2024-06-10 19:01:15,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.02 | bwd_microstep: 1241.67 | bwd_inner_microstep: 1241.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778 [2024-06-10 19:01:17,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.55 | bwd_microstep: 1449.99 | bwd_inner_microstep: 1449.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 19:01:19,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.43 | bwd_microstep: 1296.00 | bwd_inner_microstep: 1295.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3571 [2024-06-10 19:01:21,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1491.85 | bwd_inner_microstep: 1491.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 19:01:23,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.88 | bwd_microstep: 1639.71 | bwd_inner_microstep: 1639.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2104 [2024-06-10 19:01:24,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.01 | bwd_microstep: 1014.82 | bwd_inner_microstep: 1014.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3440 [2024-06-10 19:01:26,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.45 | bwd_microstep: 1496.60 | bwd_inner_microstep: 1496.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-10 19:01:28,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.01 | bwd_microstep: 1592.23 | bwd_inner_microstep: 1592.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3777 [2024-06-10 19:01:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 19:01:42,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.39 | bwd_microstep: 13206.18 | bwd_inner_microstep: 2301.03 | bwd_allreduce_microstep: 10905.08 | step_microstep: 38.06 [2024-06-10 19:01:42,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15398.48 | bwd: 52458.83 | bwd_inner: 
41552.83 | bwd_allreduce: 10905.31 | step: 39.52 {'loss': 1.2288, 'learning_rate': 1.3513484863365265e-05, 'epoch': 0.62} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 19:01:44,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.54 | bwd_microstep: 1330.32 | bwd_inner_microstep: 1330.23 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 19:01:46,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.21 | bwd_microstep: 1542.35 | bwd_inner_microstep: 1542.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 19:01:48,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1349.59 | bwd_inner_microstep: 1349.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 19:01:50,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.48 | bwd_microstep: 1146.97 | bwd_inner_microstep: 1146.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754 [2024-06-10 19:01:52,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.85 | bwd_microstep: 1632.19 | bwd_inner_microstep: 1632.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-10 19:01:54,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.34 | bwd_microstep: 1290.75 | bwd_inner_microstep: 1290.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422 [2024-06-10 19:01:55,940] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.21 | bwd_microstep: 1152.07 | bwd_inner_microstep: 1152.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 19:01:57,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1380.53 | bwd_inner_microstep: 1380.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 19:01:59,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1285.68 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 19:02:01,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1383.76 | bwd_inner_microstep: 1383.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 19:02:03,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1286.48 | bwd_inner_microstep: 1286.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3625 [2024-06-10 19:02:05,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1441.58 | bwd_inner_microstep: 1441.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911 [2024-06-10 19:02:07,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.65 | bwd_microstep: 1679.98 | bwd_inner_microstep: 1679.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3483 [2024-06-10 19:02:09,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.65 | bwd_microstep: 1310.58 | bwd_inner_microstep: 1310.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3879 [2024-06-10 19:02:11,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.29 | bwd_microstep: 1673.51 | bwd_inner_microstep: 1673.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3463 [2024-06-10 19:02:13,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.74 | bwd_microstep: 1568.40 | bwd_inner_microstep: 1568.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3569 [2024-06-10 19:02:15,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.14 | bwd_microstep: 1331.01 | bwd_inner_microstep: 1330.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3462 [2024-06-10 19:02:17,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.83 | bwd_microstep: 1180.14 | bwd_inner_microstep: 1180.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540 [2024-06-10 19:02:19,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.21 | bwd_microstep: 1199.76 | bwd_inner_microstep: 1199.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032 [2024-06-10 19:02:20,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 808.72 | bwd_inner_microstep: 808.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 1990 [2024-06-10 19:02:21,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.08 | bwd_microstep: 799.21 | bwd_inner_microstep: 799.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 19:02:23,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1252.62 | bwd_inner_microstep: 1252.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 19:02:24,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.90 | bwd_microstep: 1394.77 | bwd_inner_microstep: 1394.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 19:02:26,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.21 | bwd_microstep: 1293.38 | bwd_inner_microstep: 1293.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3995 [2024-06-10 19:02:29,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 695.92 | bwd_microstep: 1916.74 | bwd_inner_microstep: 1916.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 19:02:31,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1282.86 | bwd_inner_microstep: 1282.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735 [2024-06-10 19:02:33,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 1436.84 | bwd_inner_microstep: 1436.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3718 [2024-06-10 19:02:34,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.66 | bwd_microstep: 1298.96 | bwd_inner_microstep: 1298.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3423 [2024-06-10 19:02:36,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.96 | bwd_microstep: 1376.86 | bwd_inner_microstep: 1376.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2296 [2024-06-10 19:02:38,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.32 | bwd_microstep: 875.67 | bwd_inner_microstep: 875.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 19:02:39,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.55 | bwd_microstep: 1346.82 | bwd_inner_microstep: 1346.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 19:03:06,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.36 | optimizer_step: 6.62 [2024-06-10 19:03:06,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 26410.54 | bwd_inner_microstep: 1800.07 | bwd_allreduce_microstep: 24610.40 | step_microstep: 38.98 [2024-06-10 19:03:06,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16079.66 | bwd: 67659.66 | bwd_inner: 43048.27 | bwd_allreduce: 24610.68 | step: 40.47 {'loss': 1.2629, 'learning_rate': 1.3477991225802103e-05, 'epoch': 0.62} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 19:03:08,841] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.49 | bwd_microstep: 1366.55 | bwd_inner_microstep: 1366.36 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3970 [2024-06-10 19:03:11,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.65 | bwd_microstep: 1596.09 | bwd_inner_microstep: 1596.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 19:03:13,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.84 | bwd_microstep: 1641.29 | bwd_inner_microstep: 1641.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 19:03:15,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1372.90 | bwd_inner_microstep: 1372.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3496 [2024-06-10 19:03:16,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.21 | bwd_microstep: 1217.12 | bwd_inner_microstep: 1217.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 19:03:18,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.59 | bwd_microstep: 1395.10 | bwd_inner_microstep: 1395.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 19:03:20,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.88 | bwd_microstep: 1277.67 | bwd_inner_microstep: 1277.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3626 [2024-06-10 19:03:22,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.58 | bwd_microstep: 1307.36 | bwd_inner_microstep: 1307.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449 [2024-06-10 19:04:05,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.06 | bwd_microstep: 1183.95 | bwd_inner_microstep: 1183.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 19:04:08,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.56 | bwd_microstep: 1377.14 | bwd_inner_microstep: 1377.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509 [2024-06-10 19:04:10,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1423.33 | bwd_inner_microstep: 1423.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3547 [2024-06-10 19:04:11,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.31 | bwd_microstep: 1345.90 | bwd_inner_microstep: 1345.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 19:04:13,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1475.98 | bwd_inner_microstep: 1475.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457 [2024-06-10 19:04:15,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.30 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3627 [2024-06-10 19:04:17,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.77 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559 [2024-06-10 19:04:20,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.25 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417 [2024-06-10 19:04:22,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.93 | bwd_microstep: 1431.41 | bwd_inner_microstep: 1431.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 19:04:24,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.68 | bwd_microstep: 1483.13 | bwd_inner_microstep: 1483.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829 [2024-06-10 19:04:52,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.77 | bwd_microstep: 1740.23 | bwd_inner_microstep: 1740.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3661 [2024-06-10 19:04:54,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1409.15 | bwd_inner_microstep: 1409.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 19:04:56,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1383.95 | bwd_inner_microstep: 1383.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934 [2024-06-10 19:04:57,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.16 | bwd_microstep: 695.80 | bwd_inner_microstep: 695.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-10 19:04:59,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1411.59 | bwd_inner_microstep: 1411.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 19:05:01,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1345.84 | bwd_inner_microstep: 1345.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2035 [2024-06-10 19:05:02,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.89 | bwd_microstep: 838.61 | bwd_inner_microstep: 838.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 19:05:03,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1280.70 | bwd_inner_microstep: 1280.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-10 19:05:06,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.51 | bwd_microstep: 1528.49 | bwd_inner_microstep: 1528.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 19:05:08,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1504.34 | bwd_inner_microstep: 1504.32 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3915 [2024-06-10 19:05:39,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.14 | bwd_microstep: 1477.64 | bwd_inner_microstep: 1477.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-10 19:05:58,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.65 | bwd_microstep: 1631.05 | bwd_inner_microstep: 1631.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2237 [2024-06-10 19:06:00,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.79 | bwd_microstep: 1052.08 | bwd_inner_microstep: 1052.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3450 [2024-06-10 19:06:14,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.43 | optimizer_step: 6.60 [2024-06-10 19:06:14,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 13512.61 | bwd_inner_microstep: 1572.32 | bwd_allreduce_microstep: 11940.22 | step_microstep: 41.05 [2024-06-10 19:06:14,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16512.88 | bwd: 56151.37 | bwd_inner: 44210.07 | bwd_allreduce: 11940.54 | step: 42.63 {'loss': 1.1741, 'learning_rate': 1.3442520558719944e-05, 'epoch': 0.62} 62%|██████▏ | 1062/1726 [18:19:53<11:19:16, 61.38s/it] 62%|██████▏ | 1063/1726 [18:20:55<11:21:01, 61.63s/it] 62%|██████▏ | 1064/1726 [18:22:24<12:50:21, 69.82s/it] 62%|██████▏ | 1065/1726 [18:24:19<15:18:29, 83.37s/it] 62%|██████▏ | 1066/1726 [18:25:43<15:19:23, 83.58s/it] 62%|██████▏ | 1067/1726 [18:28:50<20:59:39, 114.69s/it]
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3417 [2024-06-10 19:06:15,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.45 | bwd_microstep: 1195.22 | bwd_inner_microstep: 1195.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 19:06:17,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1368.07 | bwd_inner_microstep: 1368.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 19:06:19,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.85 | bwd_microstep: 1276.40 | bwd_inner_microstep: 1276.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 19:06:21,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.88 | bwd_microstep: 1238.24 | bwd_inner_microstep: 1238.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735 [2024-06-10 19:06:23,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.34 | bwd_microstep: 1625.50 | bwd_inner_microstep: 1625.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 19:06:25,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.06 | bwd_microstep: 1374.94 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3535 [2024-06-10 19:06:27,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.03 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 19:06:29,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.46 | bwd_microstep: 1276.85 | bwd_inner_microstep: 1276.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 19:07:05,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1241.43 | bwd_inner_microstep: 1241.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3448 [2024-06-10 19:07:06,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.19 | bwd_microstep: 1183.38 | bwd_inner_microstep: 1183.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 19:07:08,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1382.46 | bwd_inner_microstep: 1382.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3452 [2024-06-10 19:07:10,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1402.50 | bwd_inner_microstep: 1402.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-10 19:07:11,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.43 | bwd_microstep: 873.28 | bwd_inner_microstep: 873.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, 
images per sample: 6.5, dynamic token length: 2483 [2024-06-10 19:07:13,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.19 | bwd_microstep: 1070.96 | bwd_inner_microstep: 1070.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 19:07:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.67 | bwd_microstep: 1408.06 | bwd_inner_microstep: 1408.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 19:07:17,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1370.61 | bwd_inner_microstep: 1370.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 19:07:19,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1386.89 | bwd_inner_microstep: 1386.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2950 [2024-06-10 19:07:20,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.84 | bwd_microstep: 1005.48 | bwd_inner_microstep: 1005.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 19:07:22,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1395.02 | bwd_inner_microstep: 1394.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2083 [2024-06-10 19:07:23,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.52 | bwd_microstep: 946.68 | bwd_inner_microstep: 946.65 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 19:07:25,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.12 | bwd_microstep: 1656.50 | bwd_inner_microstep: 1656.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681 [2024-06-10 19:07:28,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.05 | bwd_microstep: 1520.48 | bwd_inner_microstep: 1520.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605 [2024-06-10 19:07:29,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.47 | bwd_microstep: 1305.51 | bwd_inner_microstep: 1305.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-10 19:07:31,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1397.64 | bwd_inner_microstep: 1397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-10 19:07:33,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.43 | bwd_microstep: 1408.97 | bwd_inner_microstep: 1408.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3713 [2024-06-10 19:07:35,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.54 | bwd_microstep: 1490.29 | bwd_inner_microstep: 1490.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-10 19:07:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1414.70 | bwd_inner_microstep: 
1414.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059 [2024-06-10 19:07:38,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.61 | bwd_microstep: 912.65 | bwd_inner_microstep: 912.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 19:07:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.09 | bwd_microstep: 1453.41 | bwd_inner_microstep: 1453.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-10 19:07:42,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.90 | bwd_microstep: 973.96 | bwd_inner_microstep: 973.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763 [2024-06-10 19:07:44,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.06 | bwd_microstep: 1540.11 | bwd_inner_microstep: 1540.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 19:08:02,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.29 | optimizer_step: 6.60 [2024-06-10 19:08:02,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 17137.64 | bwd_inner_microstep: 1551.76 | bwd_allreduce_microstep: 15585.81 | step_microstep: 38.34 [2024-06-10 19:08:02,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15728.26 | bwd: 57623.92 | bwd_inner: 42037.21 | bwd_allreduce: 15586.04 | step: 39.84 {'loss': 1.1831, 'learning_rate': 1.3407072987046283e-05, 'epoch': 0.62} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 
[2024-06-10 19:08:03,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.63 | bwd_microstep: 1230.56 | bwd_inner_microstep: 1230.38 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 19:08:05,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1378.54 | bwd_inner_microstep: 1378.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-10 19:08:07,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.07 | bwd_microstep: 1496.86 | bwd_inner_microstep: 1496.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782 [2024-06-10 19:08:09,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1465.36 | bwd_inner_microstep: 1465.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3762 [2024-06-10 19:08:11,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.45 | bwd_microstep: 1494.96 | bwd_inner_microstep: 1494.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-10 19:08:13,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.09 | bwd_microstep: 786.65 | bwd_inner_microstep: 786.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 19:08:14,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.00 | bwd_microstep: 1246.61 | bwd_inner_microstep: 1246.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per 
sample: 2.5, dynamic token length: 1867 [2024-06-10 19:08:15,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.43 | bwd_microstep: 678.27 | bwd_inner_microstep: 678.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3699 [2024-06-10 19:08:17,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.46 | bwd_microstep: 1620.14 | bwd_inner_microstep: 1620.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1956 [2024-06-10 19:08:19,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.72 | bwd_microstep: 839.77 | bwd_inner_microstep: 839.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 19:08:20,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1375.42 | bwd_inner_microstep: 1375.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 19:08:23,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1489.21 | bwd_inner_microstep: 1489.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-10 19:08:25,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.97 | bwd_microstep: 1489.38 | bwd_inner_microstep: 1489.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2919 [2024-06-10 19:08:26,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.49 | bwd_microstep: 1030.67 | bwd_inner_microstep: 1030.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 19:08:28,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.89 | bwd_microstep: 1391.70 | bwd_inner_microstep: 1391.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3524 [2024-06-10 19:08:30,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.71 | bwd_microstep: 1228.50 | bwd_inner_microstep: 1228.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469 [2024-06-10 19:08:32,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.60 | bwd_microstep: 1433.48 | bwd_inner_microstep: 1433.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996 [2024-06-10 19:08:33,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.50 | bwd_microstep: 801.77 | bwd_inner_microstep: 801.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 19:08:35,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1276.74 | bwd_inner_microstep: 1276.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-10 19:08:37,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.74 | bwd_microstep: 1427.86 | bwd_inner_microstep: 1427.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 19:08:38,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.39 | bwd_microstep: 1402.51 | bwd_inner_microstep: 1402.49 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 19:08:40,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1387.07 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 19:08:42,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.72 | bwd_microstep: 1278.44 | bwd_inner_microstep: 1278.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 19:08:44,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.50 | bwd_microstep: 1290.01 | bwd_inner_microstep: 1289.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-10 19:08:46,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.25 | bwd_microstep: 1442.37 | bwd_inner_microstep: 1442.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597 [2024-06-10 19:08:48,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.56 | bwd_microstep: 1607.89 | bwd_inner_microstep: 1607.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-10 19:08:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.72 | bwd_microstep: 1415.26 | bwd_inner_microstep: 1415.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 19:08:52,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 
1377.93 | bwd_inner_microstep: 1377.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3575 [2024-06-10 19:08:54,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.45 | bwd_microstep: 1663.66 | bwd_inner_microstep: 1663.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2957 [2024-06-10 19:08:56,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.94 | bwd_microstep: 1137.40 | bwd_inner_microstep: 1137.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3596 [2024-06-10 19:08:58,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.56 | bwd_microstep: 1702.74 | bwd_inner_microstep: 1702.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-10 19:09:02,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-10 19:09:02,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.63 | bwd_microstep: 3104.41 | bwd_inner_microstep: 1831.49 | bwd_allreduce_microstep: 1272.87 | step_microstep: 38.03 [2024-06-10 19:09:02,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15850.40 | bwd: 43992.16 | bwd_inner: 42718.25 | bwd_allreduce: 1273.17 | step: 39.65 {'loss': 1.1749, 'learning_rate': 1.3371648635627285e-05, 'epoch': 0.62} dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3465 [2024-06-10 19:09:04,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.34 | bwd_microstep: 1321.52 | bwd_inner_microstep: 1321.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3414 [2024-06-10 19:09:05,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1242.61 | bwd_inner_microstep: 1242.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 19:09:07,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.20 | bwd_microstep: 1377.81 | bwd_inner_microstep: 1377.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-10 19:09:09,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1492.21 | bwd_inner_microstep: 1492.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 19:09:12,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 1552.28 | bwd_inner_microstep: 1552.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 19:09:13,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.90 | bwd_microstep: 1384.23 | bwd_inner_microstep: 1384.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 19:09:15,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1247.92 | bwd_inner_microstep: 1247.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3712 [2024-06-10 19:09:17,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.51 | bwd_microstep: 1460.46 | bwd_inner_microstep: 1460.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 19:09:19,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1377.01 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631 [2024-06-10 19:09:21,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2499 [2024-06-10 19:09:22,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.89 | bwd_microstep: 957.32 | bwd_inner_microstep: 957.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-10 19:09:25,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1633.49 | bwd_inner_microstep: 1633.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719 [2024-06-10 19:09:27,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.04 | bwd_microstep: 1625.05 | bwd_inner_microstep: 1625.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3529 [2024-06-10 19:09:29,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1354.31 | bwd_inner_microstep: 1354.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-10 19:09:30,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.82 | bwd_microstep: 699.73 | bwd_inner_microstep: 699.71 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 19:09:31,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1281.15 | bwd_inner_microstep: 1281.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534 [2024-06-10 19:09:33,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.32 | bwd_microstep: 1324.97 | bwd_inner_microstep: 1324.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469 [2024-06-10 19:09:35,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.11 | bwd_microstep: 1182.74 | bwd_inner_microstep: 1182.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3654 [2024-06-10 19:09:37,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1425.24 | bwd_inner_microstep: 1425.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 19:09:39,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1295.59 | bwd_inner_microstep: 1295.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3467 [2024-06-10 19:09:41,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1364.04 | bwd_inner_microstep: 1364.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2007 [2024-06-10 19:09:42,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.99 | bwd_microstep: 
837.13 | bwd_inner_microstep: 837.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-10 19:09:43,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.77 | bwd_microstep: 806.99 | bwd_inner_microstep: 806.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1976 [2024-06-10 19:09:44,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.86 | bwd_microstep: 735.49 | bwd_inner_microstep: 735.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-10 19:09:46,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.45 | bwd_microstep: 1310.08 | bwd_inner_microstep: 1310.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 19:09:48,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1410.48 | bwd_inner_microstep: 1410.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 19:09:50,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.75 | bwd_microstep: 1547.91 | bwd_inner_microstep: 1547.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598 [2024-06-10 19:09:52,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.05 | bwd_microstep: 1600.30 | bwd_inner_microstep: 1600.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 19:09:54,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 526.83 | bwd_microstep: 1390.90 | bwd_inner_microstep: 1390.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 19:09:56,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1394.59 | bwd_inner_microstep: 1394.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 19:09:58,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1503.32 | bwd_inner_microstep: 1503.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3808 [2024-06-10 19:10:01,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.05 | optimizer_step: 6.61 [2024-06-10 19:10:01,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.26 | bwd_microstep: 2430.25 | bwd_inner_microstep: 1899.62 | bwd_allreduce_microstep: 530.58 | step_microstep: 37.58 [2024-06-10 19:10:01,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15883.26 | bwd: 42975.10 | bwd_inner: 42443.62 | bwd_allreduce: 530.81 | step: 39.10 {'loss': 1.2407, 'learning_rate': 1.3336247629227339e-05, 'epoch': 0.62} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 19:10:03,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1499.92 | bwd_inner_microstep: 1499.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4603 [2024-06-10 19:10:06,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.11 | bwd_microstep: 1760.47 | bwd_inner_microstep: 1760.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 4298 [2024-06-10 19:10:08,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.07 | bwd_microstep: 1679.24 | bwd_inner_microstep: 1679.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3766 [2024-06-10 19:10:10,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.19 | bwd_microstep: 1603.91 | bwd_inner_microstep: 1603.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 19:10:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1384.27 | bwd_inner_microstep: 1384.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2206 [2024-06-10 19:10:13,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.63 | bwd_microstep: 959.55 | bwd_inner_microstep: 959.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 19:10:15,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1554.99 | bwd_inner_microstep: 1554.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1878 [2024-06-10 19:10:16,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.58 | bwd_microstep: 712.98 | bwd_inner_microstep: 712.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3406 [2024-06-10 19:10:18,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1309.62 | bwd_inner_microstep: 1309.60 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503 [2024-06-10 19:10:20,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.83 | bwd_microstep: 1328.33 | bwd_inner_microstep: 1328.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1915 [2024-06-10 19:10:21,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.47 | bwd_microstep: 879.09 | bwd_inner_microstep: 879.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3671 [2024-06-10 19:10:24,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.94 | bwd_microstep: 1664.26 | bwd_inner_microstep: 1664.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 19:10:25,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1340.62 | bwd_inner_microstep: 1340.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 19:10:27,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1251.74 | bwd_inner_microstep: 1251.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678 [2024-06-10 19:10:29,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.45 | bwd_microstep: 1614.10 | bwd_inner_microstep: 1614.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3398 [2024-06-10 19:10:31,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 1439.70 | 
bwd_inner_microstep: 1439.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3873 [2024-06-10 19:10:34,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.26 | bwd_microstep: 1583.81 | bwd_inner_microstep: 1583.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 19:10:35,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.76 | bwd_microstep: 1375.79 | bwd_inner_microstep: 1375.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3668 [2024-06-10 19:10:38,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.77 | bwd_microstep: 1487.49 | bwd_inner_microstep: 1487.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2158 [2024-06-10 19:10:39,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.74 | bwd_microstep: 758.89 | bwd_inner_microstep: 758.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 19:10:41,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.81 | bwd_microstep: 1657.13 | bwd_inner_microstep: 1657.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675 [2024-06-10 19:10:43,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.68 | bwd_microstep: 1427.75 | bwd_inner_microstep: 1427.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 19:10:45,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 490.57 | bwd_microstep: 1283.70 | bwd_inner_microstep: 1283.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 19:10:47,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.64 | bwd_microstep: 1556.51 | bwd_inner_microstep: 1556.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.28 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 19:10:49,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.87 | bwd_microstep: 1494.92 | bwd_inner_microstep: 1494.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4058 [2024-06-10 19:10:51,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.73 | bwd_microstep: 1621.99 | bwd_inner_microstep: 1621.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 19:10:53,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1286.22 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547 [2024-06-10 19:10:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.21 | bwd_microstep: 1519.88 | bwd_inner_microstep: 1519.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3598 [2024-06-10 19:10:57,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.62 | bwd_microstep: 1701.49 | bwd_inner_microstep: 1701.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2970 [2024-06-10 19:10:59,432] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.59 | bwd_microstep: 1201.21 | bwd_inner_microstep: 1201.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3413 [2024-06-10 19:11:01,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.76 | bwd_microstep: 1308.26 | bwd_inner_microstep: 1308.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2264 [2024-06-10 19:11:03,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 19:11:03,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.29 | bwd_microstep: 2356.07 | bwd_inner_microstep: 1028.12 | bwd_allreduce_microstep: 1327.90 | step_microstep: 37.98 [2024-06-10 19:11:03,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16492.77 | bwd: 45603.93 | bwd_inner: 44275.13 | bwd_allreduce: 1328.13 | step: 41.78 {'loss': 1.1883, 'learning_rate': 1.3300870092528607e-05, 'epoch': 0.62} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 19:11:05,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 794.36 | bwd_inner_microstep: 794.22 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3964 [2024-06-10 19:11:07,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.43 | bwd_microstep: 1599.05 | bwd_inner_microstep: 1599.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3927 [2024-06-10 19:11:09,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1392.51 | bwd_inner_microstep: 1392.48 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 19:11:11,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1479.92 | bwd_inner_microstep: 1479.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 19:11:13,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1276.84 | bwd_inner_microstep: 1276.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1860 [2024-06-10 19:11:13,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.74 | bwd_microstep: 677.74 | bwd_inner_microstep: 677.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3433 [2024-06-10 19:11:15,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.45 | bwd_microstep: 1312.26 | bwd_inner_microstep: 1312.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 19:11:17,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1251.16 | bwd_inner_microstep: 1251.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 19:11:19,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1345.71 | bwd_inner_microstep: 1345.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 19:11:21,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 
1391.67 | bwd_inner_microstep: 1391.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 19:11:23,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1286.90 | bwd_inner_microstep: 1286.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2404 [2024-06-10 19:11:24,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.97 | bwd_microstep: 871.68 | bwd_inner_microstep: 871.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508 [2024-06-10 19:11:26,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1320.78 | bwd_inner_microstep: 1320.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1985 [2024-06-10 19:11:27,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.71 | bwd_microstep: 784.36 | bwd_inner_microstep: 784.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 19:11:29,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.48 | bwd_microstep: 1353.31 | bwd_inner_microstep: 1353.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 19:11:31,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1490.90 | bwd_inner_microstep: 1490.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3665 [2024-06-10 19:11:33,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.23 | bwd_microstep: 1373.38 | bwd_inner_microstep: 1373.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652 [2024-06-10 19:11:35,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.91 | bwd_microstep: 1445.66 | bwd_inner_microstep: 1445.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499 [2024-06-10 19:11:37,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.73 | bwd_microstep: 1437.13 | bwd_inner_microstep: 1437.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 19:11:38,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1295.52 | bwd_inner_microstep: 1295.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829 [2024-06-10 19:11:40,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.17 | bwd_microstep: 1356.82 | bwd_inner_microstep: 1356.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3608 [2024-06-10 19:11:42,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.15 | bwd_microstep: 1215.27 | bwd_inner_microstep: 1215.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994 [2024-06-10 19:11:43,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.29 | bwd_microstep: 835.74 | bwd_inner_microstep: 835.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1908 [2024-06-10 19:11:44,566] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.32 | bwd_microstep: 733.63 | bwd_inner_microstep: 733.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2449 [2024-06-10 19:11:46,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.37 | bwd_microstep: 1110.49 | bwd_inner_microstep: 1110.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2060 [2024-06-10 19:11:47,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.72 | bwd_microstep: 1009.90 | bwd_inner_microstep: 1009.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-10 19:11:48,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 792.88 | bwd_inner_microstep: 792.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3777 [2024-06-10 19:11:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.41 | bwd_microstep: 1815.67 | bwd_inner_microstep: 1815.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3766 [2024-06-10 19:11:53,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.18 | bwd_microstep: 1740.19 | bwd_inner_microstep: 1740.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 19:11:55,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.98 | bwd_microstep: 1453.42 | bwd_inner_microstep: 1453.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3800 [2024-06-10 19:11:57,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 1645.56 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 19:12:04,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.63 [2024-06-10 19:12:04,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 6036.77 | bwd_inner_microstep: 1526.23 | bwd_allreduce_microstep: 4510.49 | step_microstep: 38.45 [2024-06-10 19:12:04,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15078.38 | bwd: 44927.21 | bwd_inner: 40415.72 | bwd_allreduce: 4510.76 | step: 39.96 {'loss': 1.2107, 'learning_rate': 1.3265516150130577e-05, 'epoch': 0.62} █████▏ | 1067/1726 [18:28:50<20:59:39, 114.69s/it] 62%|██████▏ | 1068/1726 [18:30:38<20:35:35, 112.67s/it] 62%|██████▏ | 1068/1726 [18:30:38<20:35:35, 112.67s/it] 62%|██████▏ | 1069/1726 [18:31:39<17:41:17, 96.92s/it] 62%|██████▏ | 1069/1726 [18:31:39<17:41:17, 96.92s/it] 62%|██████▏ | 1070/1726 [18:32:38<15:35:55, 85.60s/it] 62%|██████▏ | 1070/1726 [18:32:38<15:35:55, 85.60s/it] 62%|██████▏ | 1071/1726 [18:33:40<14:18:39, 78.66s/it] 62%|██████▏ | 1071/1726 [18:33:40<14:18:39, 78.66s/it] 62%|██████▏ | 1072/1726 [18:34:41<13:17:27, 73.16s/it] dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 19:12:05,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.87 | bwd_microstep: 788.81 | bwd_inner_microstep: 788.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-10 19:12:06,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.77 | bwd_microstep: 676.98 | bwd_inner_microstep: 676.96 |
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3848 [2024-06-10 19:12:08,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1461.39 | bwd_inner_microstep: 1461.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 19:12:10,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.61 | bwd_microstep: 1450.36 | bwd_inner_microstep: 1450.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755 [2024-06-10 19:12:12,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.73 | bwd_microstep: 1535.80 | bwd_inner_microstep: 1535.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 19:12:14,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1247.01 | bwd_inner_microstep: 1246.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 19:12:16,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1344.24 | bwd_inner_microstep: 1344.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955 [2024-06-10 19:12:17,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.40 | bwd_microstep: 701.01 | bwd_inner_microstep: 700.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 19:12:18,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.96 | bwd_microstep: 
1344.08 | bwd_inner_microstep: 1344.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2888 [2024-06-10 19:12:20,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.92 | bwd_microstep: 1090.02 | bwd_inner_microstep: 1089.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 19:12:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.04 | bwd_microstep: 1524.08 | bwd_inner_microstep: 1524.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512 [2024-06-10 19:12:24,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.63 | bwd_microstep: 1436.80 | bwd_inner_microstep: 1436.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510 [2024-06-10 19:12:26,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.13 | bwd_microstep: 1584.46 | bwd_inner_microstep: 1584.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 19:12:28,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.65 | bwd_microstep: 1254.26 | bwd_inner_microstep: 1254.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524 [2024-06-10 19:12:30,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1425.05 | bwd_inner_microstep: 1425.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3512 [2024-06-10 19:12:32,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 538.58 | bwd_microstep: 1452.68 | bwd_inner_microstep: 1452.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3825 [2024-06-10 19:12:34,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.14 | bwd_microstep: 1684.33 | bwd_inner_microstep: 1684.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 19:12:36,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.03 | bwd_microstep: 1167.01 | bwd_inner_microstep: 1166.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2117 [2024-06-10 19:12:37,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.29 | bwd_microstep: 955.76 | bwd_inner_microstep: 955.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3519 [2024-06-10 19:12:39,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1448.26 | bwd_inner_microstep: 1448.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3616 [2024-06-10 19:12:41,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.86 | bwd_microstep: 1582.14 | bwd_inner_microstep: 1582.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3837 [2024-06-10 19:12:43,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1320.89 | bwd_inner_microstep: 1320.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-10 19:12:45,015] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.83 | bwd_microstep: 978.69 | bwd_inner_microstep: 978.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-10 19:12:46,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.39 | bwd_microstep: 975.07 | bwd_inner_microstep: 975.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695 [2024-06-10 19:12:48,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.18 | bwd_microstep: 1329.27 | bwd_inner_microstep: 1329.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726 [2024-06-10 19:12:50,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.69 | bwd_microstep: 1440.35 | bwd_inner_microstep: 1440.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 19:12:52,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1386.25 | bwd_inner_microstep: 1386.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2510 [2024-06-10 19:12:53,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.04 | bwd_microstep: 932.87 | bwd_inner_microstep: 932.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766 [2024-06-10 19:12:55,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.64 | bwd_microstep: 1449.76 | bwd_inner_microstep: 1449.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 
[2024-06-10 19:12:57,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1296.05 | bwd_inner_microstep: 1296.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725 [2024-06-10 19:12:59,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1438.29 | bwd_inner_microstep: 1438.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3561 [2024-06-10 19:13:07,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-10 19:13:07,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.65 | bwd_microstep: 8071.91 | bwd_inner_microstep: 1741.99 | bwd_allreduce_microstep: 6329.88 | step_microstep: 37.85 [2024-06-10 19:13:07,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15452.31 | bwd: 47773.94 | bwd_inner: 41443.13 | bwd_allreduce: 6330.11 | step: 39.31 {'loss': 1.2141, 'learning_rate': 1.3230185926549654e-05, 'epoch': 0.62} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 19:13:09,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.31 | bwd_microstep: 1276.35 | bwd_inner_microstep: 1276.23 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 19:13:11,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.02 | bwd_microstep: 1236.80 | bwd_inner_microstep: 1236.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 19:13:13,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1379.96 | 
bwd_inner_microstep: 1379.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 19:13:15,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1342.75 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 19:13:16,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.18 | bwd_microstep: 1243.62 | bwd_inner_microstep: 1243.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 19:13:18,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.49 | bwd_microstep: 1376.62 | bwd_inner_microstep: 1376.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 19:13:20,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1280.11 | bwd_inner_microstep: 1280.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 19:13:22,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.98 | bwd_microstep: 1284.19 | bwd_inner_microstep: 1284.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405 [2024-06-10 19:13:24,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1309.36 | bwd_inner_microstep: 1309.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 19:13:26,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 575.61 | bwd_microstep: 1552.64 | bwd_inner_microstep: 1552.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2030 [2024-06-10 19:13:27,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.89 | bwd_microstep: 851.93 | bwd_inner_microstep: 851.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937 [2024-06-10 19:13:28,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.33 | bwd_microstep: 757.62 | bwd_inner_microstep: 757.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1967 [2024-06-10 19:13:29,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.11 | bwd_microstep: 856.48 | bwd_inner_microstep: 856.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 19:13:31,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1343.63 | bwd_inner_microstep: 1343.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3657 [2024-06-10 19:13:33,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1556.41 | bwd_inner_microstep: 1556.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2718 [2024-06-10 19:13:34,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.90 | bwd_microstep: 939.70 | bwd_inner_microstep: 939.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-10 19:13:36,755] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.07 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463 [2024-06-10 19:13:38,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1311.43 | bwd_inner_microstep: 1311.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-10 19:13:40,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1490.91 | bwd_inner_microstep: 1490.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2218 [2024-06-10 19:13:41,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.63 | bwd_microstep: 863.51 | bwd_inner_microstep: 863.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 19:13:43,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1348.22 | bwd_inner_microstep: 1348.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 19:13:45,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.12 | bwd_microstep: 1660.27 | bwd_inner_microstep: 1660.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 19:13:48,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.16 | bwd_microstep: 1499.34 | bwd_inner_microstep: 1499.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
2272 [2024-06-10 19:13:49,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.05 | bwd_microstep: 1003.40 | bwd_inner_microstep: 1003.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3554 [2024-06-10 19:13:51,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.65 | bwd_microstep: 1202.56 | bwd_inner_microstep: 1202.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 19:13:53,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1406.13 | bwd_inner_microstep: 1406.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3834 [2024-06-10 19:13:54,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.76 | bwd_microstep: 1358.87 | bwd_inner_microstep: 1358.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281 [2024-06-10 19:13:56,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.27 | bwd_microstep: 909.28 | bwd_inner_microstep: 909.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3664 [2024-06-10 19:13:58,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.75 | bwd_microstep: 1476.63 | bwd_inner_microstep: 1476.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438 [2024-06-10 19:13:59,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1284.66 | bwd_inner_microstep: 1284.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3730 [2024-06-10 19:14:02,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.40 | bwd_microstep: 1463.43 | bwd_inner_microstep: 1463.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-10 19:14:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 19:14:07,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.22 | bwd_microstep: 4379.49 | bwd_inner_microstep: 1745.73 | bwd_allreduce_microstep: 2633.70 | step_microstep: 38.04 [2024-06-10 19:14:07,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15291.32 | bwd: 43524.33 | bwd_inner: 40889.63 | bwd_allreduce: 2633.97 | step: 39.52 {'loss': 1.2215, 'learning_rate': 1.3194879546218709e-05, 'epoch': 0.62} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 19:14:08,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.95 | bwd_microstep: 1336.28 | bwd_inner_microstep: 1336.16 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-10 19:14:10,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1477.95 | bwd_inner_microstep: 1477.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 19:14:12,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1342.93 | bwd_inner_microstep: 1342.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397 [2024-06-10 19:14:14,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.17 
| bwd_microstep: 1146.53 | bwd_inner_microstep: 1146.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-10 19:14:15,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.33 | bwd_microstep: 792.81 | bwd_inner_microstep: 792.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 19:14:17,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1388.11 | bwd_inner_microstep: 1388.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 19:14:19,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.14 | bwd_microstep: 1385.41 | bwd_inner_microstep: 1385.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 19:14:21,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1379.61 | bwd_inner_microstep: 1379.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 19:14:22,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.70 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3479 [2024-06-10 19:14:24,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.99 | bwd_microstep: 1185.80 | bwd_inner_microstep: 1185.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 19:14:26,547] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 522.70 | bwd_microstep: 1387.22 | bwd_inner_microstep: 1387.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3710 [2024-06-10 19:14:28,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.29 | bwd_microstep: 1471.85 | bwd_inner_microstep: 1471.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 19:14:30,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.15 | bwd_microstep: 1338.43 | bwd_inner_microstep: 1338.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 19:14:32,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.79 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1902 [2024-06-10 19:14:33,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.35 | bwd_microstep: 807.46 | bwd_inner_microstep: 807.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3513 [2024-06-10 19:14:35,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.53 | bwd_microstep: 1510.89 | bwd_inner_microstep: 1510.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3550 [2024-06-10 19:14:37,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.67 | bwd_microstep: 1440.99 | bwd_inner_microstep: 1440.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3833 [2024-06-10 19:14:39,980] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.53 | bwd_microstep: 1689.37 | bwd_inner_microstep: 1689.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 19:14:42,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1491.09 | bwd_inner_microstep: 1491.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 19:14:43,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1390.82 | bwd_inner_microstep: 1390.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815 [2024-06-10 19:14:46,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.87 | bwd_microstep: 1511.14 | bwd_inner_microstep: 1511.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3520 [2024-06-10 19:14:47,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.78 | bwd_microstep: 1192.23 | bwd_inner_microstep: 1192.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 19:14:49,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1351.49 | bwd_inner_microstep: 1351.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 19:14:51,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1381.13 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3545 [2024-06-10 19:14:53,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.42 | bwd_microstep: 1295.52 | bwd_inner_microstep: 1295.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463 [2024-06-10 19:14:55,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.21 | bwd_microstep: 1435.22 | bwd_inner_microstep: 1435.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-10 19:14:56,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.17 | bwd_microstep: 969.91 | bwd_inner_microstep: 969.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3603 [2024-06-10 19:14:58,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 1703.86 | bwd_inner_microstep: 1703.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726 [2024-06-10 19:15:00,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1434.89 | bwd_inner_microstep: 1434.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4111 [2024-06-10 19:15:03,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.16 | bwd_microstep: 1745.75 | bwd_inner_microstep: 1745.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 19:15:05,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.78 | bwd_microstep: 1395.92 | bwd_inner_microstep: 1395.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3594 [2024-06-10 19:15:07,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.14 | optimizer_step: 6.57 [2024-06-10 19:15:07,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.86 | bwd_microstep: 2012.51 | bwd_inner_microstep: 1525.41 | bwd_allreduce_microstep: 487.05 | step_microstep: 37.52 [2024-06-10 19:15:07,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16334.67 | bwd: 44164.75 | bwd_inner: 43676.70 | bwd_allreduce: 487.35 | step: 39.10 {'loss': 1.1837, 'learning_rate': 1.3159597133486628e-05, 'epoch': 0.62} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 19:15:09,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.84 | bwd_microstep: 1474.85 | bwd_inner_microstep: 1474.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401 [2024-06-10 19:15:11,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.26 | bwd_microstep: 1178.14 | bwd_inner_microstep: 1178.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 19:15:13,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.01 | bwd_microstep: 1552.94 | bwd_inner_microstep: 1552.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 19:15:15,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1244.60 | bwd_inner_microstep: 1244.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 19:15:17,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 521.63 | bwd_microstep: 1382.41 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 19:15:19,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1876 [2024-06-10 19:15:19,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.75 | bwd_microstep: 679.24 | bwd_inner_microstep: 679.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 19:15:21,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.45 | bwd_microstep: 1281.70 | bwd_inner_microstep: 1281.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706 [2024-06-10 19:15:23,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.36 | bwd_microstep: 1618.02 | bwd_inner_microstep: 1618.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-10 19:15:25,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.86 | bwd_microstep: 1450.04 | bwd_inner_microstep: 1450.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 19:15:27,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.20 | bwd_microstep: 1351.78 | bwd_inner_microstep: 1351.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-10 19:15:29,948] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1535.21 | bwd_inner_microstep: 1535.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3662 [2024-06-10 19:15:31,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.05 | bwd_microstep: 1354.48 | bwd_inner_microstep: 1354.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670 [2024-06-10 19:15:34,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.48 | bwd_microstep: 1613.56 | bwd_inner_microstep: 1613.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511 [2024-06-10 19:15:36,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1575.50 | bwd_inner_microstep: 1575.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 19:15:38,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.51 | bwd_microstep: 1498.30 | bwd_inner_microstep: 1498.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3512 [2024-06-10 19:15:40,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.98 | bwd_microstep: 1437.90 | bwd_inner_microstep: 1437.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 19:15:42,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.35 | bwd_microstep: 1393.91 | bwd_inner_microstep: 1393.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3502 [2024-06-10 19:15:44,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.83 | bwd_microstep: 1487.99 | bwd_inner_microstep: 1487.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620 [2024-06-10 19:15:46,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.71 | bwd_microstep: 1308.81 | bwd_inner_microstep: 1308.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 19:15:48,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.42 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466 [2024-06-10 19:15:50,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.00 | bwd_microstep: 1442.63 | bwd_inner_microstep: 1442.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-10 19:15:52,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.71 | bwd_microstep: 1501.72 | bwd_inner_microstep: 1501.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 19:15:54,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1611.58 | bwd_inner_microstep: 1611.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563 [2024-06-10 19:15:56,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.14 | bwd_microstep: 1428.94 | bwd_inner_microstep: 1428.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3824 [2024-06-10 19:15:58,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.06 | bwd_microstep: 1662.80 | bwd_inner_microstep: 1662.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3516 [2024-06-10 19:16:00,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.48 | bwd_microstep: 1192.55 | bwd_inner_microstep: 1192.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 19:16:02,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1609.51 | bwd_inner_microstep: 1609.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 19:16:04,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1442.31 | bwd_inner_microstep: 1442.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 19:16:06,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.87 | bwd_microstep: 1651.70 | bwd_inner_microstep: 1651.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 19:16:08,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 19:16:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 19:16:10,733] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1415.77 | bwd_inner_microstep: 1408.13 | bwd_allreduce_microstep: 7.59 | step_microstep: 37.61 [2024-06-10 19:16:10,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16977.60 | bwd: 45574.23 | bwd_inner: 45565.75 | bwd_allreduce: 7.82 | step: 39.04 {'loss': 1.1564, 'learning_rate': 1.3124338812617881e-05, 'epoch': 0.62} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012 [2024-06-10 19:16:11,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.16 | bwd_microstep: 799.96 | bwd_inner_microstep: 799.90 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 19:16:13,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 1385.43 | bwd_inner_microstep: 1385.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3850 [2024-06-10 19:16:15,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.61 | bwd_microstep: 1563.25 | bwd_inner_microstep: 1563.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-10 19:16:18,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.71 | bwd_microstep: 1654.25 | bwd_inner_microstep: 1654.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751 [2024-06-10 19:16:20,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1441.39 | bwd_inner_microstep: 1441.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818 [2024-06-10 19:16:22,053] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.18 | bwd_microstep: 1352.69 | bwd_inner_microstep: 1352.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-10 19:16:24,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1433.86 | bwd_inner_microstep: 1433.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 19:16:25,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.42 | bwd_microstep: 1381.44 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873 [2024-06-10 19:16:26,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.13 | bwd_microstep: 678.48 | bwd_inner_microstep: 678.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 19:16:28,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.46 | bwd_microstep: 1286.68 | bwd_inner_microstep: 1286.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 19:16:30,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.56 | bwd_microstep: 1283.81 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2086 [2024-06-10 19:16:31,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.61 | bwd_microstep: 881.21 | bwd_inner_microstep: 881.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 
[2024-06-10 19:16:33,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446 [2024-06-10 19:16:35,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1451.81 | bwd_inner_microstep: 1451.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942 [2024-06-10 19:16:36,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.00 | bwd_microstep: 888.95 | bwd_inner_microstep: 888.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568 [2024-06-10 19:16:38,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1362.86 | bwd_inner_microstep: 1362.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635 [2024-06-10 19:16:40,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1408.54 | bwd_inner_microstep: 1408.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 19:16:42,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.29 | bwd_microstep: 1394.37 | bwd_inner_microstep: 1394.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 19:16:44,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.61 | bwd_microstep: 1406.61 | bwd_inner_microstep: 1406.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3449 [2024-06-10 19:16:46,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1357.35 | bwd_inner_microstep: 1357.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-10 19:16:48,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.16 | bwd_microstep: 1632.51 | bwd_inner_microstep: 1632.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440 [2024-06-10 19:16:50,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.66 | bwd_microstep: 1155.69 | bwd_inner_microstep: 1155.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3695 [2024-06-10 19:16:52,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.82 | bwd_microstep: 1724.49 | bwd_inner_microstep: 1724.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 19:16:54,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.00 | bwd_microstep: 1661.66 | bwd_inner_microstep: 1661.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 19:16:56,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1302.44 | bwd_inner_microstep: 1302.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1928 [2024-06-10 19:16:57,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.64 | bwd_microstep: 761.12 | bwd_inner_microstep: 761.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600 [2024-06-10 19:16:59,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.95 | bwd_microstep: 1306.10 | bwd_inner_microstep: 1306.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3700 [2024-06-10 19:17:01,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2672 [2024-06-10 19:17:03,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.06 | bwd_microstep: 1152.99 | bwd_inner_microstep: 1152.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585 [2024-06-10 19:17:05,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.86 | bwd_microstep: 1601.09 | bwd_inner_microstep: 1601.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 19:17:07,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1376.05 | bwd_inner_microstep: 1376.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 19:17:12,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.34 | optimizer_step: 6.60 [2024-06-10 19:17:12,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.77 | bwd_microstep: 5064.44 | bwd_inner_microstep: 1864.27 | bwd_allreduce_microstep: 3200.10 | step_microstep: 38.69 [2024-06-10 19:17:12,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
15865.08 | bwd: 45888.37 | bwd_inner: 42687.29 | bwd_allreduce: 3200.36 | step: 40.15 {'loss': 1.2133, 'learning_rate': 1.308910470779209e-05, 'epoch': 0.62} 62%|██████▏ | 1072/1726 [18:34:41<13:17:27, 73.16s/it] 62%|██████▏ | 1073/1726 [18:35:44<12:44:51, 70.28s/it] 62%|██████▏ | 1074/1726 [18:36:43<12:07:22, 66.94s/it] 62%|██████▏ | 1075/1726 [18:37:44<11:46:22, 65.10s/it] 62%|██████▏ | 1076/1726 [18:38:47<11:38:05, 64.44s/it] 62%|██████▏ | 1077/1726 [18:39:49<11:29:23, 63.73s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 19:17:14,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1334.46 | bwd_inner_microstep: 1334.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915 [2024-06-10 19:17:16,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.24 | bwd_microstep: 1584.79 | bwd_inner_microstep: 1584.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3863 [2024-06-10 19:17:18,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.27 | bwd_microstep: 1520.83 | bwd_inner_microstep: 1520.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 19:17:20,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1375.55 | bwd_inner_microstep: 1375.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753 [2024-06-10 19:17:22,980] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1537.11 | bwd_inner_microstep: 1537.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 19:17:24,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1252.54 | bwd_inner_microstep: 1252.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 19:17:26,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.04 | bwd_microstep: 1383.41 | bwd_inner_microstep: 1383.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 19:17:28,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1295.36 | bwd_inner_microstep: 1295.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-10 19:17:30,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1149.82 | bwd_inner_microstep: 1149.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 19:17:31,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1387.28 | bwd_inner_microstep: 1387.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689 [2024-06-10 19:17:34,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.14 | bwd_microstep: 1524.12 | bwd_inner_microstep: 1524.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3462 [2024-06-10 19:17:35,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1278.96 | bwd_inner_microstep: 1278.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3642 [2024-06-10 19:17:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1436.32 | bwd_inner_microstep: 1436.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505 [2024-06-10 19:17:39,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1416.10 | bwd_inner_microstep: 1416.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1190 [2024-06-10 19:17:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 177.39 | bwd_microstep: 458.63 | bwd_inner_microstep: 458.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655 [2024-06-10 19:17:42,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.63 | bwd_microstep: 1520.82 | bwd_inner_microstep: 1520.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 19:17:44,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.72 | bwd_microstep: 1485.65 | bwd_inner_microstep: 1485.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1962 [2024-06-10 19:17:45,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.93 | bwd_microstep: 889.47 | bwd_inner_microstep: 889.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per 
sample: 6.5, dynamic token length: 3453 [2024-06-10 19:17:47,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.20 | bwd_microstep: 1320.32 | bwd_inner_microstep: 1320.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 19:17:49,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.55 | bwd_microstep: 1261.35 | bwd_inner_microstep: 1261.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831 [2024-06-10 19:17:51,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.89 | bwd_microstep: 1389.68 | bwd_inner_microstep: 1389.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533 [2024-06-10 19:17:53,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1325.91 | bwd_inner_microstep: 1325.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-10 19:17:55,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.97 | bwd_microstep: 1485.38 | bwd_inner_microstep: 1485.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3595 [2024-06-10 19:17:57,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.86 | bwd_microstep: 1702.06 | bwd_inner_microstep: 1702.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811 [2024-06-10 19:17:59,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.95 | bwd_inner_microstep: 1479.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571 [2024-06-10 19:18:01,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.37 | bwd_microstep: 1300.92 | bwd_inner_microstep: 1300.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3582 [2024-06-10 19:18:03,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1365.50 | bwd_inner_microstep: 1365.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 19:18:05,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.70 | bwd_microstep: 1509.84 | bwd_inner_microstep: 1509.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3649 [2024-06-10 19:18:07,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.03 | bwd_microstep: 1583.04 | bwd_inner_microstep: 1583.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580 [2024-06-10 19:18:09,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.87 | bwd_microstep: 1461.11 | bwd_inner_microstep: 1461.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-10 19:18:11,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.74 | bwd_microstep: 1636.89 | bwd_inner_microstep: 1636.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 19:18:14,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.15 | optimizer_step: 6.63 [2024-06-10 
19:18:14,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 2434.98 | bwd_inner_microstep: 1692.27 | bwd_allreduce_microstep: 742.64 | step_microstep: 37.76 [2024-06-10 19:18:14,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16536.36 | bwd: 45088.17 | bwd_inner: 44344.62 | bwd_allreduce: 742.86 | step: 39.25 {'loss': 1.2284, 'learning_rate': 1.3053894943103598e-05, 'epoch': 0.62} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 19:18:16,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1384.33 | bwd_inner_microstep: 1384.26 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908 [2024-06-10 19:18:18,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.76 | bwd_microstep: 1588.89 | bwd_inner_microstep: 1588.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 19:18:20,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.40 | bwd_microstep: 1339.50 | bwd_inner_microstep: 1339.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773 [2024-06-10 19:18:22,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.26 | bwd_microstep: 1443.29 | bwd_inner_microstep: 1443.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720 [2024-06-10 19:18:24,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.45 | bwd_microstep: 1528.65 | bwd_inner_microstep: 1528.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 
19:18:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1349.98 | bwd_inner_microstep: 1349.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760 [2024-06-10 19:18:28,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1536.49 | bwd_inner_microstep: 1536.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 19:18:30,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.89 | bwd_microstep: 1483.69 | bwd_inner_microstep: 1483.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-10 19:18:32,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1406.85 | bwd_inner_microstep: 1406.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 19:18:34,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.85 | bwd_microstep: 1474.78 | bwd_inner_microstep: 1474.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 19:18:36,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.26 | bwd_microstep: 1342.02 | bwd_inner_microstep: 1341.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-10 19:18:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1526.84 | bwd_inner_microstep: 1526.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3412 [2024-06-10 19:18:40,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1246.35 | bwd_inner_microstep: 1246.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3536 [2024-06-10 19:18:42,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.05 | bwd_microstep: 1520.73 | bwd_inner_microstep: 1520.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2092 [2024-06-10 19:18:43,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.08 | bwd_microstep: 758.10 | bwd_inner_microstep: 758.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 19:18:45,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1397.31 | bwd_inner_microstep: 1397.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 19:18:47,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.92 | bwd_microstep: 1458.98 | bwd_inner_microstep: 1458.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-10 19:18:49,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1297.64 | bwd_inner_microstep: 1297.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3924 [2024-06-10 19:18:51,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.90 | bwd_microstep: 1499.81 | bwd_inner_microstep: 1499.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 19:18:52,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.11 | bwd_microstep: 798.23 | bwd_inner_microstep: 798.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2292 [2024-06-10 19:18:53,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.60 | bwd_microstep: 974.61 | bwd_inner_microstep: 974.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 19:18:55,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1395.63 | bwd_inner_microstep: 1395.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3896 [2024-06-10 19:18:58,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.35 | bwd_microstep: 1585.82 | bwd_inner_microstep: 1585.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587 [2024-06-10 19:18:59,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.85 | bwd_microstep: 1307.00 | bwd_inner_microstep: 1306.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3437 [2024-06-10 19:19:01,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.55 | bwd_microstep: 1380.97 | bwd_inner_microstep: 1380.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 19:19:03,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1313.01 | bwd_inner_microstep: 1312.98 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 19:19:05,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 1516.33 | bwd_inner_microstep: 1516.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3571 [2024-06-10 19:19:07,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 1358.79 | bwd_inner_microstep: 1358.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572 [2024-06-10 19:19:09,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.37 | bwd_microstep: 1593.06 | bwd_inner_microstep: 1593.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598 [2024-06-10 19:19:11,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.44 | bwd_microstep: 1597.47 | bwd_inner_microstep: 1597.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-10 19:19:14,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.23 | bwd_microstep: 1497.55 | bwd_inner_microstep: 1497.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-10 19:19:17,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.08 | optimizer_step: 6.59 [2024-06-10 19:19:17,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.59 | bwd_microstep: 2659.86 | bwd_inner_microstep: 1935.59 | bwd_allreduce_microstep: 724.22 | step_microstep: 37.60 [2024-06-10 19:19:17,306] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 16630.32 | bwd: 45562.59 | bwd_inner: 44837.41 | bwd_allreduce: 724.47 | step: 39.07 {'loss': 1.1892, 'learning_rate': 1.3018709642561e-05, 'epoch': 0.63} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465 [2024-06-10 19:19:19,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1396.29 | bwd_inner_microstep: 1396.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 19:19:21,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.27 | bwd_microstep: 1401.79 | bwd_inner_microstep: 1401.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 19:19:23,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1477.99 | bwd_inner_microstep: 1477.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1924 [2024-06-10 19:19:24,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 695.57 | bwd_inner_microstep: 695.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2379 [2024-06-10 19:19:25,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.45 | bwd_microstep: 961.99 | bwd_inner_microstep: 961.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794 [2024-06-10 19:19:27,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.19 | bwd_microstep: 1648.65 | bwd_inner_microstep: 1648.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 
[2024-06-10 19:19:29,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3774 [2024-06-10 19:19:31,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1345.51 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 19:19:33,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.02 | bwd_microstep: 1285.35 | bwd_inner_microstep: 1285.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101 [2024-06-10 19:19:34,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.11 | bwd_microstep: 824.07 | bwd_inner_microstep: 824.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 19:19:36,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.22 | bwd_inner_microstep: 1253.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498 [2024-06-10 19:19:38,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.77 | bwd_microstep: 1580.30 | bwd_inner_microstep: 1580.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481 [2024-06-10 19:19:40,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.99 | bwd_microstep: 1571.32 | bwd_inner_microstep: 1571.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per 
sample: 6.5, dynamic token length: 3508 [2024-06-10 19:19:42,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.54 | bwd_microstep: 1347.65 | bwd_inner_microstep: 1347.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436 [2024-06-10 19:19:43,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1252.40 | bwd_inner_microstep: 1252.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 19:19:45,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1398.15 | bwd_inner_microstep: 1398.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650 [2024-06-10 19:19:47,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.00 | bwd_microstep: 1416.95 | bwd_inner_microstep: 1416.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506 [2024-06-10 19:19:49,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1417.44 | bwd_inner_microstep: 1417.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 19:19:51,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.91 | bwd_microstep: 1346.65 | bwd_inner_microstep: 1346.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693 [2024-06-10 19:19:53,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1433.46 | bwd_inner_microstep: 1433.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 19:19:54,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 797.48 | bwd_inner_microstep: 797.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178 [2024-06-10 19:19:55,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.27 | bwd_microstep: 859.90 | bwd_inner_microstep: 859.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1939 [2024-06-10 19:19:56,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.67 | bwd_microstep: 729.43 | bwd_inner_microstep: 729.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3832 [2024-06-10 19:19:58,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | bwd_microstep: 1360.08 | bwd_inner_microstep: 1360.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-10 19:20:00,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.73 | bwd_microstep: 1498.97 | bwd_inner_microstep: 1498.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276 [2024-06-10 19:20:02,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.00 | bwd_microstep: 1006.35 | bwd_inner_microstep: 1006.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049 [2024-06-10 19:20:03,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.86 | bwd_microstep: 812.72 | bwd_inner_microstep: 812.69 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2235 [2024-06-10 19:20:04,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.16 | bwd_microstep: 837.17 | bwd_inner_microstep: 837.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 19:20:06,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1351.56 | bwd_inner_microstep: 1351.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2112 [2024-06-10 19:20:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.64 | bwd_microstep: 1018.91 | bwd_inner_microstep: 1018.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 [2024-06-10 19:20:09,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.71 | bwd_microstep: 1454.45 | bwd_inner_microstep: 1454.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3797 [2024-06-10 19:20:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 19:20:18,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.05 | bwd_microstep: 8473.61 | bwd_inner_microstep: 1827.22 | bwd_allreduce_microstep: 6646.33 | step_microstep: 38.47 [2024-06-10 19:20:18,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14863.13 | bwd: 46501.62 | bwd_inner: 39854.38 | bwd_allreduce: 6646.57 | step: 39.95 {'loss': 1.2096, 'learning_rate': 1.2983548930086757e-05, 'epoch': 0.63} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444 
[2024-06-10 19:20:20,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1447.53 | bwd_inner_microstep: 1447.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938 [2024-06-10 19:20:21,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.06 | bwd_microstep: 696.75 | bwd_inner_microstep: 696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423 [2024-06-10 19:20:23,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.30 | bwd_microstep: 1150.73 | bwd_inner_microstep: 1150.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4133 [2024-06-10 19:20:25,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.27 | bwd_microstep: 1535.31 | bwd_inner_microstep: 1535.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1869 [2024-06-10 19:20:26,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.51 | bwd_microstep: 741.62 | bwd_inner_microstep: 741.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 19:20:28,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.51 | bwd_microstep: 1644.81 | bwd_inner_microstep: 1644.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 19:20:30,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.20 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3482 [2024-06-10 19:20:32,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1378.55 | bwd_inner_microstep: 1378.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3422 [2024-06-10 19:20:34,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.37 | bwd_microstep: 1280.09 | bwd_inner_microstep: 1280.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3756 [2024-06-10 19:20:36,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.73 | bwd_microstep: 1628.78 | bwd_inner_microstep: 1628.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 19:20:38,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.27 | bwd_microstep: 1343.28 | bwd_inner_microstep: 1343.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1888 [2024-06-10 19:20:39,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 835.80 | bwd_inner_microstep: 835.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 19:20:41,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1476.56 | bwd_inner_microstep: 1476.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-10 19:20:43,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.34 | bwd_microstep: 1336.11 | bwd_inner_microstep: 1336.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3663 [2024-06-10 19:20:45,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.20 | bwd_microstep: 1547.74 | bwd_inner_microstep: 1547.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 19:20:47,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.62 | bwd_microstep: 1387.50 | bwd_inner_microstep: 1387.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564 [2024-06-10 19:20:49,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.67 | bwd_microstep: 1596.63 | bwd_inner_microstep: 1596.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-10 19:20:52,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1602.73 | bwd_inner_microstep: 1602.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3523 [2024-06-10 19:20:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1453.31 | bwd_inner_microstep: 1453.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3681 [2024-06-10 19:20:55,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1286.18 | bwd_inner_microstep: 1286.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3545 [2024-06-10 19:20:57,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.05 | bwd_microstep: 1545.60 | bwd_inner_microstep: 1545.57 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486 [2024-06-10 19:20:59,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.70 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2944 [2024-06-10 19:21:01,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.20 | bwd_microstep: 1006.99 | bwd_inner_microstep: 1006.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 19:21:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1399.75 | bwd_inner_microstep: 1399.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3563 [2024-06-10 19:21:04,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.32 | bwd_microstep: 1332.58 | bwd_inner_microstep: 1332.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 19:21:07,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.45 | bwd_microstep: 1608.96 | bwd_inner_microstep: 1608.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-10 19:21:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.60 | bwd_microstep: 973.00 | bwd_inner_microstep: 972.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3698 [2024-06-10 19:21:10,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.39 | bwd_microstep: 
1533.47 | bwd_inner_microstep: 1533.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 19:21:12,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1432.78 | bwd_inner_microstep: 1432.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 842 [2024-06-10 19:21:13,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.41 | bwd_microstep: 345.41 | bwd_inner_microstep: 345.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756 [2024-06-10 19:21:15,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1536.70 | bwd_inner_microstep: 1536.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585 [2024-06-10 19:21:20,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.58 [2024-06-10 19:21:20,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 4298.69 | bwd_inner_microstep: 1499.19 | bwd_allreduce_microstep: 2799.45 | step_microstep: 37.87 [2024-06-10 19:21:20,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15751.00 | bwd: 45013.83 | bwd_inner: 42213.46 | bwd_allreduce: 2799.67 | step: 39.39 {'loss': 1.191, 'learning_rate': 1.2948412929516703e-05, 'epoch': 0.63} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 19:21:21,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1332.67 | bwd_inner_microstep: 1332.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3421 [2024-06-10 19:21:23,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1249.58 | bwd_inner_microstep: 1249.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294 [2024-06-10 19:21:24,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.55 | bwd_microstep: 875.17 | bwd_inner_microstep: 875.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 19:21:26,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1244.60 | bwd_inner_microstep: 1244.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775 [2024-06-10 19:21:28,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.75 | bwd_microstep: 1538.99 | bwd_inner_microstep: 1538.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 19:21:30,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1245.69 | bwd_inner_microstep: 1245.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 19:21:32,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.03 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 19:21:34,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1251.90 | bwd_inner_microstep: 1251.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 18, images per sample: 4.5, dynamic token length: 2433 [2024-06-10 19:21:35,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.26 | bwd_microstep: 946.47 | bwd_inner_microstep: 946.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463 [2024-06-10 19:21:37,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.33 | bwd_microstep: 1184.59 | bwd_inner_microstep: 1184.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-10 19:21:38,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.35 | bwd_microstep: 789.73 | bwd_inner_microstep: 789.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3412 [2024-06-10 19:21:40,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1407.24 | bwd_inner_microstep: 1407.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3691 [2024-06-10 19:21:42,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.72 | bwd_microstep: 1721.36 | bwd_inner_microstep: 1721.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2019 [2024-06-10 19:21:43,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.70 | bwd_microstep: 743.55 | bwd_inner_microstep: 743.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 19:21:45,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.64 | bwd_microstep: 1485.96 | bwd_inner_microstep: 1485.93 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642 [2024-06-10 19:21:47,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.00 | bwd_microstep: 1613.75 | bwd_inner_microstep: 1613.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 19:21:49,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1282.17 | bwd_inner_microstep: 1282.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1959 [2024-06-10 19:21:50,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.66 | bwd_microstep: 731.94 | bwd_inner_microstep: 731.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471 [2024-06-10 19:21:52,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.58 | bwd_microstep: 1186.61 | bwd_inner_microstep: 1186.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2450 [2024-06-10 19:21:53,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.09 | bwd_microstep: 978.10 | bwd_inner_microstep: 978.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-10 19:21:55,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1414.49 | bwd_inner_microstep: 1414.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2191 [2024-06-10 19:21:56,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.54 | bwd_microstep: 860.70 | bwd_inner_microstep: 860.67 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456 [2024-06-10 19:21:58,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.57 | bwd_microstep: 1287.67 | bwd_inner_microstep: 1287.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284 [2024-06-10 19:21:59,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.66 | bwd_microstep: 1069.54 | bwd_inner_microstep: 1069.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 19:22:02,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1496.28 | bwd_inner_microstep: 1496.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3814 [2024-06-10 19:22:04,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.27 | bwd_microstep: 1856.85 | bwd_inner_microstep: 1856.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 19:22:06,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.80 | bwd_microstep: 1355.06 | bwd_inner_microstep: 1355.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3823 [2024-06-10 19:22:08,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.12 | bwd_microstep: 1417.21 | bwd_inner_microstep: 1417.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 19:22:10,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.03 | bwd_microstep: 
1473.98 | bwd_inner_microstep: 1473.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2328 [2024-06-10 19:22:11,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.70 | bwd_microstep: 983.62 | bwd_inner_microstep: 983.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 19:22:13,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.33 | bwd_microstep: 1248.79 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044 [2024-06-10 19:22:21,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.10 | optimizer_step: 6.63 [2024-06-10 19:22:21,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.29 | bwd_microstep: 8097.93 | bwd_inner_microstep: 1059.11 | bwd_allreduce_microstep: 7038.76 | step_microstep: 38.03 [2024-06-10 19:22:21,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14803.38 | bwd: 46760.03 | bwd_inner: 39720.37 | bwd_allreduce: 7038.99 | step: 39.51 {'loss': 1.1777, 'learning_rate': 1.291330176459965e-05, 'epoch': 0.63} 1077/1726 [18:39:49<11:29:23, 63.73s/it] 62%|██████▏ | 1078/1726 [18:40:51<11:22:33, 63.20s/it] 63%|██████▎ | 1079/1726 [18:41:54<11:19:19, 63.00s/it] 63%|██████▎ | 1080/1726 [18:42:55<11:14:03, 62.61s/it] 63%|██████▎ | 1081/1726 [18:43:56<11:08:08, 62.15s/it] 63%|██████▎ | 1082/1726 [18:44:58<11:06:15, 62.07s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3455 [2024-06-10 19:22:23,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1437.19 | bwd_inner_microstep: 1437.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3974 [2024-06-10 19:22:26,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.42 | bwd_microstep: 1601.15 | bwd_inner_microstep: 1601.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4070 [2024-06-10 19:22:28,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.56 | bwd_microstep: 1552.42 | bwd_inner_microstep: 1552.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3514 [2024-06-10 19:22:30,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.15 | bwd_microstep: 1334.37 | bwd_inner_microstep: 1334.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394 [2024-06-10 19:22:31,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.24 | bwd_microstep: 1243.12 | bwd_inner_microstep: 1243.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389 [2024-06-10 19:22:33,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.83 | bwd_microstep: 1242.49 | bwd_inner_microstep: 1242.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2439 [2024-06-10 19:22:34,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.76 | bwd_microstep: 946.37 | bwd_inner_microstep: 946.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images 
per sample: 4.5, dynamic token length: 3479 [2024-06-10 19:22:36,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.10 | bwd_microstep: 1215.24 | bwd_inner_microstep: 1215.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3697 [2024-06-10 19:22:38,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1482.33 | bwd_inner_microstep: 1482.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736 [2024-06-10 19:22:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.43 | bwd_microstep: 1632.20 | bwd_inner_microstep: 1632.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3676 [2024-06-10 19:22:43,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.54 | bwd_microstep: 1717.37 | bwd_inner_microstep: 1717.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 19:22:45,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1400.55 | bwd_inner_microstep: 1400.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3560 [2024-06-10 19:22:47,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.40 | bwd_microstep: 1456.64 | bwd_inner_microstep: 1456.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3453 [2024-06-10 19:22:49,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1477.41 | bwd_inner_microstep: 1477.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3688 [2024-06-10 19:22:51,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.76 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1587.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3663 [2024-06-10 19:22:53,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.09 | bwd_microstep: 1476.83 | bwd_inner_microstep: 1476.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 19:22:55,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1384.72 | bwd_inner_microstep: 1384.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 19:22:57,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-10 19:22:59,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.95 | bwd_microstep: 1608.64 | bwd_inner_microstep: 1608.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 19:23:01,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.87 | bwd_microstep: 1397.05 | bwd_inner_microstep: 1397.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 19:23:03,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1252.26 | bwd_inner_microstep: 1252.23 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 19:23:05,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.04 | bwd_microstep: 1656.47 | bwd_inner_microstep: 1656.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 19:23:07,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.49 | bwd_microstep: 1282.42 | bwd_inner_microstep: 1282.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827 [2024-06-10 19:23:09,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.58 | bwd_microstep: 1516.48 | bwd_inner_microstep: 1516.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 19:23:11,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1282.60 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 19:23:13,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1287.46 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942 [2024-06-10 19:23:13,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.05 | bwd_microstep: 700.39 | bwd_inner_microstep: 700.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 19:23:16,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 
1498.27 | bwd_inner_microstep: 1498.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-10 19:23:17,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.79 | bwd_microstep: 1402.34 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2410 [2024-06-10 19:23:19,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.48 | bwd_microstep: 1017.03 | bwd_inner_microstep: 1017.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438 [2024-06-10 19:23:21,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1444.14 | bwd_inner_microstep: 1444.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3406 [2024-06-10 19:23:23,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.18 | optimizer_step: 6.61 [2024-06-10 19:23:23,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1543.50 | bwd_inner_microstep: 1535.82 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.78 [2024-06-10 19:23:23,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16639.09 | bwd: 44570.07 | bwd_inner: 44561.55 | bwd_allreduce: 7.86 | step: 39.34 {'loss': 1.2133, 'learning_rate': 1.2878215558996945e-05, 'epoch': 0.63} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 19:23:25,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.74 | bwd_microstep: 1275.84 | bwd_inner_microstep: 1275.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, 
dynamic token length: 4328 [2024-06-10 19:23:27,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.74 | bwd_microstep: 1634.16 | bwd_inner_microstep: 1634.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874 [2024-06-10 19:23:29,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.36 | bwd_microstep: 1581.53 | bwd_inner_microstep: 1581.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3482 [2024-06-10 19:23:31,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.52 | bwd_microstep: 1342.73 | bwd_inner_microstep: 1342.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940 [2024-06-10 19:23:32,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.73 | bwd_microstep: 697.47 | bwd_inner_microstep: 697.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 19:23:34,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.43 | bwd_microstep: 1479.76 | bwd_inner_microstep: 1479.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 19:23:36,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1281.00 | bwd_inner_microstep: 1280.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 19:23:38,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.71 | bwd_microstep: 1342.42 | bwd_inner_microstep: 1342.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 25, images per sample: 6.25, dynamic token length: 3737 [2024-06-10 19:23:40,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1382.07 | bwd_inner_microstep: 1382.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703 [2024-06-10 19:23:42,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.02 | bwd_microstep: 1628.20 | bwd_inner_microstep: 1628.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 19:23:44,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.09 | bwd_microstep: 1290.88 | bwd_inner_microstep: 1290.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680 [2024-06-10 19:23:46,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.11 | bwd_microstep: 1626.80 | bwd_inner_microstep: 1626.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 19:23:48,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1381.60 | bwd_inner_microstep: 1381.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225 [2024-06-10 19:23:49,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.64 | bwd_microstep: 963.55 | bwd_inner_microstep: 963.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 19:23:51,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.26 | bwd_microstep: 1354.52 | bwd_inner_microstep: 1354.49 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3713 [2024-06-10 19:23:53,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.68 | bwd_microstep: 1696.08 | bwd_inner_microstep: 1696.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 19:23:55,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1350.35 | bwd_inner_microstep: 1350.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835 [2024-06-10 19:23:57,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.53 | bwd_microstep: 1462.94 | bwd_inner_microstep: 1462.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 19:23:59,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1293.09 | bwd_inner_microstep: 1293.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 19:24:01,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1298.71 | bwd_inner_microstep: 1298.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450 [2024-06-10 19:24:03,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.89 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3612 [2024-06-10 19:24:05,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1535.36 | 
bwd_inner_microstep: 1535.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3721 [2024-06-10 19:24:07,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.02 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 19:24:08,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.45 | bwd_microstep: 1257.71 | bwd_inner_microstep: 1257.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821 [2024-06-10 19:24:10,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.86 | bwd_microstep: 1417.74 | bwd_inner_microstep: 1417.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 19:24:12,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1487.56 | bwd_inner_microstep: 1487.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2785 [2024-06-10 19:24:14,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.06 | bwd_microstep: 1148.47 | bwd_inner_microstep: 1148.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 19:24:16,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.29 | bwd_microstep: 1557.32 | bwd_inner_microstep: 1557.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 19:24:18,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 472.70 | bwd_microstep: 1249.20 | bwd_inner_microstep: 1249.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-10 19:24:20,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.00 | bwd_microstep: 1438.59 | bwd_inner_microstep: 1438.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820 [2024-06-10 19:24:22,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1584.56 | bwd_inner_microstep: 1584.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3811 [2024-06-10 19:24:25,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.94 | optimizer_gradients: 4.02 | optimizer_step: 6.62 [2024-06-10 19:24:25,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.79 | bwd_microstep: 2436.94 | bwd_inner_microstep: 2029.51 | bwd_allreduce_microstep: 407.38 | step_microstep: 37.61 [2024-06-10 19:24:25,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16668.87 | bwd: 45161.86 | bwd_inner: 44753.59 | bwd_allreduce: 407.61 | step: 39.06 {'loss': 1.1771, 'learning_rate': 1.2843154436282014e-05, 'epoch': 0.63} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400 [2024-06-10 19:24:27,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.53 | bwd_microstep: 1146.76 | bwd_inner_microstep: 1146.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 19:24:29,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1247.17 | bwd_inner_microstep: 1247.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 19:24:30,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.34 | bwd_microstep: 1381.70 | bwd_inner_microstep: 1381.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-10 19:24:32,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.34 | bwd_microstep: 809.76 | bwd_inner_microstep: 809.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 19:24:33,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1396.02 | bwd_inner_microstep: 1395.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 [2024-06-10 19:24:36,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.17 | bwd_microstep: 1657.88 | bwd_inner_microstep: 1657.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-10 19:24:38,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1436.38 | bwd_inner_microstep: 1436.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3762 [2024-06-10 19:24:40,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1501.11 | bwd_inner_microstep: 1501.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596 [2024-06-10 19:24:42,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.91 | bwd_microstep: 1309.07 | bwd_inner_microstep: 1309.04 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4061 [2024-06-10 19:24:44,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.55 | bwd_microstep: 1519.73 | bwd_inner_microstep: 1519.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954 [2024-06-10 19:24:45,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.42 | bwd_microstep: 732.45 | bwd_inner_microstep: 732.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671 [2024-06-10 19:24:47,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1417.97 | bwd_inner_microstep: 1417.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 19:24:49,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2111 [2024-06-10 19:24:50,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.65 | bwd_microstep: 919.19 | bwd_inner_microstep: 919.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3578 [2024-06-10 19:24:52,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.29 | bwd_microstep: 1668.80 | bwd_inner_microstep: 1668.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630 [2024-06-10 19:24:54,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.55 | bwd_microstep: 1610.48 | bwd_inner_microstep: 
1610.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3004 [2024-06-10 19:24:56,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.42 | bwd_microstep: 1207.72 | bwd_inner_microstep: 1207.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291 [2024-06-10 19:24:57,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.34 | bwd_microstep: 975.29 | bwd_inner_microstep: 975.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289 [2024-06-10 19:24:59,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.57 | bwd_microstep: 980.89 | bwd_inner_microstep: 980.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-10 19:25:01,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 1614.04 | bwd_inner_microstep: 1614.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 19:25:03,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1513.54 | bwd_inner_microstep: 1513.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500 [2024-06-10 19:25:05,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.66 | bwd_microstep: 1190.74 | bwd_inner_microstep: 1190.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780 [2024-06-10 19:25:07,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.78 | 
bwd_microstep: 1453.88 | bwd_inner_microstep: 1453.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3606 [2024-06-10 19:25:09,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.85 | bwd_microstep: 1435.22 | bwd_inner_microstep: 1435.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551 [2024-06-10 19:25:11,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.94 | bwd_microstep: 1420.26 | bwd_inner_microstep: 1420.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3444 [2024-06-10 19:25:13,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.14 | bwd_microstep: 1379.78 | bwd_inner_microstep: 1379.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 19:25:15,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.73 | bwd_microstep: 1648.73 | bwd_inner_microstep: 1648.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3779 [2024-06-10 19:25:17,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.74 | bwd_microstep: 1612.56 | bwd_inner_microstep: 1612.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2110 [2024-06-10 19:25:18,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.18 | bwd_microstep: 825.01 | bwd_inner_microstep: 824.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587 [2024-06-10 19:25:20,902] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 587.68 | bwd_microstep: 1602.95 | bwd_inner_microstep: 1602.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-10 19:25:22,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.04 | bwd_microstep: 1445.50 | bwd_inner_microstep: 1445.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3480 [2024-06-10 19:25:27,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.06 | optimizer_step: 6.61 [2024-06-10 19:25:27,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.21 | bwd_microstep: 3883.58 | bwd_inner_microstep: 1780.90 | bwd_allreduce_microstep: 2102.63 | step_microstep: 37.56 [2024-06-10 19:25:27,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16044.24 | bwd: 45326.52 | bwd_inner: 43222.98 | bwd_allreduce: 2102.86 | step: 39.03 {'loss': 1.2323, 'learning_rate': 1.2808118519939965e-05, 'epoch': 0.63} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1977 [2024-06-10 19:25:28,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.96 | bwd_microstep: 886.73 | bwd_inner_microstep: 886.65 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-10 19:25:30,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.68 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3794 [2024-06-10 19:25:32,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.07 | bwd_microstep: 1346.27 | bwd_inner_microstep: 1346.24 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 19:25:34,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1378.70 | bwd_inner_microstep: 1378.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 19:25:35,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1292.75 | bwd_inner_microstep: 1292.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3741 [2024-06-10 19:25:38,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.66 | bwd_microstep: 1629.89 | bwd_inner_microstep: 1629.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420 [2024-06-10 19:25:40,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.93 | bwd_microstep: 1312.92 | bwd_inner_microstep: 1312.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3426 [2024-06-10 19:25:41,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.25 | bwd_microstep: 1152.22 | bwd_inner_microstep: 1152.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 19:25:42,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.77 | bwd_microstep: 791.32 | bwd_inner_microstep: 791.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 19:25:44,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1248.73 | bwd_inner_microstep: 
1248.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3399 [2024-06-10 19:25:46,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.32 | bwd_microstep: 1180.90 | bwd_inner_microstep: 1180.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1874 [2024-06-10 19:25:47,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.39 | bwd_microstep: 804.37 | bwd_inner_microstep: 804.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670 [2024-06-10 19:25:49,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.64 | bwd_microstep: 1617.92 | bwd_inner_microstep: 1617.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651 [2024-06-10 19:25:51,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.25 | bwd_microstep: 1508.30 | bwd_inner_microstep: 1508.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639 [2024-06-10 19:25:53,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.94 | bwd_microstep: 1609.34 | bwd_inner_microstep: 1609.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 19:25:55,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1254.74 | bwd_inner_microstep: 1254.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 19:25:57,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | 
bwd_microstep: 1416.05 | bwd_inner_microstep: 1416.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 19:25:59,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1294.48 | bwd_inner_microstep: 1294.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 19:26:00,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 19:26:03,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.48 | bwd_microstep: 1658.76 | bwd_inner_microstep: 1658.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 19:26:05,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.52 | bwd_microstep: 1459.17 | bwd_inner_microstep: 1459.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538 [2024-06-10 19:26:07,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1422.96 | bwd_inner_microstep: 1422.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-10 19:26:09,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1490.86 | bwd_inner_microstep: 1490.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613 [2024-06-10 19:26:11,519] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 591.26 | bwd_microstep: 1609.75 | bwd_inner_microstep: 1609.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 19:26:13,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1245.39 | bwd_inner_microstep: 1245.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 19:26:15,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1490.04 | bwd_inner_microstep: 1490.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401 [2024-06-10 19:26:17,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.40 | bwd_microstep: 1369.78 | bwd_inner_microstep: 1369.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2389 [2024-06-10 19:26:18,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.45 | bwd_microstep: 1129.15 | bwd_inner_microstep: 1129.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3599 [2024-06-10 19:26:20,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.18 | bwd_microstep: 1246.04 | bwd_inner_microstep: 1246.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015 [2024-06-10 19:26:21,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.83 | bwd_microstep: 803.96 | bwd_inner_microstep: 803.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 19:26:23,537] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1402.34 | bwd_inner_microstep: 1402.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3778 [2024-06-10 19:26:26,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.19 | optimizer_step: 6.60 [2024-06-10 19:26:26,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.39 | bwd_microstep: 2739.31 | bwd_inner_microstep: 1976.77 | bwd_allreduce_microstep: 762.49 | step_microstep: 37.80 [2024-06-10 19:26:26,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15872.47 | bwd: 43353.58 | bwd_inner: 42590.13 | bwd_allreduce: 762.76 | step: 39.25 {'loss': 1.1982, 'learning_rate': 1.2773107933367093e-05, 'epoch': 0.63} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 19:26:28,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.14 | bwd_microstep: 782.66 | bwd_inner_microstep: 782.58 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060 [2024-06-10 19:26:29,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.74 | bwd_microstep: 812.89 | bwd_inner_microstep: 812.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3916 [2024-06-10 19:26:31,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1690.59 | bwd_inner_microstep: 1690.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 19:26:33,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1553.12 | bwd_inner_microstep: 1553.09 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762 [2024-06-10 19:26:35,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.47 | bwd_microstep: 1469.07 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 19:26:37,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1283.07 | bwd_inner_microstep: 1283.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 19:26:38,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.76 | bwd_microstep: 793.31 | bwd_inner_microstep: 793.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 19:26:40,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1282.72 | bwd_inner_microstep: 1282.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 19:26:42,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.48 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-10 19:26:44,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.45 | bwd_microstep: 1532.90 | bwd_inner_microstep: 1532.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447 [2024-06-10 19:26:46,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 
1283.55 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 19:26:47,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.09 | bwd_microstep: 1285.75 | bwd_inner_microstep: 1285.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2434 [2024-06-10 19:26:49,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.16 | bwd_microstep: 945.66 | bwd_inner_microstep: 945.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 19:26:51,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1353.27 | bwd_inner_microstep: 1353.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 19:26:53,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.93 | bwd_microstep: 1493.27 | bwd_inner_microstep: 1493.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 19:26:55,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1490.07 | bwd_inner_microstep: 1490.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2958 [2024-06-10 19:26:57,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.45 | bwd_microstep: 1291.69 | bwd_inner_microstep: 1291.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1979 [2024-06-10 19:26:58,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 281.69 | bwd_microstep: 735.20 | bwd_inner_microstep: 735.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-10 19:27:00,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.70 | bwd_microstep: 1615.42 | bwd_inner_microstep: 1615.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2203 [2024-06-10 19:27:01,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.59 | bwd_microstep: 863.48 | bwd_inner_microstep: 863.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515 [2024-06-10 19:27:03,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.06 | bwd_microstep: 1189.73 | bwd_inner_microstep: 1189.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555 [2024-06-10 19:27:04,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.34 | bwd_microstep: 1202.17 | bwd_inner_microstep: 1202.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3726 [2024-06-10 19:27:06,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.03 | bwd_microstep: 1339.20 | bwd_inner_microstep: 1339.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3612 [2024-06-10 19:27:08,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1641.88 | bwd_inner_microstep: 1641.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 19:27:10,818] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.60 | bwd_microstep: 1402.05 | bwd_inner_microstep: 1402.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952 [2024-06-10 19:27:11,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.89 | bwd_microstep: 731.39 | bwd_inner_microstep: 731.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3815 [2024-06-10 19:27:14,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.67 | bwd_microstep: 1689.82 | bwd_inner_microstep: 1689.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602 [2024-06-10 19:27:16,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.05 | bwd_microstep: 1604.35 | bwd_inner_microstep: 1604.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2298 [2024-06-10 19:27:17,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.21 | bwd_microstep: 1006.20 | bwd_inner_microstep: 1006.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3556 [2024-06-10 19:27:19,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.22 | bwd_microstep: 1424.79 | bwd_inner_microstep: 1424.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3437 [2024-06-10 19:27:21,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.11 | bwd_microstep: 1365.59 | bwd_inner_microstep: 1365.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3780 [2024-06-10 19:27:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.32 | optimizer_step: 6.60 [2024-06-10 19:27:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 5295.01 | bwd_inner_microstep: 1628.17 | bwd_allreduce_microstep: 3666.79 | step_microstep: 38.34 [2024-06-10 19:27:27,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15361.73 | bwd: 44835.42 | bwd_inner: 41167.67 | bwd_allreduce: 3667.06 | step: 39.83 {'loss': 1.2565, 'learning_rate': 1.273812279987051e-05, 'epoch': 0.63} 1082/1726 [18:44:58<11:06:15, 62.07s/it] 63%|██████▎ | 1083/1726 [18:46:00<11:03:33, 61.92s/it] 63%|██████▎ | 1084/1726 [18:47:02<11:03:18, 61.99s/it] 63%|██████▎ | 1085/1726 [18:48:04<11:01:22, 61.91s/it] 63%|██████▎ | 1086/1726 [18:49:03<10:52:48, 61.20s/it] 63%|██████▎ | 1087/1726 [18:50:04<10:49:38, 61.00s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386 [2024-06-10 19:27:29,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.27 | bwd_microstep: 1333.39 | bwd_inner_microstep: 1333.12 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.20 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 19:27:31,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1243.97 | bwd_inner_microstep: 1243.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 19:27:33,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.92 | bwd_microstep: 1474.01 | bwd_inner_microstep: 1473.99 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 19:27:35,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.81 | bwd_microstep: 1550.32 | bwd_inner_microstep: 1550.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 19:27:37,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.32 | bwd_microstep: 1448.08 | bwd_inner_microstep: 1448.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740 [2024-06-10 19:27:39,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.67 | bwd_microstep: 1334.53 | bwd_inner_microstep: 1334.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2205 [2024-06-10 19:27:40,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.08 | bwd_microstep: 954.06 | bwd_inner_microstep: 954.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 19:27:42,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.74 | bwd_microstep: 1277.35 | bwd_inner_microstep: 1277.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-10 19:27:43,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.48 | bwd_microstep: 794.45 | bwd_inner_microstep: 794.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189 [2024-06-10 19:27:44,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.71 | bwd_microstep: 858.49 
| bwd_inner_microstep: 858.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 19:27:46,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1520.98 | bwd_inner_microstep: 1520.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545 [2024-06-10 19:27:48,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.06 | bwd_microstep: 1327.55 | bwd_inner_microstep: 1327.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919 [2024-06-10 19:27:49,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.13 | bwd_microstep: 778.65 | bwd_inner_microstep: 778.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3491 [2024-06-10 19:27:51,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1577.66 | bwd_inner_microstep: 1577.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 19:27:53,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.91 | bwd_microstep: 1503.03 | bwd_inner_microstep: 1503.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501 [2024-06-10 19:27:55,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1579.64 | bwd_inner_microstep: 1579.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 19:27:57,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 523.24 | bwd_microstep: 1386.98 | bwd_inner_microstep: 1386.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 19:27:59,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.14 | bwd_microstep: 1500.83 | bwd_inner_microstep: 1500.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 19:28:01,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.74 | bwd_microstep: 1451.68 | bwd_inner_microstep: 1451.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587 [2024-06-10 19:28:03,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1498.58 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 19:28:05,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1378.29 | bwd_inner_microstep: 1378.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3377 [2024-06-10 19:28:07,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.94 | bwd_microstep: 1269.88 | bwd_inner_microstep: 1269.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 19:28:09,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.98 | bwd_microstep: 1497.05 | bwd_inner_microstep: 1497.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3819 [2024-06-10 19:28:11,651] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.57 | bwd_microstep: 1416.48 | bwd_inner_microstep: 1416.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2231 [2024-06-10 19:28:12,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.47 | bwd_microstep: 771.73 | bwd_inner_microstep: 771.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 19:28:15,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.21 | bwd_microstep: 1660.64 | bwd_inner_microstep: 1660.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611 [2024-06-10 19:28:16,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.99 | bwd_microstep: 1312.18 | bwd_inner_microstep: 1312.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3550 [2024-06-10 19:28:19,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.19 | bwd_microstep: 1586.27 | bwd_inner_microstep: 1586.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744 [2024-06-10 19:28:21,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.14 | bwd_microstep: 1640.27 | bwd_inner_microstep: 1640.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 19:28:23,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1374.80 | bwd_inner_microstep: 1374.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3542 [2024-06-10 19:28:25,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.70 | bwd_microstep: 1388.85 | bwd_inner_microstep: 1388.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 19:28:29,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.20 | optimizer_step: 6.58 [2024-06-10 19:28:29,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 3567.32 | bwd_inner_microstep: 1571.51 | bwd_allreduce_microstep: 1995.76 | step_microstep: 40.14 [2024-06-10 19:28:29,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16138.97 | bwd: 45258.05 | bwd_inner: 43261.17 | bwd_allreduce: 1996.10 | step: 41.82 {'loss': 1.2228, 'learning_rate': 1.270316324266768e-05, 'epoch': 0.63} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 19:28:31,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1337.65 | bwd_inner_microstep: 1337.54 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908 [2024-06-10 19:28:33,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1587.93 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 19:28:35,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.81 | bwd_microstep: 1502.34 | bwd_inner_microstep: 1502.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2008 [2024-06-10 19:28:36,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.81 | bwd_microstep: 831.98 | 
bwd_inner_microstep: 831.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 19:28:38,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1380.25 | bwd_inner_microstep: 1380.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 19:28:40,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1246.25 | bwd_inner_microstep: 1246.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433 [2024-06-10 19:28:41,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.89 | bwd_microstep: 1187.55 | bwd_inner_microstep: 1187.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 19:28:42,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 791.19 | bwd_inner_microstep: 791.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 19:28:44,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1387.43 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 19:28:46,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1389.95 | bwd_inner_microstep: 1389.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503 [2024-06-10 19:28:48,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
467.36 | bwd_microstep: 1222.90 | bwd_inner_microstep: 1222.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 19:28:50,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063 [2024-06-10 19:28:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.51 | bwd_microstep: 915.64 | bwd_inner_microstep: 915.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2970 [2024-06-10 19:28:52,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.59 | bwd_microstep: 1041.17 | bwd_inner_microstep: 1041.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3527 [2024-06-10 19:28:55,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 1559.11 | bwd_inner_microstep: 1559.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 19:28:57,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.88 | bwd_microstep: 1510.66 | bwd_inner_microstep: 1510.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3837 [2024-06-10 19:28:59,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.88 | bwd_microstep: 1689.27 | bwd_inner_microstep: 1689.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 19:29:01,280] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 502.50 | bwd_microstep: 1341.88 | bwd_inner_microstep: 1341.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2637 [2024-06-10 19:29:02,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.88 | bwd_microstep: 1017.96 | bwd_inner_microstep: 1017.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-10 19:29:04,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.46 | bwd_microstep: 1304.19 | bwd_inner_microstep: 1304.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 19:29:06,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.87 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 19:29:08,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1400.23 | bwd_inner_microstep: 1400.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070 [2024-06-10 19:29:09,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.34 | bwd_microstep: 815.64 | bwd_inner_microstep: 815.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677 [2024-06-10 19:29:11,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.66 | bwd_microstep: 1529.51 | bwd_inner_microstep: 1529.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-10 
19:29:12,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.98 | bwd_microstep: 801.31 | bwd_inner_microstep: 801.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 19:29:14,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.67 | bwd_microstep: 1489.49 | bwd_inner_microstep: 1489.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2024 [2024-06-10 19:29:15,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.10 | bwd_microstep: 715.20 | bwd_inner_microstep: 715.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577 [2024-06-10 19:29:17,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1432.42 | bwd_inner_microstep: 1432.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841 [2024-06-10 19:29:19,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.92 | bwd_microstep: 1467.07 | bwd_inner_microstep: 1467.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3570 [2024-06-10 19:29:21,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1629.95 | bwd_inner_microstep: 1629.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3584 [2024-06-10 19:29:24,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.02 | bwd_microstep: 1619.83 | bwd_inner_microstep: 1619.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3803 [2024-06-10 19:29:31,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.08 | optimizer_step: 6.59 [2024-06-10 19:29:31,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.06 | bwd_microstep: 6298.14 | bwd_inner_microstep: 1748.42 | bwd_allreduce_microstep: 4549.67 | step_microstep: 37.88 [2024-06-10 19:29:31,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15480.04 | bwd: 45980.83 | bwd_inner: 41430.15 | bwd_allreduce: 4549.97 | step: 39.44 {'loss': 1.2098, 'learning_rate': 1.266822938488597e-05, 'epoch': 0.63} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503 [2024-06-10 19:29:32,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401 [2024-06-10 19:29:34,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.51 | bwd_microstep: 1274.24 | bwd_inner_microstep: 1274.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-10 19:29:37,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.14 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 19:29:38,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1382.01 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150 [2024-06-10 19:29:40,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.83 | 
bwd_microstep: 848.10 | bwd_inner_microstep: 848.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2526 [2024-06-10 19:29:41,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.19 | bwd_microstep: 1027.70 | bwd_inner_microstep: 1027.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485 [2024-06-10 19:29:43,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.78 | bwd_microstep: 1186.14 | bwd_inner_microstep: 1186.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 19:29:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1279.08 | bwd_inner_microstep: 1279.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2040 [2024-06-10 19:29:46,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.90 | bwd_microstep: 808.75 | bwd_inner_microstep: 808.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-10 19:29:47,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1345.19 | bwd_inner_microstep: 1345.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 19:29:49,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1249.27 | bwd_inner_microstep: 1249.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2103 [2024-06-10 19:29:50,777] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 310.13 | bwd_microstep: 822.91 | bwd_inner_microstep: 822.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3499 [2024-06-10 19:29:52,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.07 | bwd_microstep: 1324.94 | bwd_inner_microstep: 1324.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661 [2024-06-10 19:29:54,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.35 | bwd_microstep: 1610.67 | bwd_inner_microstep: 1610.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3505 [2024-06-10 19:29:56,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.33 | bwd_microstep: 1365.66 | bwd_inner_microstep: 1365.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3542 [2024-06-10 19:29:58,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.21 | bwd_microstep: 1417.32 | bwd_inner_microstep: 1417.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3940 [2024-06-10 19:30:01,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.07 | bwd_microstep: 1796.45 | bwd_inner_microstep: 1796.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 19:30:02,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1339.94 | bwd_inner_microstep: 1339.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 19:30:04,983] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.19 | bwd_microstep: 1449.97 | bwd_inner_microstep: 1449.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431 [2024-06-10 19:30:06,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.41 | bwd_microstep: 1180.43 | bwd_inner_microstep: 1180.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3814 [2024-06-10 19:30:08,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.06 | bwd_microstep: 1618.11 | bwd_inner_microstep: 1618.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 19:30:10,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1283.87 | bwd_inner_microstep: 1283.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 19:30:12,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.27 | bwd_microstep: 1654.08 | bwd_inner_microstep: 1654.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-10 19:30:14,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.83 | bwd_microstep: 1516.85 | bwd_inner_microstep: 1516.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 19:30:17,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1504.86 | bwd_inner_microstep: 1504.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 2264 [2024-06-10 19:30:18,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.97 | bwd_microstep: 970.41 | bwd_inner_microstep: 970.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 19:30:20,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1486.22 | bwd_inner_microstep: 1486.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 19:30:22,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1544.49 | bwd_inner_microstep: 1544.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-10 19:30:24,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1336.32 | bwd_inner_microstep: 1336.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 19:30:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.18 | bwd_microstep: 1388.67 | bwd_inner_microstep: 1388.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3947 [2024-06-10 19:30:28,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.06 | bwd_microstep: 1803.06 | bwd_inner_microstep: 1803.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3566 [2024-06-10 19:30:32,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 19:30:32,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 567.26 | bwd_microstep: 3456.46 | bwd_inner_microstep: 1717.21 | bwd_allreduce_microstep: 1739.20 | step_microstep: 37.76 [2024-06-10 19:30:32,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16213.50 | bwd: 45324.89 | bwd_inner: 43584.79 | bwd_allreduce: 1739.43 | step: 39.25 {'loss': 1.2051, 'learning_rate': 1.263332134956226e-05, 'epoch': 0.63} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423 [2024-06-10 19:30:34,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.96 | bwd_microstep: 1154.65 | bwd_inner_microstep: 1154.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258 [2024-06-10 19:30:35,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.56 | bwd_microstep: 965.10 | bwd_inner_microstep: 965.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 19:30:37,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.61 | bwd_microstep: 1445.50 | bwd_inner_microstep: 1445.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 19:30:39,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.38 | bwd_microstep: 1278.07 | bwd_inner_microstep: 1278.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 19:30:41,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1389.46 | bwd_inner_microstep: 1389.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 19:30:43,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 482.91 | bwd_microstep: 1281.92 | bwd_inner_microstep: 1281.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 19:30:45,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1378.32 | bwd_inner_microstep: 1378.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3493 [2024-06-10 19:30:47,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.08 | bwd_microstep: 1329.26 | bwd_inner_microstep: 1329.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3812 [2024-06-10 19:30:49,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1529.64 | bwd_inner_microstep: 1529.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3671 [2024-06-10 19:30:51,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.21 | bwd_microstep: 1610.99 | bwd_inner_microstep: 1610.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499 [2024-06-10 19:30:53,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.10 | bwd_microstep: 1433.85 | bwd_inner_microstep: 1433.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 19:30:55,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1348.35 | bwd_inner_microstep: 1348.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4005 [2024-06-10 19:30:57,514] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.36 | bwd_microstep: 1677.13 | bwd_inner_microstep: 1677.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1888 [2024-06-10 19:30:58,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.25 | bwd_microstep: 775.10 | bwd_inner_microstep: 775.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3667 [2024-06-10 19:31:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.24 | bwd_microstep: 1716.78 | bwd_inner_microstep: 1716.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648 [2024-06-10 19:31:03,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1511.79 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2953 [2024-06-10 19:31:04,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.75 | bwd_microstep: 1100.04 | bwd_inner_microstep: 1100.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1974 [2024-06-10 19:31:05,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.49 | bwd_microstep: 859.54 | bwd_inner_microstep: 859.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 19:31:07,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.65 | bwd_microstep: 1554.37 | bwd_inner_microstep: 1554.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3461 [2024-06-10 19:31:09,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.23 | bwd_microstep: 1280.33 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638 [2024-06-10 19:31:11,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1416.58 | bwd_inner_microstep: 1416.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-10 19:31:13,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.48 | bwd_microstep: 1421.47 | bwd_inner_microstep: 1421.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010 [2024-06-10 19:31:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.69 | bwd_microstep: 804.53 | bwd_inner_microstep: 804.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-10 19:31:16,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 1402.19 | bwd_inner_microstep: 1402.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2246 [2024-06-10 19:31:17,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.32 | bwd_microstep: 931.49 | bwd_inner_microstep: 931.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 19:31:20,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.89 | bwd_microstep: 1654.26 | bwd_inner_microstep: 1654.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3699 [2024-06-10 19:31:22,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1328.90 | bwd_inner_microstep: 1328.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045 [2024-06-10 19:31:23,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.52 | bwd_microstep: 899.93 | bwd_inner_microstep: 899.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602 [2024-06-10 19:31:25,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.51 | bwd_microstep: 1606.66 | bwd_inner_microstep: 1606.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3612 [2024-06-10 19:31:27,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.93 | bwd_microstep: 1538.48 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746 [2024-06-10 19:31:29,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.93 | bwd_microstep: 1540.91 | bwd_inner_microstep: 1540.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 19:31:33,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 19:31:33,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.75 | bwd_microstep: 3549.27 | bwd_inner_microstep: 1573.18 | bwd_allreduce_microstep: 1976.04 | step_microstep: 37.68 [2024-06-10 19:31:33,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15890.21 | bwd: 44714.85 | bwd_inner: 42737.91 | 
bwd_allreduce: 1976.27 | step: 39.13 {'loss': 1.224, 'learning_rate': 1.2598439259642459e-05, 'epoch': 0.63} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488 [2024-06-10 19:31:35,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.99 | bwd_microstep: 1470.51 | bwd_inner_microstep: 1470.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2499 [2024-06-10 19:31:37,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.84 | bwd_microstep: 1020.84 | bwd_inner_microstep: 1020.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3865 [2024-06-10 19:31:39,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1522.20 | bwd_inner_microstep: 1522.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 19:31:41,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1648.72 | bwd_inner_microstep: 1648.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3956 [2024-06-10 19:31:43,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.38 | bwd_microstep: 1594.65 | bwd_inner_microstep: 1594.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3747 [2024-06-10 19:31:46,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1637.66 | bwd_inner_microstep: 1637.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 19:31:47,825] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 19:31:49,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.72 | bwd_microstep: 1388.26 | bwd_inner_microstep: 1388.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 19:31:51,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1343.92 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2109 [2024-06-10 19:31:52,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.46 | bwd_microstep: 825.29 | bwd_inner_microstep: 825.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3943 [2024-06-10 19:31:55,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.52 | bwd_microstep: 1694.94 | bwd_inner_microstep: 1694.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 19:31:56,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1251.55 | bwd_inner_microstep: 1251.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681 [2024-06-10 19:31:58,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.90 | bwd_microstep: 1532.30 | bwd_inner_microstep: 1532.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1931 [2024-06-10 19:31:59,997] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.73 | bwd_microstep: 775.21 | bwd_inner_microstep: 775.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686 [2024-06-10 19:32:02,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.26 | bwd_microstep: 1722.59 | bwd_inner_microstep: 1722.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525 [2024-06-10 19:32:04,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.36 | bwd_microstep: 1581.93 | bwd_inner_microstep: 1581.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2112 [2024-06-10 19:32:05,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.37 | bwd_microstep: 827.71 | bwd_inner_microstep: 827.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2783 [2024-06-10 19:32:07,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.89 | bwd_microstep: 1053.54 | bwd_inner_microstep: 1053.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2095 [2024-06-10 19:32:08,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.70 | bwd_microstep: 979.38 | bwd_inner_microstep: 979.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2126 [2024-06-10 19:32:09,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.65 | bwd_microstep: 801.21 | bwd_inner_microstep: 801.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 2141 [2024-06-10 19:32:11,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.59 | bwd_microstep: 1024.76 | bwd_inner_microstep: 1024.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2007 [2024-06-10 19:32:12,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.76 | bwd_microstep: 709.81 | bwd_inner_microstep: 709.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724 [2024-06-10 19:32:14,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1638.38 | bwd_inner_microstep: 1638.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 19:32:16,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.18 | bwd_microstep: 1455.52 | bwd_inner_microstep: 1455.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 19:32:18,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551 [2024-06-10 19:32:19,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.75 | bwd_microstep: 1298.72 | bwd_inner_microstep: 1298.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 19:32:22,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3439 [2024-06-10 19:32:23,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.68 | bwd_microstep: 1351.54 | bwd_inner_microstep: 1351.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-10 19:32:25,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.97 | bwd_microstep: 1251.82 | bwd_inner_microstep: 1251.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 19:32:27,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.61 | bwd_microstep: 1558.47 | bwd_inner_microstep: 1558.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 19:32:29,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1546.50 | bwd_inner_microstep: 1546.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 19:32:36,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 19:32:36,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.61 | bwd_microstep: 5600.98 | bwd_inner_microstep: 1871.15 | bwd_allreduce_microstep: 3729.78 | step_microstep: 37.93 [2024-06-10 19:32:36,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15791.32 | bwd: 46276.97 | bwd_inner: 42546.28 | bwd_allreduce: 3730.01 | step: 39.38 {'loss': 1.2286, 'learning_rate': 1.2563583237981103e-05, 'epoch': 0.63} 6 [18:50:04<10:49:38, 61.00s/it] 63%|██████▎ | 1088/1726 [18:51:05<10:51:00, 61.22s/it] 63%|██████▎ | 1088/1726 [18:51:05<10:51:00, 61.22s/it] 63%|██████▎ | 1089/1726 
[18:52:07<10:51:47, 61.39s/it] 63%|██████▎ | 1089/1726 [18:52:07<10:51:47, 61.39s/it] 63%|██████▎ | 1090/1726 [18:53:09<10:52:17, 61.54s/it] 63%|██████▎ | 1090/1726 [18:53:09<10:52:17, 61.54s/it] 63%|██████▎ | 1091/1726 [18:54:10<10:49:20, 61.36s/it] 63%|██████▎ | 1091/1726 [18:54:10<10:49:20, 61.36s/it] 63%|██████▎ | 1092/1726 [18:55:12<10:51:39, 61.67s/it] 63%|██████▎ | 1092/1726 [18:55:dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 19:32:38,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.80 | bwd_microstep: 1333.45 | bwd_inner_microstep: 1333.31 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 19:32:40,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1455.00 | bwd_inner_microstep: 1454.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 19:32:41,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1289.55 | bwd_inner_microstep: 1289.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3899 [2024-06-10 19:32:43,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.34 | bwd_microstep: 1517.29 | bwd_inner_microstep: 1517.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 19:32:45,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.81 | bwd_microstep: 1247.55 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1941 [2024-06-10 19:32:46,836] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 310.08 | bwd_microstep: 820.62 | bwd_inner_microstep: 820.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3426 [2024-06-10 19:32:48,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.84 | bwd_microstep: 1248.07 | bwd_inner_microstep: 1248.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2888 [2024-06-10 19:32:50,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.48 | bwd_microstep: 1087.08 | bwd_inner_microstep: 1087.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 19:32:51,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1277.80 | bwd_inner_microstep: 1277.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495 [2024-06-10 19:32:53,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1413.44 | bwd_inner_microstep: 1413.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479 [2024-06-10 19:32:55,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1329.12 | bwd_inner_microstep: 1329.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436 [2024-06-10 19:32:57,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.44 | bwd_microstep: 1314.44 | bwd_inner_microstep: 1314.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074 [2024-06-10 19:32:58,591] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.63 | bwd_microstep: 818.66 | bwd_inner_microstep: 818.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-10 19:33:00,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.91 | bwd_microstep: 1514.73 | bwd_inner_microstep: 1514.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2960 [2024-06-10 19:33:02,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.43 | bwd_microstep: 1103.57 | bwd_inner_microstep: 1103.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506 [2024-06-10 19:33:04,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.96 | bwd_microstep: 1448.05 | bwd_inner_microstep: 1448.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 19:33:06,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1382.64 | bwd_inner_microstep: 1382.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 19:33:07,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.20 | bwd_microstep: 1284.38 | bwd_inner_microstep: 1284.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3837 [2024-06-10 19:33:09,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1293.57 | bwd_inner_microstep: 1293.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3497 [2024-06-10 19:33:11,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.98 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-10 19:33:13,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1406.30 | bwd_inner_microstep: 1406.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 19:33:15,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1377.05 | bwd_inner_microstep: 1377.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 19:33:17,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.91 | bwd_microstep: 1558.75 | bwd_inner_microstep: 1558.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 19:33:19,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1251.91 | bwd_inner_microstep: 1251.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457 [2024-06-10 19:33:21,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1407.89 | bwd_inner_microstep: 1407.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3726 [2024-06-10 19:33:22,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1336.48 | bwd_inner_microstep: 1336.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images 
per sample: 5.25, dynamic token length: 2846 [2024-06-10 19:33:24,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.90 | bwd_microstep: 1098.72 | bwd_inner_microstep: 1098.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3726 [2024-06-10 19:33:26,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1557.63 | bwd_inner_microstep: 1557.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591 [2024-06-10 19:33:28,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.68 | bwd_microstep: 1336.90 | bwd_inner_microstep: 1336.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2083 [2024-06-10 19:33:29,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.02 | bwd_microstep: 821.42 | bwd_inner_microstep: 821.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572 [2024-06-10 19:33:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.32 | bwd_microstep: 1300.29 | bwd_inner_microstep: 1300.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 19:33:36,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.61 [2024-06-10 19:33:36,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 4548.46 | bwd_inner_microstep: 1526.73 | bwd_allreduce_microstep: 3021.68 | step_microstep: 37.82 [2024-06-10 19:33:36,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15514.05 | bwd: 44464.20 | bwd_inner: 
41441.49 | bwd_allreduce: 3021.97 | step: 39.38 {'loss': 1.2142, 'learning_rate': 1.2528753407340929e-05, 'epoch': 0.63} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 19:33:38,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.32 | bwd_microstep: 1368.21 | bwd_inner_microstep: 1368.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2452 [2024-06-10 19:33:39,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.59 | bwd_microstep: 1044.45 | bwd_inner_microstep: 1044.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4241 [2024-06-10 19:33:42,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.27 | bwd_microstep: 1662.09 | bwd_inner_microstep: 1662.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-10 19:33:44,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.33 | bwd_microstep: 1649.68 | bwd_inner_microstep: 1649.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 19:33:46,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1378.32 | bwd_inner_microstep: 1378.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 19:33:48,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1280.88 | bwd_inner_microstep: 1280.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2191 [2024-06-10 19:33:49,459] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.72 | bwd_microstep: 951.28 | bwd_inner_microstep: 951.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4080 [2024-06-10 19:33:51,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1522.61 | bwd_inner_microstep: 1522.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 19:33:53,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.57 | bwd_microstep: 1385.15 | bwd_inner_microstep: 1385.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004 [2024-06-10 19:33:54,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.06 | bwd_microstep: 894.64 | bwd_inner_microstep: 894.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3510 [2024-06-10 19:33:56,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1510.45 | bwd_inner_microstep: 1510.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 19:33:58,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.92 | bwd_microstep: 1386.79 | bwd_inner_microstep: 1386.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-10 19:34:00,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.97 | bwd_microstep: 1452.10 | bwd_inner_microstep: 1452.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 
3657 [2024-06-10 19:34:03,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.96 | bwd_microstep: 1818.66 | bwd_inner_microstep: 1818.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-10 19:34:04,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1298.34 | bwd_inner_microstep: 1298.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 19:34:06,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.64 | bwd_microstep: 1289.32 | bwd_inner_microstep: 1289.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3875 [2024-06-10 19:34:08,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.32 | bwd_microstep: 1490.28 | bwd_inner_microstep: 1490.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2279 [2024-06-10 19:34:10,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.22 | bwd_microstep: 972.98 | bwd_inner_microstep: 972.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 19:34:12,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1394.69 | bwd_inner_microstep: 1394.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 19:34:14,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1538.61 | bwd_inner_microstep: 1538.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3853 [2024-06-10 19:34:15,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1272.42 | bwd_inner_microstep: 1272.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3680 [2024-06-10 19:34:17,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.70 | bwd_microstep: 1262.34 | bwd_inner_microstep: 1262.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912 [2024-06-10 19:34:18,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.26 | bwd_microstep: 685.81 | bwd_inner_microstep: 685.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 19:34:20,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1354.89 | bwd_inner_microstep: 1354.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-10 19:34:22,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1531.84 | bwd_inner_microstep: 1531.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775 [2024-06-10 19:34:24,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.20 | bwd_microstep: 1646.90 | bwd_inner_microstep: 1646.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3614 [2024-06-10 19:34:27,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.57 | bwd_microstep: 1562.39 | bwd_inner_microstep: 1562.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819 [2024-06-10 19:34:29,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.27 | bwd_microstep: 1690.42 | bwd_inner_microstep: 1690.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583 [2024-06-10 19:34:31,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.27 | bwd_microstep: 1531.13 | bwd_inner_microstep: 1531.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3466 [2024-06-10 19:34:33,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.03 | bwd_microstep: 1572.80 | bwd_inner_microstep: 1572.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428 [2024-06-10 19:34:35,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1349.62 | bwd_inner_microstep: 1349.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2932 [2024-06-10 19:34:37,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-10 19:34:37,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.67 | bwd_microstep: 1941.14 | bwd_inner_microstep: 1298.11 | bwd_allreduce_microstep: 642.99 | step_microstep: 37.67 [2024-06-10 19:34:37,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16421.59 | bwd: 44691.24 | bwd_inner: 44047.35 | bwd_allreduce: 643.21 | step: 39.13 {'loss': 1.1726, 'learning_rate': 1.2493949890392418e-05, 'epoch': 0.63} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-10 19:34:40,035] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1470.62 | bwd_inner_microstep: 1470.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 19:34:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1475.34 | bwd_inner_microstep: 1475.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 19:34:43,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.02 | bwd_microstep: 1378.45 | bwd_inner_microstep: 1378.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 19:34:45,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.65 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 19:34:46,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.97 | bwd_microstep: 695.96 | bwd_inner_microstep: 695.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 19:34:47,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1247.96 | bwd_inner_microstep: 1247.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 19:34:49,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1279.10 | bwd_inner_microstep: 1279.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1957 [2024-06-10 
19:34:50,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.02 | bwd_microstep: 730.84 | bwd_inner_microstep: 730.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-10 19:34:52,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1509.69 | bwd_inner_microstep: 1509.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 19:34:54,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.83 | bwd_microstep: 1155.77 | bwd_inner_microstep: 1155.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 19:34:56,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.95 | bwd_microstep: 1341.23 | bwd_inner_microstep: 1341.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3092 [2024-06-10 19:34:57,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.32 | bwd_microstep: 1298.80 | bwd_inner_microstep: 1298.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3488 [2024-06-10 19:34:59,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1393.04 | bwd_inner_microstep: 1393.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 19:35:01,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.50 | bwd_microstep: 1478.23 | bwd_inner_microstep: 1478.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 3496 [2024-06-10 19:35:03,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1192.82 | bwd_inner_microstep: 1192.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3824 [2024-06-10 19:35:05,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1419.93 | bwd_inner_microstep: 1419.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500 [2024-06-10 19:35:07,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.95 | bwd_microstep: 1189.72 | bwd_inner_microstep: 1189.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-10 19:35:08,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.95 | bwd_microstep: 795.97 | bwd_inner_microstep: 795.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3519 [2024-06-10 19:35:09,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1222.78 | bwd_inner_microstep: 1222.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489 [2024-06-10 19:35:11,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.57 | bwd_microstep: 1192.24 | bwd_inner_microstep: 1192.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 19:35:13,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1254.21 | bwd_inner_microstep: 1254.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 32, images per sample: 8.0, dynamic token length: 3618 [2024-06-10 19:35:15,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.70 | bwd_microstep: 1466.57 | bwd_inner_microstep: 1466.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 19:35:17,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1553.82 | bwd_inner_microstep: 1553.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-10 19:35:19,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.48 | bwd_microstep: 1495.73 | bwd_inner_microstep: 1495.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-10 19:35:21,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.86 | bwd_microstep: 1253.11 | bwd_inner_microstep: 1253.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593 [2024-06-10 19:35:23,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.84 | bwd_microstep: 1365.68 | bwd_inner_microstep: 1365.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591 [2024-06-10 19:35:25,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.14 | bwd_microstep: 1458.04 | bwd_inner_microstep: 1458.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3814 [2024-06-10 19:35:27,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.96 | bwd_microstep: 1510.36 | bwd_inner_microstep: 1510.33 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2673 [2024-06-10 19:35:28,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.78 | bwd_microstep: 1116.28 | bwd_inner_microstep: 1116.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 19:35:30,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.38 | bwd_microstep: 1354.95 | bwd_inner_microstep: 1354.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3446 [2024-06-10 19:35:32,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.29 | bwd_microstep: 1398.74 | bwd_inner_microstep: 1398.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 19:35:39,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 19:35:39,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.95 | bwd_microstep: 6078.94 | bwd_inner_microstep: 1570.34 | bwd_allreduce_microstep: 4508.54 | step_microstep: 38.16 [2024-06-10 19:35:39,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15385.04 | bwd: 45566.09 | bwd_inner: 41056.64 | bwd_allreduce: 4508.77 | step: 39.61 {'loss': 1.1916, 'learning_rate': 1.245917280971337e-05, 'epoch': 0.63} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 19:35:41,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1472.78 | bwd_inner_microstep: 1472.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 19:35:43,211] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1379.97 | bwd_inner_microstep: 1379.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 19:35:44,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.58 | bwd_microstep: 1275.86 | bwd_inner_microstep: 1275.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899 [2024-06-10 19:35:47,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 1584.98 | bwd_inner_microstep: 1584.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 19:35:49,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1380.99 | bwd_inner_microstep: 1380.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 19:35:50,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.71 | bwd_microstep: 1281.49 | bwd_inner_microstep: 1281.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 19:35:52,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.51 | bwd_microstep: 1314.25 | bwd_inner_microstep: 1314.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 19:35:54,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1251.68 | bwd_inner_microstep: 1251.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3478 [2024-06-10 19:35:56,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.55 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 19:35:58,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.87 | bwd_microstep: 1284.80 | bwd_inner_microstep: 1284.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 19:36:00,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.97 | bwd_microstep: 1396.28 | bwd_inner_microstep: 1396.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 19:36:01,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.12 | bwd_microstep: 1396.63 | bwd_inner_microstep: 1396.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4056 [2024-06-10 19:36:04,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.77 | bwd_microstep: 1724.50 | bwd_inner_microstep: 1724.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3433 [2024-06-10 19:36:05,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.18 | bwd_microstep: 1184.50 | bwd_inner_microstep: 1184.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 19:36:07,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.96 | bwd_microstep: 1257.96 | bwd_inner_microstep: 1257.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images 
per sample: 11.5, dynamic token length: 3825 [2024-06-10 19:36:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.80 | bwd_microstep: 1751.32 | bwd_inner_microstep: 1751.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-10 19:36:11,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 796.39 | bwd_inner_microstep: 796.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561 [2024-06-10 19:36:13,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.31 | bwd_microstep: 1595.26 | bwd_inner_microstep: 1595.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 19:36:15,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1287.46 | bwd_inner_microstep: 1287.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 19:36:16,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 19:36:18,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.19 | bwd_microstep: 1391.00 | bwd_inner_microstep: 1390.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 19:36:20,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1393.51 | bwd_inner_microstep: 1393.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3655 [2024-06-10 19:36:23,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.59 | bwd_microstep: 1655.04 | bwd_inner_microstep: 1655.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942 [2024-06-10 19:36:24,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 700.92 | bwd_inner_microstep: 700.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 19:36:25,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1257.54 | bwd_inner_microstep: 1257.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 19:36:27,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784 [2024-06-10 19:36:29,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.98 | bwd_microstep: 1547.43 | bwd_inner_microstep: 1547.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3553 [2024-06-10 19:36:31,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.44 | bwd_microstep: 1459.85 | bwd_inner_microstep: 1459.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3780 [2024-06-10 19:36:34,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.88 | bwd_microstep: 1639.40 | bwd_inner_microstep: 1639.37 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2929 [2024-06-10 19:36:35,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.19 | bwd_microstep: 1225.44 | bwd_inner_microstep: 1225.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-10 19:36:37,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.04 | bwd_microstep: 1308.24 | bwd_inner_microstep: 1308.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3876 [2024-06-10 19:36:39,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.15 | optimizer_step: 6.62 [2024-06-10 19:36:39,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.10 | bwd_microstep: 1521.89 | bwd_inner_microstep: 1514.24 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.53 [2024-06-10 19:36:39,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16414.08 | bwd: 43774.75 | bwd_inner: 43766.25 | bwd_allreduce: 7.84 | step: 39.02 {'loss': 1.1956, 'learning_rate': 1.242442228778848e-05, 'epoch': 0.63} dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 1953 [2024-06-10 19:36:40,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.46 | bwd_microstep: 802.35 | bwd_inner_microstep: 802.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3879 [2024-06-10 19:36:43,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.50 | bwd_microstep: 1544.44 | bwd_inner_microstep: 1544.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 
19:36:44,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1380.42 | bwd_inner_microstep: 1380.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 19:36:46,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1480.32 | bwd_inner_microstep: 1480.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 19:36:49,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.44 | bwd_microstep: 1548.40 | bwd_inner_microstep: 1548.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 19:36:50,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.66 | bwd_microstep: 790.86 | bwd_inner_microstep: 790.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 19:36:52,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.90 | bwd_microstep: 1529.72 | bwd_inner_microstep: 1529.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927 [2024-06-10 19:36:53,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.38 | bwd_microstep: 788.76 | bwd_inner_microstep: 788.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 19:36:54,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.57 | bwd_microstep: 794.58 | bwd_inner_microstep: 794.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic 
token length: 3690 [2024-06-10 19:36:56,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.28 | bwd_microstep: 1485.59 | bwd_inner_microstep: 1485.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4114 [2024-06-10 19:36:58,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.28 | bwd_microstep: 1668.21 | bwd_inner_microstep: 1668.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3540 [2024-06-10 19:37:00,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.40 | bwd_microstep: 1324.72 | bwd_inner_microstep: 1324.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 19:37:02,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1379.55 | bwd_inner_microstep: 1379.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3936 [2024-06-10 19:37:04,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.14 | bwd_microstep: 1688.63 | bwd_inner_microstep: 1688.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 19:37:06,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1244.65 | bwd_inner_microstep: 1244.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3952 [2024-06-10 19:37:08,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1503.54 | bwd_inner_microstep: 1503.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 30, images per sample: 7.5, dynamic token length: 3522 [2024-06-10 19:37:10,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1422.61 | bwd_inner_microstep: 1422.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 19:37:12,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.32 | bwd_microstep: 1381.37 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 19:37:14,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.76 | bwd_microstep: 1508.86 | bwd_inner_microstep: 1508.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 623 [2024-06-10 19:37:15,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.30 | bwd_microstep: 265.08 | bwd_inner_microstep: 265.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 618 [2024-06-10 19:37:15,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.61 | bwd_microstep: 261.40 | bwd_inner_microstep: 261.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3459 [2024-06-10 19:37:17,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.67 | bwd_microstep: 1505.09 | bwd_inner_microstep: 1505.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680 [2024-06-10 19:37:19,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.73 | bwd_microstep: 1518.09 | bwd_inner_microstep: 1518.06 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 19:37:21,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1249.64 | bwd_inner_microstep: 1249.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 19:37:23,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.69 | bwd_microstep: 1485.43 | bwd_inner_microstep: 1485.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3857 [2024-06-10 19:37:25,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1363.32 | bwd_inner_microstep: 1363.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2006 [2024-06-10 19:37:26,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.98 | bwd_microstep: 709.53 | bwd_inner_microstep: 709.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4055 [2024-06-10 19:37:28,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.11 | bwd_microstep: 1458.55 | bwd_inner_microstep: 1458.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 19:37:30,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1395.17 | bwd_inner_microstep: 1395.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3438 [2024-06-10 19:37:32,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.83 | bwd_microstep: 1311.09 | bwd_inner_microstep: 
1311.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 19:37:33,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1378.19 | bwd_inner_microstep: 1378.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2628 [2024-06-10 19:37:40,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.20 | optimizer_step: 6.57 [2024-06-10 19:37:40,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.84 | bwd_microstep: 6535.03 | bwd_inner_microstep: 1256.26 | bwd_allreduce_microstep: 5278.72 | step_microstep: 37.78 [2024-06-10 19:37:40,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15124.88 | bwd: 45703.22 | bwd_inner: 40423.58 | bwd_allreduce: 5278.96 | step: 39.27 {'loss': 1.2249, 'learning_rate': 1.2389698447008916e-05, 'epoch': 0.64} 12<10:51:39, 61.67s/it] 63%|██████▎ | 1093/1726 [18:56:13<10:46:19, 61.26s/it] 63%|██████▎ | 1094/1726 [18:57:14<10:45:52, 61.32s/it] 63%|██████▎ | 1095/1726 [18:58:16<10:44:42, 61.30s/it] 63%|██████▎ | 1096/1726 [18:59:16<10:41:12, 61.07s/it] 64%|██████▎ | 1097/1726 [19:00:17<10:40:27, 61.09s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1935 [2024-06-10 19:37:42,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.90 | bwd_microstep: 877.64 | bwd_inner_microstep: 877.55 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640 [2024-06-10 
19:37:44,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.13 | bwd_microstep: 1409.28 | bwd_inner_microstep: 1409.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876 [2024-06-10 19:37:46,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.36 | bwd_microstep: 1581.21 | bwd_inner_microstep: 1581.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2293 [2024-06-10 19:37:47,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.96 | bwd_microstep: 784.05 | bwd_inner_microstep: 784.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 19:37:49,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1281.76 | bwd_inner_microstep: 1281.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3799 [2024-06-10 19:37:51,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.52 | bwd_microstep: 1512.28 | bwd_inner_microstep: 1512.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 19:37:53,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.46 | bwd_microstep: 1405.28 | bwd_inner_microstep: 1405.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2425 [2024-06-10 19:37:54,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.99 | bwd_microstep: 841.91 | bwd_inner_microstep: 841.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3839 [2024-06-10 19:37:56,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1559.18 | bwd_inner_microstep: 1559.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2154 [2024-06-10 19:37:57,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.22 | bwd_microstep: 909.92 | bwd_inner_microstep: 909.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 19:37:58,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.10 | bwd_microstep: 792.76 | bwd_inner_microstep: 792.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480 [2024-06-10 19:38:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.87 | bwd_microstep: 1412.92 | bwd_inner_microstep: 1412.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3498 [2024-06-10 19:38:02,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1331.56 | bwd_inner_microstep: 1331.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-10 19:38:03,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.27 | bwd_microstep: 888.72 | bwd_inner_microstep: 888.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 19:38:05,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.85 | bwd_microstep: 1482.76 | bwd_inner_microstep: 1482.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 19:38:07,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1376.50 | bwd_inner_microstep: 1376.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2287 [2024-06-10 19:38:09,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.12 | bwd_microstep: 1072.67 | bwd_inner_microstep: 1072.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 19:38:11,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.62 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 19:38:12,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.12 | bwd_microstep: 1387.15 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 19:38:14,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1297.16 | bwd_inner_microstep: 1297.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640 [2024-06-10 19:38:16,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.19 | bwd_microstep: 1418.79 | bwd_inner_microstep: 1418.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 19:38:18,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.36 | bwd_microstep: 977.89 | bwd_inner_microstep: 977.86 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3607 [2024-06-10 19:38:20,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.42 | bwd_microstep: 1439.20 | bwd_inner_microstep: 1439.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 19:38:22,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.13 | bwd_microstep: 1488.63 | bwd_inner_microstep: 1488.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544 [2024-06-10 19:38:23,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.09 | bwd_microstep: 1296.98 | bwd_inner_microstep: 1296.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3546 [2024-06-10 19:38:25,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.56 | bwd_microstep: 1375.56 | bwd_inner_microstep: 1375.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 19:38:27,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.49 | bwd_microstep: 1394.38 | bwd_inner_microstep: 1394.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 19:38:29,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1389.90 | bwd_inner_microstep: 1389.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3755 [2024-06-10 19:38:31,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.38 | bwd_microstep: 1500.50 | bwd_inner_microstep: 
1500.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3591 [2024-06-10 19:38:34,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.92 | bwd_microstep: 1702.15 | bwd_inner_microstep: 1702.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3565 [2024-06-10 19:38:36,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1544.55 | bwd_inner_microstep: 1544.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 19:38:42,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.29 | optimizer_step: 6.63 [2024-06-10 19:38:42,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.15 | bwd_microstep: 5213.26 | bwd_inner_microstep: 1552.63 | bwd_allreduce_microstep: 3660.57 | step_microstep: 38.08 [2024-06-10 19:38:42,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15515.35 | bwd: 45229.62 | bwd_inner: 41568.07 | bwd_allreduce: 3660.84 | step: 39.63 {'loss': 1.2006, 'learning_rate': 1.2355001409671856e-05, 'epoch': 0.64} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 19:38:44,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.47 | bwd_microstep: 1473.90 | bwd_inner_microstep: 1473.73 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 19:38:45,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1273.94 | bwd_inner_microstep: 1273.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3848 [2024-06-10 19:38:47,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.04 | bwd_microstep: 1460.96 | bwd_inner_microstep: 1460.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3484 [2024-06-10 19:38:49,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.64 | bwd_microstep: 1247.58 | bwd_inner_microstep: 1247.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882 [2024-06-10 19:38:50,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.38 | bwd_microstep: 679.66 | bwd_inner_microstep: 679.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 19:38:52,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1248.26 | bwd_inner_microstep: 1248.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4013 [2024-06-10 19:38:54,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.91 | bwd_microstep: 1619.06 | bwd_inner_microstep: 1619.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3746 [2024-06-10 19:38:56,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1442.97 | bwd_inner_microstep: 1442.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505 [2024-06-10 19:38:58,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.03 | bwd_microstep: 1319.29 | bwd_inner_microstep: 1319.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3501 [2024-06-10 19:39:00,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1417.14 | bwd_inner_microstep: 1417.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3920 [2024-06-10 19:39:02,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1584.63 | bwd_inner_microstep: 1584.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2621 [2024-06-10 19:39:04,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 424.46 | bwd_microstep: 1143.40 | bwd_inner_microstep: 1143.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 19:39:05,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1346.93 | bwd_inner_microstep: 1346.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3838 [2024-06-10 19:39:07,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.40 | bwd_microstep: 1391.20 | bwd_inner_microstep: 1391.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 19:39:09,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1392.45 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 19:39:11,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1280.80 | bwd_inner_microstep: 1280.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 19:39:13,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1500.64 | bwd_inner_microstep: 1500.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3682 [2024-06-10 19:39:15,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.78 | bwd_microstep: 1423.58 | bwd_inner_microstep: 1423.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-10 19:39:17,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.62 | bwd_microstep: 1611.90 | bwd_inner_microstep: 1611.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998 [2024-06-10 19:39:18,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.07 | bwd_microstep: 708.01 | bwd_inner_microstep: 707.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 19:39:20,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.49 | bwd_microstep: 1390.87 | bwd_inner_microstep: 1390.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063 [2024-06-10 19:39:21,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.30 | bwd_microstep: 913.83 | bwd_inner_microstep: 913.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3443 [2024-06-10 19:39:23,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.29 | bwd_microstep: 1285.11 | bwd_inner_microstep: 1285.08 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 19:39:25,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1556.55 | bwd_inner_microstep: 1556.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3429 [2024-06-10 19:39:27,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.49 | bwd_microstep: 1379.78 | bwd_inner_microstep: 1379.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3427 [2024-06-10 19:39:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1409.14 | bwd_inner_microstep: 1409.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389 [2024-06-10 19:39:31,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.08 | bwd_microstep: 1337.80 | bwd_inner_microstep: 1337.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 609 [2024-06-10 19:39:31,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.65 | bwd_microstep: 259.98 | bwd_inner_microstep: 259.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-10 19:39:34,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.24 | bwd_microstep: 1600.95 | bwd_inner_microstep: 1600.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 19:39:36,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.96 | bwd_microstep: 
1757.16 | bwd_inner_microstep: 1757.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3606 [2024-06-10 19:39:38,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.13 | bwd_microstep: 1470.44 | bwd_inner_microstep: 1470.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582 [2024-06-10 19:39:42,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.19 | optimizer_step: 6.60 [2024-06-10 19:39:42,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 3518.85 | bwd_inner_microstep: 1579.26 | bwd_allreduce_microstep: 1939.54 | step_microstep: 37.75 [2024-06-10 19:39:42,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15842.83 | bwd: 44446.77 | bwd_inner: 42506.19 | bwd_allreduce: 1939.84 | step: 39.33 {'loss': 1.1916, 'learning_rate': 1.2320331297980097e-05, 'epoch': 0.64} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468 [2024-06-10 19:39:44,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1398.69 | bwd_inner_microstep: 1398.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3913 [2024-06-10 19:39:46,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1490.02 | bwd_inner_microstep: 1489.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 19:39:48,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.49 | bwd_microstep: 1374.38 | bwd_inner_microstep: 1374.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 
5.5, dynamic token length: 3550 [2024-06-10 19:39:50,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1296.54 | bwd_inner_microstep: 1296.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783 [2024-06-10 19:39:52,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1647.02 | bwd_inner_microstep: 1646.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3401 [2024-06-10 19:39:54,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.03 | bwd_microstep: 1180.92 | bwd_inner_microstep: 1180.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 19:39:55,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1244.70 | bwd_inner_microstep: 1244.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3705 [2024-06-10 19:39:57,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1327.14 | bwd_inner_microstep: 1327.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 19:39:59,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.85 | bwd_microstep: 1388.61 | bwd_inner_microstep: 1388.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 19:40:01,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1247.59 | bwd_inner_microstep: 1247.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 24, images per sample: 6.0, dynamic token length: 2214 [2024-06-10 19:40:02,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.62 | bwd_microstep: 987.14 | bwd_inner_microstep: 987.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3494 [2024-06-10 19:40:04,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.86 | bwd_microstep: 1514.19 | bwd_inner_microstep: 1514.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684 [2024-06-10 19:40:07,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.29 | bwd_microstep: 1525.57 | bwd_inner_microstep: 1525.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 19:40:09,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1467.14 | bwd_inner_microstep: 1467.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3647 [2024-06-10 19:40:11,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.46 | bwd_microstep: 1535.21 | bwd_inner_microstep: 1535.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446 [2024-06-10 19:40:12,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.83 | bwd_microstep: 1281.13 | bwd_inner_microstep: 1281.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988 [2024-06-10 19:40:14,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.76 | bwd_microstep: 833.20 | bwd_inner_microstep: 833.18 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970 [2024-06-10 19:40:15,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 798.42 | bwd_inner_microstep: 798.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 19:40:16,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.70 | bwd_microstep: 1294.27 | bwd_inner_microstep: 1294.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-10 19:40:18,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1412.92 | bwd_inner_microstep: 1412.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3564 [2024-06-10 19:40:20,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.99 | bwd_microstep: 1447.49 | bwd_inner_microstep: 1447.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 19:40:22,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.05 | bwd_microstep: 1392.98 | bwd_inner_microstep: 1392.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546 [2024-06-10 19:40:24,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.40 | bwd_microstep: 1200.06 | bwd_inner_microstep: 1200.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 19:40:26,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1412.78 | 
bwd_inner_microstep: 1412.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 19:40:28,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1390.80 | bwd_inner_microstep: 1390.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812 [2024-06-10 19:40:30,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.84 | bwd_microstep: 1617.15 | bwd_inner_microstep: 1617.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3533 [2024-06-10 19:40:32,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1453.94 | bwd_inner_microstep: 1453.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 19:40:34,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1284.97 | bwd_inner_microstep: 1284.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3591 [2024-06-10 19:40:36,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1438.34 | bwd_inner_microstep: 1438.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3556 [2024-06-10 19:40:38,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.09 | bwd_microstep: 1358.12 | bwd_inner_microstep: 1358.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2917 [2024-06-10 19:40:39,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 449.25 | bwd_microstep: 1190.05 | bwd_inner_microstep: 1190.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 19:40:45,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.28 | optimizer_step: 6.63 [2024-06-10 19:40:45,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.62 | bwd_microstep: 5416.75 | bwd_inner_microstep: 1867.45 | bwd_allreduce_microstep: 3549.25 | step_microstep: 38.97 [2024-06-10 19:40:46,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16166.95 | bwd: 46848.27 | bwd_inner: 43298.10 | bwd_allreduce: 3549.48 | step: 40.53 {'loss': 1.1933, 'learning_rate': 1.2285688234041575e-05, 'epoch': 0.64} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 19:40:48,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1488.70 | bwd_inner_microstep: 1488.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 19:40:49,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1343.18 | bwd_inner_microstep: 1343.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-10 19:40:51,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.21 | bwd_microstep: 1310.77 | bwd_inner_microstep: 1310.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875 [2024-06-10 19:40:53,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.83 | bwd_microstep: 1581.59 | bwd_inner_microstep: 1581.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 19:40:56,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1552.37 | bwd_inner_microstep: 1552.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753 [2024-06-10 19:40:58,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.91 | bwd_microstep: 1640.79 | bwd_inner_microstep: 1640.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-10 19:40:59,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.75 | bwd_microstep: 1149.41 | bwd_inner_microstep: 1149.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1892 [2024-06-10 19:41:00,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.43 | bwd_microstep: 683.78 | bwd_inner_microstep: 683.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 19:41:02,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.08 | bwd_microstep: 1284.53 | bwd_inner_microstep: 1284.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 19:41:04,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1246.60 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-10 19:41:05,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.81 | bwd_microstep: 1154.66 | bwd_inner_microstep: 1154.63 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 19:41:07,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.62 | bwd_microstep: 1394.65 | bwd_inner_microstep: 1394.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1981 [2024-06-10 19:41:09,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.47 | bwd_microstep: 830.55 | bwd_inner_microstep: 830.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-10 19:41:11,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.33 | bwd_microstep: 1530.51 | bwd_inner_microstep: 1530.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2088 [2024-06-10 19:41:12,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.38 | bwd_microstep: 919.95 | bwd_inner_microstep: 919.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659 [2024-06-10 19:41:14,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.81 | bwd_microstep: 1612.70 | bwd_inner_microstep: 1612.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 19:41:16,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1471.92 | bwd_inner_microstep: 1471.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3643 [2024-06-10 19:41:18,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.40 | bwd_microstep: 
1472.81 | bwd_inner_microstep: 1472.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535 [2024-06-10 19:41:20,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1421.60 | bwd_inner_microstep: 1421.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3583 [2024-06-10 19:41:22,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.81 | bwd_microstep: 1208.30 | bwd_inner_microstep: 1208.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654 [2024-06-10 19:41:24,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.25 | bwd_microstep: 1613.97 | bwd_inner_microstep: 1613.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-10 19:41:26,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.38 | bwd_microstep: 1485.45 | bwd_inner_microstep: 1485.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3617 [2024-06-10 19:41:28,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.10 | bwd_microstep: 1341.77 | bwd_inner_microstep: 1341.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2344 [2024-06-10 19:41:29,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.41 | bwd_microstep: 987.67 | bwd_inner_microstep: 987.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 19:41:31,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 487.83 | bwd_microstep: 1277.11 | bwd_inner_microstep: 1277.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 19:41:33,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1298.79 | bwd_inner_microstep: 1298.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3693 [2024-06-10 19:41:35,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.90 | bwd_microstep: 1453.67 | bwd_inner_microstep: 1453.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633 [2024-06-10 19:41:37,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.10 | bwd_microstep: 1409.86 | bwd_inner_microstep: 1409.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-10 19:41:39,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1412.57 | bwd_inner_microstep: 1412.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 19:41:41,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1497.49 | bwd_inner_microstep: 1497.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2073 [2024-06-10 19:41:42,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.44 | bwd_microstep: 1010.23 | bwd_inner_microstep: 1010.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-10 19:41:46,662] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 19:41:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.16 | bwd_microstep: 3351.36 | bwd_inner_microstep: 1458.82 | bwd_allreduce_microstep: 1892.48 | step_microstep: 37.90 [2024-06-10 19:41:46,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15890.37 | bwd: 44439.31 | bwd_inner: 42545.92 | bwd_allreduce: 1892.71 | step: 39.51 {'loss': 1.2369, 'learning_rate': 1.2251072339868997e-05, 'epoch': 0.64} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394 [2024-06-10 19:41:48,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.29 | bwd_microstep: 1332.14 | bwd_inner_microstep: 1332.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3391 [2024-06-10 19:41:50,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.40 | bwd_microstep: 1143.95 | bwd_inner_microstep: 1143.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2310 [2024-06-10 19:41:51,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.00 | bwd_microstep: 979.78 | bwd_inner_microstep: 979.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2321 [2024-06-10 19:41:52,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.49 | bwd_microstep: 820.57 | bwd_inner_microstep: 820.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3793 [2024-06-10 19:41:54,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.62 | bwd_microstep: 1646.01 | bwd_inner_microstep: 1645.98 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784 [2024-06-10 19:41:56,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.44 | bwd_microstep: 1451.73 | bwd_inner_microstep: 1451.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484 [2024-06-10 19:41:58,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1410.81 | bwd_inner_microstep: 1410.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 19:41:59,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.92 | bwd_microstep: 678.99 | bwd_inner_microstep: 678.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-10 19:42:00,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.71 | bwd_microstep: 804.26 | bwd_inner_microstep: 804.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 19:42:02,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.35 | bwd_microstep: 1317.55 | bwd_inner_microstep: 1317.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670 [2024-06-10 19:42:04,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.59 | bwd_microstep: 1512.33 | bwd_inner_microstep: 1512.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2906 [2024-06-10 19:42:06,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.95 | bwd_microstep: 1160.16 
| bwd_inner_microstep: 1160.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3686 [2024-06-10 19:42:08,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1328.65 | bwd_inner_microstep: 1328.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967 [2024-06-10 19:42:09,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.87 | bwd_microstep: 889.44 | bwd_inner_microstep: 889.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3401 [2024-06-10 19:42:11,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.71 | bwd_microstep: 1197.48 | bwd_inner_microstep: 1197.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 19:42:12,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.26 | bwd_microstep: 1246.18 | bwd_inner_microstep: 1246.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655 [2024-06-10 19:42:14,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.26 | bwd_microstep: 1417.14 | bwd_inner_microstep: 1417.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 19:42:16,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1398.11 | bwd_inner_microstep: 1398.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1395 [2024-06-10 19:42:17,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 203.36 | bwd_microstep: 527.47 | bwd_inner_microstep: 527.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-10 19:42:19,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1507.23 | bwd_inner_microstep: 1507.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297 [2024-06-10 19:42:20,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.18 | bwd_microstep: 880.98 | bwd_inner_microstep: 880.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-10 19:42:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1515.03 | bwd_inner_microstep: 1515.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 19:42:25,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.36 | bwd_microstep: 1655.40 | bwd_inner_microstep: 1655.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 19:42:26,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.35 | bwd_microstep: 1255.74 | bwd_inner_microstep: 1255.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 19:42:28,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.77 | bwd_microstep: 1402.13 | bwd_inner_microstep: 1402.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025 [2024-06-10 19:42:29,941] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.01 | bwd_microstep: 811.30 | bwd_inner_microstep: 811.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3864 [2024-06-10 19:42:31,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.40 | bwd_microstep: 1465.22 | bwd_inner_microstep: 1465.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762 [2024-06-10 19:42:34,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.29 | bwd_microstep: 1474.47 | bwd_inner_microstep: 1474.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3575 [2024-06-10 19:42:35,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.50 | bwd_microstep: 1236.22 | bwd_inner_microstep: 1236.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2240 [2024-06-10 19:42:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.02 | bwd_microstep: 964.56 | bwd_inner_microstep: 964.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2247 [2024-06-10 19:42:38,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.33 | bwd_microstep: 902.17 | bwd_inner_microstep: 902.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3593 [2024-06-10 19:42:48,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.10 | optimizer_step: 6.61 [2024-06-10 19:42:48,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.84 | bwd_microstep: 9164.91 | 
bwd_inner_microstep: 1874.73 | bwd_allreduce_microstep: 7290.12 | step_microstep: 37.99 [2024-06-10 19:42:48,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14566.33 | bwd: 46498.11 | bwd_inner: 39207.09 | bwd_allreduce: 7290.36 | step: 39.45 {'loss': 1.2101, 'learning_rate': 1.221648373737935e-05, 'epoch': 0.64} 64%|██████▎ | 1098/1726 [19:01:18<10:39:23, 61.09s/it] 64%|██████▎ | 1099/1726 [19:02:19<10:36:56, 60.95s/it] 64%|██████▎ | 1100/1726 [19:03:22<10:43:27, 61.67s/it] 64%|██████▍ | 1101/1726 [19:04:23<10:39:16, 61.37s/it] 64%|██████▍ | 1102/1726 [19:05:24<10:38:18, 61.38s/it] dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960 [2024-06-10 19:42:49,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 698.60 | bwd_inner_microstep: 698.47 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4157 [2024-06-10 19:42:51,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.03 | bwd_microstep: 1638.11 | bwd_inner_microstep: 1638.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841 [2024-06-10 19:42:53,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.02 | bwd_microstep: 1653.06 | bwd_inner_microstep: 1653.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297 [2024-06-10 19:42:54,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.67 | bwd_microstep: 971.67 | bwd_inner_microstep: 971.64 |
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 19:42:56,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.33 | bwd_microstep: 1341.29 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 19:42:58,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.56 | bwd_microstep: 1274.98 | bwd_inner_microstep: 1274.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 19:43:00,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.81 | bwd_microstep: 1146.98 | bwd_inner_microstep: 1146.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754 [2024-06-10 19:43:02,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.75 | bwd_microstep: 1537.99 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2051 [2024-06-10 19:43:03,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.26 | bwd_microstep: 863.80 | bwd_inner_microstep: 863.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910 [2024-06-10 19:43:04,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.49 | bwd_microstep: 748.76 | bwd_inner_microstep: 748.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 19:43:06,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.83 | bwd_microstep: 
1378.59 | bwd_inner_microstep: 1378.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3462 [2024-06-10 19:43:08,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.88 | bwd_microstep: 1566.01 | bwd_inner_microstep: 1565.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3502 [2024-06-10 19:43:10,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1510.07 | bwd_inner_microstep: 1510.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489 [2024-06-10 19:43:12,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.39 | bwd_microstep: 1581.55 | bwd_inner_microstep: 1581.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 19:43:15,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.36 | bwd_microstep: 1647.89 | bwd_inner_microstep: 1647.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-10 19:43:17,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1500.92 | bwd_inner_microstep: 1500.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 19:43:19,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-10 19:43:21,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 504.09 | bwd_microstep: 1349.82 | bwd_inner_microstep: 1349.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680 [2024-06-10 19:43:23,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.94 | bwd_microstep: 1626.12 | bwd_inner_microstep: 1626.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 19:43:25,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.94 | bwd_microstep: 1558.86 | bwd_inner_microstep: 1558.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3454 [2024-06-10 19:43:27,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.81 | bwd_microstep: 1219.86 | bwd_inner_microstep: 1219.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 19:43:29,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.54 | bwd_microstep: 1554.06 | bwd_inner_microstep: 1554.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-10 19:43:31,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.39 | bwd_microstep: 1508.74 | bwd_inner_microstep: 1508.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425 [2024-06-10 19:43:33,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.39 | bwd_microstep: 1211.37 | bwd_inner_microstep: 1211.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545 [2024-06-10 19:43:35,203] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.80 | bwd_microstep: 1585.12 | bwd_inner_microstep: 1585.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3814 [2024-06-10 19:43:37,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.31 | bwd_microstep: 1716.80 | bwd_inner_microstep: 1716.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812 [2024-06-10 19:43:39,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.18 | bwd_microstep: 1584.54 | bwd_inner_microstep: 1584.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2620 [2024-06-10 19:43:41,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.64 | bwd_microstep: 1111.19 | bwd_inner_microstep: 1111.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593 [2024-06-10 19:43:43,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.31 | bwd_microstep: 1302.12 | bwd_inner_microstep: 1302.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3606 [2024-06-10 19:43:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1338.58 | bwd_inner_microstep: 1338.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3568 [2024-06-10 19:43:46,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.30 | bwd_microstep: 1449.55 | bwd_inner_microstep: 1449.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token 
length: 1960 [2024-06-10 19:43:48,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 19:43:48,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.29 | bwd_microstep: 1129.57 | bwd_inner_microstep: 740.72 | bwd_allreduce_microstep: 388.80 | step_microstep: 37.69 [2024-06-10 19:43:48,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16173.06 | bwd: 43816.62 | bwd_inner: 43426.83 | bwd_allreduce: 389.06 | step: 39.24 {'loss': 1.1748, 'learning_rate': 1.2181922548393519e-05, 'epoch': 0.64} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-10 19:43:50,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.33 | bwd_microstep: 1442.16 | bwd_inner_microstep: 1442.07 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 19:43:52,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1478.34 | bwd_inner_microstep: 1478.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 19:43:54,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.39 | bwd_microstep: 1483.38 | bwd_inner_microstep: 1483.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 19:43:56,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.57 | bwd_microstep: 1476.56 | bwd_inner_microstep: 1476.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 19:43:58,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.37 | bwd_microstep: 1383.46 | 
bwd_inner_microstep: 1383.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 19:44:00,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1479.92 | bwd_inner_microstep: 1479.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 19:44:02,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1636.58 | bwd_inner_microstep: 1636.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868 [2024-06-10 19:44:03,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.56 | bwd_microstep: 708.80 | bwd_inner_microstep: 708.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 19:44:05,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1283.91 | bwd_inner_microstep: 1283.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3408 [2024-06-10 19:44:07,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1305.51 | bwd_inner_microstep: 1305.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 19:44:09,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.44 | bwd_microstep: 1442.86 | bwd_inner_microstep: 1442.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655 [2024-06-10 19:44:11,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 593.39 | bwd_microstep: 1612.38 | bwd_inner_microstep: 1612.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3689 [2024-06-10 19:44:13,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.38 | bwd_microstep: 1423.72 | bwd_inner_microstep: 1423.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677 [2024-06-10 19:44:15,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.27 | bwd_microstep: 1423.46 | bwd_inner_microstep: 1423.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3461 [2024-06-10 19:44:17,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.38 | bwd_microstep: 1246.96 | bwd_inner_microstep: 1246.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659 [2024-06-10 19:44:19,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.04 | bwd_microstep: 1425.50 | bwd_inner_microstep: 1425.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2088 [2024-06-10 19:44:20,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.56 | bwd_microstep: 727.48 | bwd_inner_microstep: 727.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634 [2024-06-10 19:44:22,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.19 | bwd_microstep: 1511.24 | bwd_inner_microstep: 1511.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2088 [2024-06-10 19:44:23,312] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.22 | bwd_microstep: 791.12 | bwd_inner_microstep: 791.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-10 19:44:25,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.76 | bwd_microstep: 1658.38 | bwd_inner_microstep: 1658.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 19:44:26,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.86 | bwd_microstep: 977.35 | bwd_inner_microstep: 977.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680 [2024-06-10 19:44:29,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1524.48 | bwd_inner_microstep: 1524.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3824 [2024-06-10 19:44:30,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.57 | bwd_microstep: 1293.47 | bwd_inner_microstep: 1293.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3719 [2024-06-10 19:44:32,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.97 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-10 19:44:33,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.35 | bwd_microstep: 882.18 | bwd_inner_microstep: 882.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048 
[2024-06-10 19:44:34,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.18 | bwd_microstep: 815.49 | bwd_inner_microstep: 815.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 19:44:36,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1445.85 | bwd_inner_microstep: 1445.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1746 [2024-06-10 19:44:37,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 244.14 | bwd_microstep: 627.55 | bwd_inner_microstep: 627.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3805 [2024-06-10 19:44:39,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.28 | bwd_microstep: 1509.63 | bwd_inner_microstep: 1509.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-10 19:44:41,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1348.90 | bwd_inner_microstep: 1348.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3450 [2024-06-10 19:44:43,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1403.70 | bwd_inner_microstep: 1403.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 19:44:50,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-10 19:44:50,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 
| bwd_microstep: 6524.86 | bwd_inner_microstep: 1643.96 | bwd_allreduce_microstep: 4880.85 | step_microstep: 37.80 [2024-06-10 19:44:50,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15539.63 | bwd: 46538.20 | bwd_inner: 41656.38 | bwd_allreduce: 4881.13 | step: 39.26 {'loss': 1.208, 'learning_rate': 1.2147388894635832e-05, 'epoch': 0.64} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413 [2024-06-10 19:44:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.23 | bwd_microstep: 1269.32 | bwd_inner_microstep: 1269.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-10 19:44:54,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1411.03 | bwd_inner_microstep: 1411.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-10 19:44:56,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 19:44:58,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1389.59 | bwd_inner_microstep: 1389.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3768 [2024-06-10 19:45:00,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1505.53 | bwd_inner_microstep: 1505.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 19:45:02,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | 
bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411 [2024-06-10 19:45:04,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.08 | bwd_microstep: 1184.59 | bwd_inner_microstep: 1184.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 19:45:05,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.20 | bwd_microstep: 1252.64 | bwd_inner_microstep: 1252.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509 [2024-06-10 19:45:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1320.01 | bwd_inner_microstep: 1319.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 19:45:09,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.88 | bwd_microstep: 1345.71 | bwd_inner_microstep: 1345.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 19:45:10,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.33 | bwd_microstep: 783.08 | bwd_inner_microstep: 783.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 19:45:12,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.06 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1994 [2024-06-10 19:45:13,819] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 333.64 | bwd_microstep: 897.21 | bwd_inner_microstep: 897.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3034 [2024-06-10 19:45:15,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.32 | bwd_microstep: 1230.60 | bwd_inner_microstep: 1230.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675 [2024-06-10 19:45:17,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.54 | bwd_microstep: 1514.11 | bwd_inner_microstep: 1514.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965 [2024-06-10 19:45:18,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.10 | bwd_microstep: 891.14 | bwd_inner_microstep: 891.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461 [2024-06-10 19:45:20,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3718 [2024-06-10 19:45:22,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.03 | bwd_microstep: 1562.56 | bwd_inner_microstep: 1562.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3532 [2024-06-10 19:45:24,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1277.17 | bwd_inner_microstep: 1277.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3468 [2024-06-10 19:45:26,545] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1325.67 | bwd_inner_microstep: 1325.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880 [2024-06-10 19:45:28,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.83 | bwd_microstep: 1693.85 | bwd_inner_microstep: 1693.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 19:45:30,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.90 | bwd_microstep: 1490.04 | bwd_inner_microstep: 1490.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1903 [2024-06-10 19:45:31,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.34 | bwd_microstep: 684.33 | bwd_inner_microstep: 684.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 19:45:33,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 1294.89 | bwd_inner_microstep: 1294.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3563 [2024-06-10 19:45:35,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.13 | bwd_microstep: 1204.71 | bwd_inner_microstep: 1204.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3828 [2024-06-10 19:45:37,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1296.64 | bwd_inner_microstep: 1296.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token 
length: 971 [2024-06-10 19:45:37,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 150.78 | bwd_microstep: 386.45 | bwd_inner_microstep: 386.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 19:45:39,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.54 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 19:45:41,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.79 | bwd_microstep: 1458.46 | bwd_inner_microstep: 1458.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2248 [2024-06-10 19:45:42,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.60 | bwd_microstep: 842.34 | bwd_inner_microstep: 842.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1953 [2024-06-10 19:45:43,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 701.17 | bwd_inner_microstep: 701.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045 [2024-06-10 19:45:52,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 19:45:52,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.03 | bwd_microstep: 8140.24 | bwd_inner_microstep: 930.71 | bwd_allreduce_microstep: 7209.47 | step_microstep: 38.02 [2024-06-10 19:45:52,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14685.16 | bwd: 46347.70 | bwd_inner: 39137.32 | bwd_allreduce: 7209.70 | step: 
39.45 {'loss': 1.1387, 'learning_rate': 1.2112882897733634e-05, 'epoch': 0.64} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-10 19:45:53,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.47 | bwd_microstep: 1266.14 | bwd_inner_microstep: 1266.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063 [2024-06-10 19:45:55,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.25 | bwd_microstep: 815.13 | bwd_inner_microstep: 815.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4260 [2024-06-10 19:45:57,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.21 | bwd_microstep: 1666.99 | bwd_inner_microstep: 1666.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827 [2024-06-10 19:45:59,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1518.57 | bwd_inner_microstep: 1518.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829 [2024-06-10 19:46:01,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 19:46:03,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.47 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 19:46:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
527.87 | bwd_microstep: 1395.08 | bwd_inner_microstep: 1395.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 19:46:06,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1248.73 | bwd_inner_microstep: 1248.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-10 19:46:08,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485 [2024-06-10 19:46:10,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.29 | bwd_microstep: 1312.51 | bwd_inner_microstep: 1312.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3500 [2024-06-10 19:46:12,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.23 | bwd_microstep: 1220.85 | bwd_inner_microstep: 1220.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3490 [2024-06-10 19:46:14,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.23 | bwd_microstep: 1508.56 | bwd_inner_microstep: 1508.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379 [2024-06-10 19:46:16,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.79 | bwd_microstep: 1239.18 | bwd_inner_microstep: 1239.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2487 [2024-06-10 19:46:17,567] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.53 | bwd_microstep: 952.99 | bwd_inner_microstep: 952.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3648 [2024-06-10 19:46:19,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.05 | bwd_microstep: 1312.67 | bwd_inner_microstep: 1312.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 19:46:21,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1473.13 | bwd_inner_microstep: 1473.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 19:46:23,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.79 | bwd_microstep: 1372.87 | bwd_inner_microstep: 1372.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465 [2024-06-10 19:46:25,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.95 | bwd_microstep: 1216.14 | bwd_inner_microstep: 1216.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2293 [2024-06-10 19:46:26,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.19 | bwd_microstep: 879.47 | bwd_inner_microstep: 879.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 19:46:28,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1389.36 | bwd_inner_microstep: 1389.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1490 
[2024-06-10 19:46:28,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 210.06 | bwd_microstep: 546.08 | bwd_inner_microstep: 546.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 19:46:30,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.73 | bwd_microstep: 1384.85 | bwd_inner_microstep: 1384.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-10 19:46:31,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.94 | bwd_microstep: 801.21 | bwd_inner_microstep: 801.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825 [2024-06-10 19:46:33,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.09 | bwd_microstep: 1353.61 | bwd_inner_microstep: 1353.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-10 19:46:35,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1414.80 | bwd_inner_microstep: 1414.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3811 [2024-06-10 19:46:37,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1411.90 | bwd_inner_microstep: 1411.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3614 [2024-06-10 19:46:39,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1572.76 | bwd_inner_microstep: 1572.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per 
sample: 11.5, dynamic token length: 3824 [2024-06-10 19:46:42,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.80 | bwd_microstep: 1755.28 | bwd_inner_microstep: 1755.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3585 [2024-06-10 19:46:44,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.49 | bwd_microstep: 1463.64 | bwd_inner_microstep: 1463.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2252 [2024-06-10 19:46:45,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.20 | bwd_microstep: 869.09 | bwd_inner_microstep: 869.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-10 19:46:47,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1635.38 | bwd_inner_microstep: 1635.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2908 [2024-06-10 19:46:53,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-10 19:46:53,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.12 | bwd_microstep: 5735.63 | bwd_inner_microstep: 1276.73 | bwd_allreduce_microstep: 4458.85 | step_microstep: 37.68 [2024-06-10 19:46:53,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15507.74 | bwd: 45987.38 | bwd_inner: 41527.63 | bwd_allreduce: 4459.08 | step: 39.13 {'loss': 1.1551, 'learning_rate': 1.2078404679216864e-05, 'epoch': 0.64} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 19:46:55,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 
| bwd_microstep: 1381.47 | bwd_inner_microstep: 1381.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860 [2024-06-10 19:46:58,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.56 | bwd_microstep: 1558.41 | bwd_inner_microstep: 1558.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 19:47:00,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.10 | bwd_microstep: 1479.18 | bwd_inner_microstep: 1479.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462 [2024-06-10 19:47:01,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.37 | bwd_microstep: 1211.12 | bwd_inner_microstep: 1211.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 19:47:02,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 793.48 | bwd_inner_microstep: 793.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 19:47:04,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 19:47:06,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4062 [2024-06-10 19:47:08,730] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 591.71 | bwd_microstep: 1585.89 | bwd_inner_microstep: 1585.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 19:47:10,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.63 | bwd_microstep: 1386.38 | bwd_inner_microstep: 1386.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2077 [2024-06-10 19:47:11,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.06 | bwd_microstep: 851.74 | bwd_inner_microstep: 851.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3697 [2024-06-10 19:47:13,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.93 | bwd_microstep: 1449.31 | bwd_inner_microstep: 1449.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490 [2024-06-10 19:47:15,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 1346.82 | bwd_inner_microstep: 1346.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670 [2024-06-10 19:47:17,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.55 | bwd_microstep: 1513.18 | bwd_inner_microstep: 1513.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960 [2024-06-10 19:47:18,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.72 | bwd_microstep: 885.42 | bwd_inner_microstep: 885.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 19:47:20,738] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.78 | bwd_microstep: 1257.23 | bwd_inner_microstep: 1257.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3883 [2024-06-10 19:47:23,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.08 | bwd_microstep: 1748.65 | bwd_inner_microstep: 1748.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 19:47:24,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 794.34 | bwd_inner_microstep: 794.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3504 [2024-06-10 19:47:26,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1513.15 | bwd_inner_microstep: 1513.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 19:47:28,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-10 19:47:30,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.30 | bwd_microstep: 1548.25 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-10 19:47:32,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1558.37 | bwd_inner_microstep: 1558.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3433 [2024-06-10 19:47:34,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 19:47:35,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1252.95 | bwd_inner_microstep: 1252.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 19:47:37,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1295.63 | bwd_inner_microstep: 1295.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 19:47:39,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1397.05 | bwd_inner_microstep: 1397.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-10 19:47:41,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1416.01 | bwd_inner_microstep: 1415.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 19:47:43,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.76 | bwd_microstep: 1449.63 | bwd_inner_microstep: 1449.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-10 19:47:45,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1497.36 | bwd_inner_microstep: 1497.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3548 [2024-06-10 19:47:47,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1297.44 | bwd_inner_microstep: 1297.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3574 [2024-06-10 19:47:49,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.44 | bwd_microstep: 1523.64 | bwd_inner_microstep: 1523.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3577 [2024-06-10 19:47:51,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1526.76 | bwd_inner_microstep: 1526.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2044 [2024-06-10 19:47:54,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.17 | optimizer_step: 6.61 [2024-06-10 19:47:54,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.42 | bwd_microstep: 2261.56 | bwd_inner_microstep: 965.52 | bwd_allreduce_microstep: 1296.00 | step_microstep: 37.67 [2024-06-10 19:47:54,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15962.49 | bwd: 44083.12 | bwd_inner: 42786.22 | bwd_allreduce: 1296.23 | step: 39.12 {'loss': 1.2537, 'learning_rate': 1.2043954360517635e-05, 'epoch': 0.64} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612 [2024-06-10 19:47:56,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1331.60 | bwd_inner_microstep: 1331.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3975 [2024-06-10 19:47:58,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 596.28 | bwd_microstep: 1602.85 | bwd_inner_microstep: 1602.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851 [2024-06-10 19:48:00,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.45 | bwd_microstep: 1557.81 | bwd_inner_microstep: 1557.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 19:48:02,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 19:48:04,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1540.64 | bwd_inner_microstep: 1540.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3740 [2024-06-10 19:48:06,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1367.72 | bwd_inner_microstep: 1367.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 19:48:08,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1389.73 | bwd_inner_microstep: 1389.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4112 [2024-06-10 19:48:10,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.88 | bwd_microstep: 1734.93 | bwd_inner_microstep: 1734.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 19:48:12,680] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1386.28 | bwd_inner_microstep: 1386.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686 [2024-06-10 19:48:14,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1522.87 | bwd_inner_microstep: 1522.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3502 [2024-06-10 19:48:16,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1439.54 | bwd_inner_microstep: 1439.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 19:48:18,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1471.23 | bwd_inner_microstep: 1471.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391 [2024-06-10 19:48:20,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1241.83 | bwd_inner_microstep: 1241.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 19:48:22,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.30 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 19:48:24,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1244.17 | bwd_inner_microstep: 1244.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3829 [2024-06-10 19:48:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.47 | bwd_microstep: 1645.50 | bwd_inner_microstep: 1645.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396 [2024-06-10 19:48:28,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.95 | bwd_microstep: 1244.23 | bwd_inner_microstep: 1244.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 19:48:29,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.88 | bwd_microstep: 1391.50 | bwd_inner_microstep: 1391.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446 [2024-06-10 19:48:31,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.86 | bwd_microstep: 1287.33 | bwd_inner_microstep: 1287.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 19:48:33,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.24 | bwd_microstep: 1349.44 | bwd_inner_microstep: 1349.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-10 19:48:35,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.39 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 19:48:37,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1554.87 | bwd_inner_microstep: 1554.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3487 [2024-06-10 19:48:39,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1378.99 | bwd_inner_microstep: 1378.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 19:48:41,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.28 | bwd_microstep: 1249.79 | bwd_inner_microstep: 1249.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 19:48:43,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.82 | bwd_microstep: 1456.88 | bwd_inner_microstep: 1456.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291 [2024-06-10 19:48:44,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.74 | bwd_microstep: 974.14 | bwd_inner_microstep: 974.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 19:48:46,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.03 | bwd_microstep: 1551.46 | bwd_inner_microstep: 1551.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 19:48:49,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.87 | bwd_microstep: 1528.88 | bwd_inner_microstep: 1528.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2055 [2024-06-10 19:48:50,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.70 | bwd_microstep: 872.95 | bwd_inner_microstep: 872.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820 [2024-06-10 19:48:52,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.25 | bwd_microstep: 1516.01 | bwd_inner_microstep: 1515.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3413 [2024-06-10 19:48:54,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.71 | bwd_microstep: 1465.13 | bwd_inner_microstep: 1465.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 19:48:56,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.15 | optimizer_step: 6.62 [2024-06-10 19:48:56,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1452.98 | bwd_inner_microstep: 1377.30 | bwd_allreduce_microstep: 75.63 | step_microstep: 37.75 [2024-06-10 19:48:56,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16767.28 | bwd: 44962.57 | bwd_inner: 44886.04 | bwd_allreduce: 75.85 | step: 39.24 64%|██████▍ | 1103/1726 [19:06:25<10:34:00, 61.06s/it] 64%|██████▍ | 1104/1726 [19:07:27<10:37:11, 61.46s/it] 64%|██████▍ | 1105/1726 [19:08:28<10:35:48, 61.43s/it] 64%|██████▍ | 1106/1726 [19:09:30<10:35:58, 61.55s/it] 64%|██████▍ | 1107/1726 [19:10:31<10:31:18, 61.19s/it] {'loss': 1.2238, 'learning_rate': 1.2009532062969801e-05, 'epoch': 0.64} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 19:48:58,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 520.03 | bwd_microstep: 1373.92 | bwd_inner_microstep: 1373.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3910 [2024-06-10 19:49:00,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1421.35 | bwd_inner_microstep: 1421.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3858 [2024-06-10 19:49:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.10 | bwd_microstep: 1560.32 | bwd_inner_microstep: 1560.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 19:49:04,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.29 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403 [2024-06-10 19:49:05,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.63 | bwd_microstep: 1209.33 | bwd_inner_microstep: 1209.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3753 [2024-06-10 19:49:08,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.91 | bwd_microstep: 1586.29 | bwd_inner_microstep: 1586.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-10 19:49:09,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.28 | bwd_microstep: 806.92 | bwd_inner_microstep: 806.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 19:49:11,206] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1379.13 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 19:49:12,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 1249.70 | bwd_inner_microstep: 1249.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3692 [2024-06-10 19:49:15,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1555.54 | bwd_inner_microstep: 1555.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3406 [2024-06-10 19:49:16,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1391.13 | bwd_inner_microstep: 1391.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2012 [2024-06-10 19:49:18,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.12 | bwd_microstep: 862.20 | bwd_inner_microstep: 862.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-10 19:49:20,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1412.53 | bwd_inner_microstep: 1412.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642 [2024-06-10 19:49:22,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.96 | bwd_microstep: 1710.74 | bwd_inner_microstep: 1710.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3644 [2024-06-10 19:49:24,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.52 | bwd_microstep: 1504.39 | bwd_inner_microstep: 1504.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3706 [2024-06-10 19:49:26,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.90 | bwd_microstep: 1578.95 | bwd_inner_microstep: 1578.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3845 [2024-06-10 19:49:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1463.40 | bwd_inner_microstep: 1463.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 19:49:30,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1513.53 | bwd_inner_microstep: 1513.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987 [2024-06-10 19:49:31,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.12 | bwd_microstep: 707.15 | bwd_inner_microstep: 707.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3578 [2024-06-10 19:49:33,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.93 | bwd_microstep: 1206.37 | bwd_inner_microstep: 1206.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601 [2024-06-10 19:49:35,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.09 | bwd_microstep: 1606.48 | bwd_inner_microstep: 1606.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images 
per sample: 7.5, dynamic token length: 3620 [2024-06-10 19:49:37,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1440.97 | bwd_inner_microstep: 1440.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3450 [2024-06-10 19:49:39,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.44 | bwd_microstep: 1187.53 | bwd_inner_microstep: 1187.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 19:49:41,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.36 | bwd_microstep: 1372.21 | bwd_inner_microstep: 1372.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-10 19:49:43,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.97 | bwd_microstep: 1438.96 | bwd_inner_microstep: 1438.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919 [2024-06-10 19:49:44,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.10 | bwd_microstep: 1193.62 | bwd_inner_microstep: 1193.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 19:49:46,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.42 | bwd_microstep: 1297.97 | bwd_inner_microstep: 1297.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-10 19:49:48,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.48 | bwd_microstep: 1185.14 | bwd_inner_microstep: 1185.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822 [2024-06-10 19:49:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1512.78 | bwd_inner_microstep: 1512.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3803 [2024-06-10 19:49:52,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.51 | bwd_microstep: 1618.54 | bwd_inner_microstep: 1618.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3001 [2024-06-10 19:49:54,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.77 | bwd_microstep: 1203.47 | bwd_inner_microstep: 1203.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3584 [2024-06-10 19:49:56,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-10 19:49:56,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.72 | bwd_microstep: 1786.80 | bwd_inner_microstep: 1628.22 | bwd_allreduce_microstep: 158.53 | step_microstep: 37.90 [2024-06-10 19:49:56,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16296.87 | bwd: 43712.18 | bwd_inner: 43552.75 | bwd_allreduce: 158.76 | step: 39.41 {'loss': 1.247, 'learning_rate': 1.1975137907808492e-05, 'epoch': 0.64} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639 [2024-06-10 19:49:58,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1509.49 | bwd_inner_microstep: 1509.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952 [2024-06-10 19:50:01,147] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 623.92 | bwd_microstep: 1694.04 | bwd_inner_microstep: 1694.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3804 [2024-06-10 19:50:03,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.79 | bwd_microstep: 1551.79 | bwd_inner_microstep: 1551.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778 [2024-06-10 19:50:05,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.77 | bwd_microstep: 1441.83 | bwd_inner_microstep: 1441.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 19:50:07,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.64 | bwd_microstep: 1280.62 | bwd_inner_microstep: 1280.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742 [2024-06-10 19:50:09,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.36 | bwd_microstep: 1632.03 | bwd_inner_microstep: 1632.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470 [2024-06-10 19:50:10,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.24 | bwd_microstep: 1215.47 | bwd_inner_microstep: 1215.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709 [2024-06-10 19:50:13,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.79 | bwd_microstep: 1625.82 | bwd_inner_microstep: 1625.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-10 
19:50:15,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.38 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 19:50:16,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693 [2024-06-10 19:50:18,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.52 | bwd_microstep: 1422.14 | bwd_inner_microstep: 1422.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2661 [2024-06-10 19:50:20,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.91 | bwd_microstep: 1082.05 | bwd_inner_microstep: 1082.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3718 [2024-06-10 19:50:22,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.19 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2133 [2024-06-10 19:50:23,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.70 | bwd_microstep: 1023.06 | bwd_inner_microstep: 1023.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910 [2024-06-10 19:50:25,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.08 | bwd_microstep: 1665.63 | bwd_inner_microstep: 1665.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, 
dynamic token length: 2143 [2024-06-10 19:50:26,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.49 | bwd_microstep: 740.77 | bwd_inner_microstep: 740.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527 [2024-06-10 19:50:29,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.14 | bwd_microstep: 1581.61 | bwd_inner_microstep: 1581.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3420 [2024-06-10 19:50:31,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.03 | bwd_microstep: 1612.84 | bwd_inner_microstep: 1612.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3380 [2024-06-10 19:50:32,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.59 | bwd_microstep: 1241.68 | bwd_inner_microstep: 1241.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2911 [2024-06-10 19:50:34,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.46 | bwd_microstep: 1280.30 | bwd_inner_microstep: 1280.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3668 [2024-06-10 19:50:36,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.60 | bwd_microstep: 1555.62 | bwd_inner_microstep: 1555.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287 [2024-06-10 19:50:38,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.27 | bwd_microstep: 876.74 | bwd_inner_microstep: 876.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 52, images per sample: 13.0, dynamic token length: 3820 [2024-06-10 19:50:40,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.17 | bwd_microstep: 1852.53 | bwd_inner_microstep: 1852.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051 [2024-06-10 19:50:41,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.96 | bwd_microstep: 912.68 | bwd_inner_microstep: 912.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-10 19:50:44,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.12 | bwd_microstep: 1610.99 | bwd_inner_microstep: 1610.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564 [2024-06-10 19:50:46,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.61 | bwd_microstep: 1503.66 | bwd_inner_microstep: 1503.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-10 19:50:48,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.29 | bwd_microstep: 1346.56 | bwd_inner_microstep: 1346.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 19:50:49,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-10 19:50:51,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1354.77 | bwd_inner_microstep: 1354.74 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549 [2024-06-10 19:50:53,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1326.87 | bwd_inner_microstep: 1326.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576 [2024-06-10 19:50:55,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.02 | bwd_microstep: 1599.70 | bwd_inner_microstep: 1599.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 19:50:57,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-10 19:50:57,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.03 | bwd_microstep: 1323.02 | bwd_inner_microstep: 1314.10 | bwd_allreduce_microstep: 8.88 | step_microstep: 37.55 [2024-06-10 19:50:57,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16234.77 | bwd: 44420.35 | bwd_inner: 44410.58 | bwd_allreduce: 9.11 | step: 39.10 {'loss': 1.2276, 'learning_rate': 1.1940772016169753e-05, 'epoch': 0.64} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 19:50:59,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.83 | bwd_microstep: 1473.62 | bwd_inner_microstep: 1473.45 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463 [2024-06-10 19:51:01,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.80 | bwd_microstep: 1182.05 | bwd_inner_microstep: 1182.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037 [2024-06-10 19:51:02,551] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.04 | bwd_microstep: 808.66 | bwd_inner_microstep: 808.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3554 [2024-06-10 19:51:04,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.97 | bwd_microstep: 1360.33 | bwd_inner_microstep: 1360.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 19:51:05,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.96 | bwd_microstep: 791.40 | bwd_inner_microstep: 791.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 19:51:06,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.11 | bwd_microstep: 790.66 | bwd_inner_microstep: 790.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916 [2024-06-10 19:51:07,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.48 | bwd_microstep: 777.43 | bwd_inner_microstep: 777.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2528 [2024-06-10 19:51:08,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.24 | bwd_microstep: 933.04 | bwd_inner_microstep: 933.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741 [2024-06-10 19:51:10,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.96 | bwd_microstep: 1439.35 | bwd_inner_microstep: 1439.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 
[2024-06-10 19:51:12,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.05 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1286.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 19:51:14,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1385.93 | bwd_inner_microstep: 1385.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1953 [2024-06-10 19:51:15,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.93 | bwd_microstep: 823.78 | bwd_inner_microstep: 823.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562 [2024-06-10 19:51:18,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.21 | bwd_microstep: 1595.55 | bwd_inner_microstep: 1595.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626 [2024-06-10 19:51:20,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.06 | bwd_microstep: 1601.48 | bwd_inner_microstep: 1601.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3439 [2024-06-10 19:51:21,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.76 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 19:51:24,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.41 | bwd_microstep: 1603.80 | bwd_inner_microstep: 1603.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3524 [2024-06-10 19:51:26,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.33 | bwd_microstep: 1494.06 | bwd_inner_microstep: 1494.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933 [2024-06-10 19:51:27,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.36 | bwd_microstep: 697.43 | bwd_inner_microstep: 697.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 19:51:28,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130 [2024-06-10 19:51:29,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.96 | bwd_microstep: 830.64 | bwd_inner_microstep: 830.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 19:51:32,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.87 | bwd_microstep: 1492.70 | bwd_inner_microstep: 1492.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 19:51:33,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1277.49 | bwd_inner_microstep: 1277.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 19:51:35,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.72 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822 [2024-06-10 19:51:37,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1388.92 | bwd_inner_microstep: 1388.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 19:51:39,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.10 | bwd_microstep: 1292.96 | bwd_inner_microstep: 1292.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 19:51:41,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.52 | bwd_microstep: 1660.48 | bwd_inner_microstep: 1660.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461 [2024-06-10 19:51:43,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.16 | bwd_microstep: 1343.32 | bwd_inner_microstep: 1343.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024 [2024-06-10 19:51:44,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 809.88 | bwd_inner_microstep: 809.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2658 [2024-06-10 19:51:46,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.41 | bwd_microstep: 1117.62 | bwd_inner_microstep: 1117.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581 [2024-06-10 19:51:48,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.83 | bwd_microstep: 1593.35 | bwd_inner_microstep: 1593.32 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2943 [2024-06-10 19:51:50,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.54 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3423 [2024-06-10 19:51:59,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.19 | optimizer_step: 6.64 [2024-06-10 19:51:59,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.40 | bwd_microstep: 8710.11 | bwd_inner_microstep: 1757.13 | bwd_allreduce_microstep: 6952.93 | step_microstep: 37.91 [2024-06-10 19:51:59,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14826.60 | bwd: 46682.22 | bwd_inner: 39728.23 | bwd_allreduce: 6953.23 | step: 39.43 {'loss': 1.2272, 'learning_rate': 1.190643450909008e-05, 'epoch': 0.64} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-10 19:52:00,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.04 | bwd_microstep: 784.93 | bwd_inner_microstep: 784.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2419 [2024-06-10 19:52:01,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.37 | bwd_microstep: 964.07 | bwd_inner_microstep: 964.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 19:52:03,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1382.76 | bwd_inner_microstep: 1382.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 
19:52:05,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1339.28 | bwd_inner_microstep: 1339.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1950 [2024-06-10 19:52:06,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.16 | bwd_microstep: 728.11 | bwd_inner_microstep: 728.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 19:52:08,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.01 | bwd_microstep: 1144.37 | bwd_inner_microstep: 1144.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2885 [2024-06-10 19:52:09,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.31 | bwd_microstep: 1086.15 | bwd_inner_microstep: 1086.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2211 [2024-06-10 19:52:11,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.90 | bwd_microstep: 954.80 | bwd_inner_microstep: 954.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2141 [2024-06-10 19:52:12,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.09 | bwd_microstep: 799.45 | bwd_inner_microstep: 799.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490 [2024-06-10 19:52:14,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.27 | bwd_microstep: 1317.88 | bwd_inner_microstep: 1317.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3647 [2024-06-10 19:52:16,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.35 | bwd_microstep: 1408.66 | bwd_inner_microstep: 1408.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3483 [2024-06-10 19:52:18,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.99 | bwd_microstep: 1570.46 | bwd_inner_microstep: 1570.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 19:52:20,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1345.41 | bwd_inner_microstep: 1345.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3642 [2024-06-10 19:52:22,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.07 | bwd_microstep: 1673.49 | bwd_inner_microstep: 1673.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 19:52:24,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 19:52:26,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1440.41 | bwd_inner_microstep: 1440.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 19:52:28,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1380.70 | bwd_inner_microstep: 1380.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-10 19:52:29,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.40 | bwd_microstep: 797.92 | bwd_inner_microstep: 797.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644 [2024-06-10 19:52:31,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.68 | bwd_microstep: 1445.37 | bwd_inner_microstep: 1445.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3475 [2024-06-10 19:52:33,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.50 | bwd_microstep: 1435.03 | bwd_inner_microstep: 1435.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 19:52:35,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.07 | bwd_microstep: 1598.92 | bwd_inner_microstep: 1598.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3721 [2024-06-10 19:52:37,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1465.93 | bwd_inner_microstep: 1465.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295 [2024-06-10 19:52:38,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.83 | bwd_microstep: 879.45 | bwd_inner_microstep: 879.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-10 19:52:40,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.86 | bwd_microstep: 1188.02 | bwd_inner_microstep: 1187.99 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 19:52:42,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1404.28 | bwd_inner_microstep: 1404.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-10 19:52:44,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1298.41 | bwd_inner_microstep: 1298.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717 [2024-06-10 19:52:46,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.57 | bwd_microstep: 1438.81 | bwd_inner_microstep: 1438.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 [2024-06-10 19:52:48,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.68 | bwd_microstep: 1438.76 | bwd_inner_microstep: 1438.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 19:52:50,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1394.11 | bwd_inner_microstep: 1394.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 19:52:51,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.91 | bwd_microstep: 1405.49 | bwd_inner_microstep: 1405.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-10 19:52:54,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.42 | bwd_microstep: 1486.76 | bwd_inner_microstep: 
1486.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 19:53:02,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.19 | optimizer_step: 6.58 [2024-06-10 19:53:02,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.12 | bwd_microstep: 7624.53 | bwd_inner_microstep: 1419.17 | bwd_allreduce_microstep: 6205.31 | step_microstep: 38.11 [2024-06-10 19:53:02,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15263.96 | bwd: 47005.91 | bwd_inner: 40799.69 | bwd_allreduce: 6205.54 | step: 39.52 {'loss': 1.1481, 'learning_rate': 1.1872125507505993e-05, 'epoch': 0.64} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 19:53:04,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1379.79 | bwd_inner_microstep: 1379.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916 [2024-06-10 19:53:05,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.39 | bwd_microstep: 775.48 | bwd_inner_microstep: 775.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 19:53:07,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1475.20 | bwd_inner_microstep: 1475.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-10 19:53:09,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1541.36 | bwd_inner_microstep: 1541.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 
[2024-06-10 19:53:11,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.41 | bwd_microstep: 1549.63 | bwd_inner_microstep: 1549.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 19:53:13,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.29 | bwd_microstep: 1386.02 | bwd_inner_microstep: 1385.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1976 [2024-06-10 19:53:14,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.81 | bwd_microstep: 702.00 | bwd_inner_microstep: 701.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 19:53:16,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1379.90 | bwd_inner_microstep: 1379.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 19:53:18,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.11 | bwd_microstep: 1402.68 | bwd_inner_microstep: 1402.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3503 [2024-06-10 19:53:19,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1250.70 | bwd_inner_microstep: 1250.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-10 19:53:21,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.25 | bwd_microstep: 1156.90 | bwd_inner_microstep: 1156.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3491 [2024-06-10 19:53:23,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.92 | bwd_microstep: 1485.13 | bwd_inner_microstep: 1485.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 19:53:24,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.35 | bwd_microstep: 794.37 | bwd_inner_microstep: 794.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-10 19:53:26,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1418.92 | bwd_inner_microstep: 1418.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 19:53:28,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1389.22 | bwd_inner_microstep: 1389.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 19:53:30,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646 [2024-06-10 19:53:32,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1410.14 | bwd_inner_microstep: 1410.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3972 [2024-06-10 19:53:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.70 | bwd_microstep: 1638.53 | bwd_inner_microstep: 1638.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3515 [2024-06-10 19:53:36,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.35 | bwd_microstep: 1514.25 | bwd_inner_microstep: 1514.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-10 19:53:38,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1416.78 | bwd_inner_microstep: 1416.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716 [2024-06-10 19:53:40,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.36 | bwd_microstep: 1634.84 | bwd_inner_microstep: 1634.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 19:53:43,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.70 | bwd_microstep: 1555.15 | bwd_inner_microstep: 1555.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 19:53:45,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.74 | bwd_microstep: 1406.93 | bwd_inner_microstep: 1406.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827 [2024-06-10 19:53:47,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.02 | bwd_microstep: 1511.83 | bwd_inner_microstep: 1511.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 19:53:49,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.21 | bwd_microstep: 1396.12 | bwd_inner_microstep: 1396.09 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 19:53:51,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1424.58 | bwd_inner_microstep: 1424.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3852 [2024-06-10 19:53:53,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.84 | bwd_microstep: 1731.95 | bwd_inner_microstep: 1731.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3596 [2024-06-10 19:53:55,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.64 | bwd_microstep: 1369.17 | bwd_inner_microstep: 1369.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2439 [2024-06-10 19:53:56,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.23 | bwd_microstep: 947.81 | bwd_inner_microstep: 947.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2041 [2024-06-10 19:53:57,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.95 | bwd_microstep: 747.06 | bwd_inner_microstep: 747.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 19:53:59,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.27 | bwd_microstep: 1402.19 | bwd_inner_microstep: 1402.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 19:54:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | 
optimizer_gradients: 4.20 | optimizer_step: 6.63 [2024-06-10 19:54:03,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.21 | bwd_microstep: 3381.30 | bwd_inner_microstep: 1577.98 | bwd_allreduce_microstep: 1803.28 | step_microstep: 37.67 [2024-06-10 19:54:03,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16093.04 | bwd: 44963.78 | bwd_inner: 43159.61 | bwd_allreduce: 1803.51 | step: 39.11 64%|██████▍ | 1108/1726 [19:11:33<10:32:59, 61.46s/it] 64%|██████▍ | 1109/1726 [19:12:33<10:28:32, 61.12s/it] 64%|██████▍ | 1110/1726 [19:13:34<10:27:08, 61.09s/it] 64%|██████▍ | 1111/1726 [19:14:36<10:28:25, 61.31s/it] 64%|██████▍ | 1112/1726 [19:15:38<10:31:20, 61.69s/it] {'loss': 1.2004, 'learning_rate': 1.1837845132253615e-05, 'epoch': 0.64} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 19:54:05,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.89 | bwd_microstep: 1263.44 | bwd_inner_microstep: 1263.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 19:54:07,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1351.08 | bwd_inner_microstep: 1351.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-10 19:54:09,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1552.43 | bwd_inner_microstep: 1552.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic
token length: 2259 [2024-06-10 19:54:10,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.48 | bwd_microstep: 967.56 | bwd_inner_microstep: 967.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1411 [2024-06-10 19:54:11,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 215.70 | bwd_microstep: 562.53 | bwd_inner_microstep: 562.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435 [2024-06-10 19:54:13,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.23 | bwd_microstep: 1156.81 | bwd_inner_microstep: 1156.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404 [2024-06-10 19:54:14,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.43 | bwd_microstep: 1177.96 | bwd_inner_microstep: 1177.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425 [2024-06-10 19:54:16,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.90 | bwd_microstep: 1214.78 | bwd_inner_microstep: 1214.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 19:54:18,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.79 | bwd_microstep: 1397.69 | bwd_inner_microstep: 1397.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 19:54:20,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1246.69 | bwd_inner_microstep: 1246.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
34, images per sample: 8.5, dynamic token length: 3702 [2024-06-10 19:54:22,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.67 | bwd_microstep: 1526.80 | bwd_inner_microstep: 1526.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 712 [2024-06-10 19:54:22,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 115.51 | bwd_microstep: 289.80 | bwd_inner_microstep: 289.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3501 [2024-06-10 19:54:24,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.83 | bwd_microstep: 1511.73 | bwd_inner_microstep: 1511.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 19:54:26,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.94 | bwd_microstep: 1342.73 | bwd_inner_microstep: 1342.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507 [2024-06-10 19:54:28,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1444.00 | bwd_inner_microstep: 1443.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2674 [2024-06-10 19:54:29,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.56 | bwd_microstep: 984.15 | bwd_inner_microstep: 984.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-10 19:54:31,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.42 | bwd_microstep: 1526.94 | bwd_inner_microstep: 1526.92 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498 [2024-06-10 19:54:34,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1581.86 | bwd_inner_microstep: 1581.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 19:54:36,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.46 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 19:54:37,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.71 | bwd_microstep: 1350.49 | bwd_inner_microstep: 1350.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617 [2024-06-10 19:54:40,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.39 | bwd_microstep: 1611.57 | bwd_inner_microstep: 1611.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 19:54:42,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.46 | bwd_microstep: 1553.16 | bwd_inner_microstep: 1553.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699 [2024-06-10 19:54:44,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.14 | bwd_microstep: 1332.35 | bwd_inner_microstep: 1332.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-10 19:54:46,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1492.54 | bwd_inner_microstep: 
1492.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-10 19:54:48,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1435.29 | bwd_inner_microstep: 1435.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2065 [2024-06-10 19:54:49,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.59 | bwd_microstep: 946.21 | bwd_inner_microstep: 946.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547 [2024-06-10 19:54:51,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.75 | bwd_microstep: 1591.81 | bwd_inner_microstep: 1591.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 19:54:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1402.04 | bwd_inner_microstep: 1402.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2061 [2024-06-10 19:54:54,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.44 | bwd_microstep: 1007.91 | bwd_inner_microstep: 1007.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559 [2024-06-10 19:54:56,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.27 | bwd_microstep: 1430.55 | bwd_inner_microstep: 1430.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 19:54:58,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.77 | 
bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580 [2024-06-10 19:55:04,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 19:55:04,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.74 | bwd_microstep: 5675.16 | bwd_inner_microstep: 1898.96 | bwd_allreduce_microstep: 3776.15 | step_microstep: 37.89 [2024-06-10 19:55:04,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15491.49 | bwd: 45591.33 | bwd_inner: 41814.28 | bwd_allreduce: 3776.38 | step: 39.32 {'loss': 1.2255, 'learning_rate': 1.1803593504068256e-05, 'epoch': 0.65} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400 [2024-06-10 19:55:06,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.71 | bwd_microstep: 1172.56 | bwd_inner_microstep: 1172.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 19:55:08,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1373.72 | bwd_inner_microstep: 1373.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3935 [2024-06-10 19:55:10,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.95 | bwd_microstep: 1691.20 | bwd_inner_microstep: 1691.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 19:55:12,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 2281 [2024-06-10 19:55:13,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.66 | bwd_microstep: 969.87 | bwd_inner_microstep: 969.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764 [2024-06-10 19:55:15,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1438.28 | bwd_inner_microstep: 1438.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4097 [2024-06-10 19:55:18,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.30 | bwd_microstep: 1565.75 | bwd_inner_microstep: 1565.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 19:55:19,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.17 | bwd_microstep: 1400.82 | bwd_inner_microstep: 1400.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-10 19:55:21,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.32 | bwd_microstep: 797.94 | bwd_inner_microstep: 797.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3498 [2024-06-10 19:55:22,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.71 | bwd_microstep: 1316.11 | bwd_inner_microstep: 1316.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-10 19:55:24,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.84 | bwd_microstep: 1316.30 | bwd_inner_microstep: 1316.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 19:55:26,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.38 | bwd_inner_microstep: 1385.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3540 [2024-06-10 19:55:28,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.26 | bwd_microstep: 1688.24 | bwd_inner_microstep: 1688.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 19:55:30,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.08 | bwd_microstep: 1386.73 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3623 [2024-06-10 19:55:33,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.82 | bwd_microstep: 1705.55 | bwd_inner_microstep: 1705.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3439 [2024-06-10 19:55:35,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.19 | bwd_microstep: 1310.38 | bwd_inner_microstep: 1310.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3638 [2024-06-10 19:55:37,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.20 | bwd_microstep: 1678.77 | bwd_inner_microstep: 1678.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632 [2024-06-10 19:55:39,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.39 | bwd_microstep: 1436.88 | bwd_inner_microstep: 1436.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3722 [2024-06-10 19:55:41,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.55 | bwd_microstep: 1439.31 | bwd_inner_microstep: 1439.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 19:55:43,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1287.37 | bwd_inner_microstep: 1287.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 19:55:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1450.05 | bwd_inner_microstep: 1450.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 19:55:46,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.23 | bwd_microstep: 1284.04 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-10 19:55:48,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.59 | bwd_microstep: 971.60 | bwd_inner_microstep: 971.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3697 [2024-06-10 19:55:50,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.20 | bwd_microstep: 1480.21 | bwd_inner_microstep: 1480.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3595 [2024-06-10 19:55:52,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.96 | bwd_microstep: 
1430.26 | bwd_inner_microstep: 1430.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2005 [2024-06-10 19:55:53,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 894.84 | bwd_inner_microstep: 894.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569 [2024-06-10 19:55:55,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.93 | bwd_microstep: 1599.02 | bwd_inner_microstep: 1598.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 19:55:57,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1247.87 | bwd_inner_microstep: 1247.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-10 19:55:59,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.56 | bwd_microstep: 1552.09 | bwd_inner_microstep: 1552.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469 [2024-06-10 19:56:01,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.56 | bwd_microstep: 1182.45 | bwd_inner_microstep: 1182.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465 [2024-06-10 19:56:02,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.60 | bwd_microstep: 1311.51 | bwd_inner_microstep: 1311.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 19:56:06,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.73 | optimizer_gradients: 4.17 | optimizer_step: 6.60 [2024-06-10 19:56:06,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.63 | bwd_microstep: 3328.39 | bwd_inner_microstep: 1643.36 | bwd_allreduce_microstep: 1684.98 | step_microstep: 37.67 [2024-06-10 19:56:06,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16262.83 | bwd: 45342.90 | bwd_inner: 43657.03 | bwd_allreduce: 1685.21 | step: 39.13 {'loss': 1.2148, 'learning_rate': 1.1769370743583957e-05, 'epoch': 0.65} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-10 19:56:08,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1383.37 | bwd_inner_microstep: 1383.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3971 [2024-06-10 19:56:11,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.43 | bwd_microstep: 1705.89 | bwd_inner_microstep: 1705.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 19:56:13,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1377.04 | bwd_inner_microstep: 1377.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3844 [2024-06-10 19:56:15,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.26 | bwd_microstep: 1425.28 | bwd_inner_microstep: 1425.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-10 19:56:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.14 | bwd_microstep: 807.80 | bwd_inner_microstep: 807.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 4, images per sample: 1.0, dynamic token length: 770 [2024-06-10 19:56:16,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 120.70 | bwd_microstep: 304.55 | bwd_inner_microstep: 304.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 19:56:18,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.09 | bwd_microstep: 1246.44 | bwd_inner_microstep: 1246.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 19:56:20,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.87 | bwd_microstep: 1391.85 | bwd_inner_microstep: 1391.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 19:56:22,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1350.89 | bwd_inner_microstep: 1350.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 19:56:23,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1951 [2024-06-10 19:56:24,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.44 | bwd_microstep: 730.30 | bwd_inner_microstep: 730.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2116 [2024-06-10 19:56:26,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.00 | bwd_microstep: 957.34 | bwd_inner_microstep: 957.31 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521 [2024-06-10 19:56:28,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.75 | bwd_microstep: 1451.43 | bwd_inner_microstep: 1451.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 19:56:30,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1485.24 | bwd_inner_microstep: 1485.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3821 [2024-06-10 19:56:32,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.38 | bwd_microstep: 1478.98 | bwd_inner_microstep: 1478.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 19:56:34,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.54 | bwd_microstep: 1559.34 | bwd_inner_microstep: 1559.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644 [2024-06-10 19:56:36,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.59 | bwd_microstep: 1514.23 | bwd_inner_microstep: 1514.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3480 [2024-06-10 19:56:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.07 | bwd_microstep: 1215.60 | bwd_inner_microstep: 1215.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-10 19:56:40,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1511.17 | bwd_inner_microstep: 
1511.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3622 [2024-06-10 19:56:42,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1611.14 | bwd_inner_microstep: 1611.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3828 [2024-06-10 19:56:44,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.11 | bwd_microstep: 1358.79 | bwd_inner_microstep: 1358.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 19:56:46,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.47 | bwd_microstep: 1254.48 | bwd_inner_microstep: 1254.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2121 [2024-06-10 19:56:47,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.47 | bwd_microstep: 830.02 | bwd_inner_microstep: 829.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3870 [2024-06-10 19:56:49,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.80 | bwd_microstep: 1568.93 | bwd_inner_microstep: 1568.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 19:56:51,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.86 | bwd_microstep: 1287.22 | bwd_inner_microstep: 1287.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178 [2024-06-10 19:56:52,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.69 | 
bwd_microstep: 856.71 | bwd_inner_microstep: 856.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3652 [2024-06-10 19:56:54,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.58 | bwd_microstep: 1444.63 | bwd_inner_microstep: 1444.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3457 [2024-06-10 19:56:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1498.99 | bwd_inner_microstep: 1498.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004 [2024-06-10 19:56:57,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.47 | bwd_microstep: 894.20 | bwd_inner_microstep: 894.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3592 [2024-06-10 19:57:00,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.41 | bwd_microstep: 1805.83 | bwd_inner_microstep: 1805.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569 [2024-06-10 19:57:02,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1425.20 | bwd_inner_microstep: 1425.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775 [2024-06-10 19:57:08,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.28 | optimizer_step: 6.58 [2024-06-10 19:57:08,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.21 | bwd_microstep: 5254.22 | bwd_inner_microstep: 1668.67 | bwd_allreduce_microstep: 3585.49 | 
step_microstep: 38.07 [2024-06-10 19:57:08,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15523.31 | bwd: 45271.58 | bwd_inner: 41685.17 | bwd_allreduce: 3585.72 | step: 39.61 {'loss': 1.1815, 'learning_rate': 1.1735176971333115e-05, 'epoch': 0.65} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 19:57:10,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.33 | bwd_microstep: 1462.77 | bwd_inner_microstep: 1462.71 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4017 [2024-06-10 19:57:12,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.95 | bwd_microstep: 1609.27 | bwd_inner_microstep: 1609.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 19:57:14,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.28 | bwd_microstep: 1485.19 | bwd_inner_microstep: 1485.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 19:57:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.41 | bwd_microstep: 1653.04 | bwd_inner_microstep: 1653.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-10 19:57:18,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.03 | bwd_microstep: 1641.75 | bwd_inner_microstep: 1641.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-10 19:57:20,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.10 | bwd_microstep: 1448.87 | bwd_inner_microstep: 1448.84 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 19:57:22,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | bwd_microstep: 1247.28 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706 [2024-06-10 19:57:24,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.12 | bwd_microstep: 1629.52 | bwd_inner_microstep: 1629.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3695 [2024-06-10 19:57:26,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.44 | bwd_microstep: 1587.82 | bwd_inner_microstep: 1587.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3664 [2024-06-10 19:57:29,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.15 | bwd_microstep: 1717.32 | bwd_inner_microstep: 1717.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3671 [2024-06-10 19:57:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.15 | bwd_microstep: 1324.38 | bwd_inner_microstep: 1324.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3406 [2024-06-10 19:57:33,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1392.58 | bwd_inner_microstep: 1392.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1954 [2024-06-10 19:57:34,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.05 | bwd_microstep: 920.36 | bwd_inner_microstep: 
920.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-10 19:57:36,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.83 | bwd_microstep: 1521.07 | bwd_inner_microstep: 1521.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-10 19:57:38,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.58 | bwd_microstep: 1252.84 | bwd_inner_microstep: 1252.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818 [2024-06-10 19:57:40,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1386.71 | bwd_inner_microstep: 1386.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-10 19:57:42,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.48 | bwd_microstep: 1521.14 | bwd_inner_microstep: 1521.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141 [2024-06-10 19:57:43,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.10 | bwd_microstep: 833.75 | bwd_inner_microstep: 833.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3477 [2024-06-10 19:57:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.14 | bwd_microstep: 1348.94 | bwd_inner_microstep: 1348.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2178 [2024-06-10 19:57:46,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.19 | 
bwd_microstep: 858.64 | bwd_inner_microstep: 858.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-10 19:57:47,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.97 | bwd_microstep: 800.82 | bwd_inner_microstep: 800.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 19:57:49,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.40 | bwd_microstep: 1452.73 | bwd_inner_microstep: 1452.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 19:57:51,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.55 | bwd_microstep: 1286.94 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-10 19:57:53,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1517.64 | bwd_inner_microstep: 1517.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-10 19:57:55,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1295.24 | bwd_inner_microstep: 1295.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2271 [2024-06-10 19:57:56,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.10 | bwd_microstep: 810.53 | bwd_inner_microstep: 810.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-10 19:57:58,287] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 525.09 | bwd_microstep: 1403.47 | bwd_inner_microstep: 1403.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832 [2024-06-10 19:58:00,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.40 | bwd_microstep: 1456.00 | bwd_inner_microstep: 1455.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2040 [2024-06-10 19:58:01,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.55 | bwd_microstep: 872.64 | bwd_inner_microstep: 872.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927 [2024-06-10 19:58:02,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.94 | bwd_microstep: 790.03 | bwd_inner_microstep: 790.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3586 [2024-06-10 19:58:04,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.33 | bwd_microstep: 1565.00 | bwd_inner_microstep: 1564.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3748 [2024-06-10 19:58:08,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 19:58:08,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 2907.84 | bwd_inner_microstep: 1660.63 | bwd_allreduce_microstep: 1247.15 | step_microstep: 38.01 [2024-06-10 19:58:08,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15892.81 | bwd: 44002.15 | bwd_inner: 42754.05 | bwd_allreduce: 1247.40 | step: 39.53 {'loss': 1.2364, 'learning_rate': 1.1701012307746021e-05, 'epoch': 0.65} 
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1876 [2024-06-10 19:58:09,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.56 | bwd_microstep: 824.18 | bwd_inner_microstep: 824.12 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3921 [2024-06-10 19:58:11,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.10 | bwd_microstep: 1592.14 | bwd_inner_microstep: 1592.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3895 [2024-06-10 19:58:13,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1389.86 | bwd_inner_microstep: 1389.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 19:58:15,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1343.58 | bwd_inner_microstep: 1343.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922 [2024-06-10 19:58:16,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.41 | bwd_microstep: 824.61 | bwd_inner_microstep: 824.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2232 [2024-06-10 19:58:17,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.47 | bwd_microstep: 814.69 | bwd_inner_microstep: 814.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 19:58:19,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.64 | bwd_microstep: 1284.26 | bwd_inner_microstep: 1284.23 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 19:58:21,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1412.38 | bwd_inner_microstep: 1412.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 19:58:23,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751 [2024-06-10 19:58:25,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.70 | bwd_microstep: 1434.24 | bwd_inner_microstep: 1434.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 19:58:26,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.53 | bwd_microstep: 795.49 | bwd_inner_microstep: 795.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446 [2024-06-10 19:58:27,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.90 | bwd_microstep: 1220.36 | bwd_inner_microstep: 1220.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 19:58:29,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.04 | bwd_microstep: 794.84 | bwd_inner_microstep: 794.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2121 [2024-06-10 19:58:30,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.79 | bwd_microstep: 970.50 
| bwd_inner_microstep: 970.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2429 [2024-06-10 19:58:31,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.20 | bwd_microstep: 1107.96 | bwd_inner_microstep: 1107.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3662 [2024-06-10 19:58:33,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.77 | bwd_microstep: 1447.85 | bwd_inner_microstep: 1447.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 19:58:35,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1281.60 | bwd_inner_microstep: 1281.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2068 [2024-06-10 19:58:36,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.21 | bwd_microstep: 849.45 | bwd_inner_microstep: 849.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3538 [2024-06-10 19:58:38,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.69 | bwd_microstep: 1199.67 | bwd_inner_microstep: 1199.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982 [2024-06-10 19:58:39,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 893.68 | bwd_inner_microstep: 893.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3604 [2024-06-10 19:58:41,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
513.16 | bwd_microstep: 1370.38 | bwd_inner_microstep: 1370.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 19:58:43,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.56 | bwd_microstep: 1292.45 | bwd_inner_microstep: 1292.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 19:58:45,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.14 | bwd_microstep: 1243.10 | bwd_inner_microstep: 1243.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3607 [2024-06-10 19:58:47,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.03 | bwd_microstep: 1359.86 | bwd_inner_microstep: 1359.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3605 [2024-06-10 19:58:49,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.66 | bwd_microstep: 1705.32 | bwd_inner_microstep: 1705.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968 [2024-06-10 19:58:50,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.03 | bwd_microstep: 703.67 | bwd_inner_microstep: 703.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 19:58:52,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1390.21 | bwd_inner_microstep: 1390.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 19:58:54,159] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817 [2024-06-10 19:58:56,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.47 | bwd_microstep: 1481.05 | bwd_inner_microstep: 1481.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-10 19:58:58,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.24 | bwd_microstep: 1405.02 | bwd_inner_microstep: 1405.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 19:59:00,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.77 | bwd_microstep: 1500.63 | bwd_inner_microstep: 1500.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2060 [2024-06-10 19:59:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-10 19:59:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.65 | bwd_microstep: 10549.83 | bwd_inner_microstep: 977.33 | bwd_allreduce_microstep: 9572.43 | step_microstep: 38.62 [2024-06-10 19:59:11,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14449.88 | bwd: 48113.52 | bwd_inner: 38540.11 | bwd_allreduce: 9572.70 | step: 40.13 64%|██████▍ | 1113/1726 [19:16:40<10:29:22, 61.60s/it] 65%|██████▍ | 1114/1726 [19:17:41<10:27:45, 61.55s/it] 65%|██████▍ | 1115/1726 [19:18:43<10:27:56, 61.66s/it] 
65%|██████▍ | 1116/1726 [19:19:44<10:25:15, 61.50s/it] 65%|██████▍ | 1117/1726 [19:20:44<10:20:22, 61.12s/it] {'loss': 1.1988, 'learning_rate': 1.166687687315043e-05, 'epoch': 0.65} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 19:59:12,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.78 | bwd_microstep: 1264.35 | bwd_inner_microstep: 1264.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 19:59:14,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.25 | bwd_microstep: 1387.97 | bwd_inner_microstep: 1387.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 19:59:16,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1372.36 | bwd_inner_microstep: 1372.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 19:59:18,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.77 | bwd_microstep: 1454.07 | bwd_inner_microstep: 1454.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800 [2024-06-10 19:59:20,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.48 | bwd_microstep: 1644.90 | bwd_inner_microstep: 1644.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 19:59:22,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.69 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 19:59:24,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.19 | bwd_microstep: 1279.39 | bwd_inner_microstep: 1279.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486 [2024-06-10 19:59:26,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.68 | bwd_microstep: 1310.59 | bwd_inner_microstep: 1310.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 743 [2024-06-10 19:59:26,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.30 | bwd_microstep: 298.47 | bwd_inner_microstep: 298.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 19:59:28,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1387.76 | bwd_inner_microstep: 1387.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2140 [2024-06-10 19:59:30,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.65 | bwd_microstep: 861.62 | bwd_inner_microstep: 861.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424 [2024-06-10 19:59:31,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.76 | bwd_microstep: 1281.77 | bwd_inner_microstep: 1281.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669 [2024-06-10 19:59:33,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.46 | bwd_microstep: 1422.41 | 
bwd_inner_microstep: 1422.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-10 19:59:35,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1492.98 | bwd_inner_microstep: 1492.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2398 [2024-06-10 19:59:37,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.73 | bwd_microstep: 1035.59 | bwd_inner_microstep: 1035.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2011 [2024-06-10 19:59:38,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.07 | bwd_microstep: 832.05 | bwd_inner_microstep: 832.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3828 [2024-06-10 19:59:40,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.34 | bwd_microstep: 1858.31 | bwd_inner_microstep: 1858.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476 [2024-06-10 19:59:42,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1437.17 | bwd_inner_microstep: 1437.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498 [2024-06-10 19:59:44,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.38 | bwd_microstep: 1448.14 | bwd_inner_microstep: 1448.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2290 [2024-06-10 19:59:46,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 394.26 | bwd_microstep: 1070.94 | bwd_inner_microstep: 1070.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 19:59:48,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.47 | bwd_microstep: 1450.55 | bwd_inner_microstep: 1450.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 19:59:50,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.82 | bwd_microstep: 1357.88 | bwd_inner_microstep: 1357.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 19:59:52,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1277.78 | bwd_inner_microstep: 1277.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538 [2024-06-10 19:59:54,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.90 | bwd_microstep: 1421.24 | bwd_inner_microstep: 1421.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061 [2024-06-10 19:59:55,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.09 | bwd_microstep: 913.22 | bwd_inner_microstep: 913.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546 [2024-06-10 19:59:57,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.42 | bwd_microstep: 1589.72 | bwd_inner_microstep: 1589.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 19:59:59,463] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.17 | bwd_microstep: 1439.65 | bwd_inner_microstep: 1439.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 20:00:01,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1657.93 | bwd_inner_microstep: 1657.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 20:00:04,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.48 | bwd_microstep: 1659.45 | bwd_inner_microstep: 1659.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048 [2024-06-10 20:00:05,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.88 | bwd_microstep: 811.33 | bwd_inner_microstep: 811.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 20:00:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1298.26 | bwd_inner_microstep: 1298.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 20:00:13,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-10 20:00:13,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 6466.55 | bwd_inner_microstep: 1580.97 | bwd_allreduce_microstep: 4885.53 | step_microstep: 37.85 [2024-06-10 20:00:14,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15662.55 | bwd: 46861.13 | bwd_inner: 41974.70 | bwd_allreduce: 4885.76 | step: 39.34 {'loss': 1.2173, 'learning_rate': 
1.1632770787771167e-05, 'epoch': 0.65} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-10 20:00:16,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.17 | bwd_microstep: 1501.97 | bwd_inner_microstep: 1501.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 20:00:17,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1288.02 | bwd_inner_microstep: 1287.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3406 [2024-06-10 20:00:19,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.16 | bwd_microstep: 1204.25 | bwd_inner_microstep: 1204.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841 [2024-06-10 20:00:21,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.30 | bwd_microstep: 1656.22 | bwd_inner_microstep: 1656.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 20:00:23,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.20 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-10 20:00:25,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.62 | bwd_microstep: 1277.21 | bwd_inner_microstep: 1277.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-10 20:00:27,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1405.91 | 
bwd_inner_microstep: 1405.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 20:00:29,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1281.97 | bwd_inner_microstep: 1281.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565 [2024-06-10 20:00:30,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.20 | bwd_microstep: 1297.80 | bwd_inner_microstep: 1297.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618 [2024-06-10 20:00:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1312.25 | bwd_inner_microstep: 1312.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2082 [2024-06-10 20:00:33,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.98 | bwd_microstep: 859.47 | bwd_inner_microstep: 859.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 20:00:35,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1350.04 | bwd_inner_microstep: 1350.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 20:00:37,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.12 | bwd_microstep: 1340.29 | bwd_inner_microstep: 1340.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3224 [2024-06-10 20:00:39,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 492.14 | bwd_microstep: 1324.23 | bwd_inner_microstep: 1324.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-10 20:00:41,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.58 | bwd_microstep: 1315.35 | bwd_inner_microstep: 1315.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467 [2024-06-10 20:00:43,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1401.66 | bwd_inner_microstep: 1401.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3527 [2024-06-10 20:00:44,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.89 | bwd_microstep: 1195.27 | bwd_inner_microstep: 1195.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3672 [2024-06-10 20:00:46,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.31 | bwd_microstep: 1419.65 | bwd_inner_microstep: 1419.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446 [2024-06-10 20:00:48,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.89 | bwd_microstep: 1375.75 | bwd_inner_microstep: 1375.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 20:00:50,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1256.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 20:00:52,714] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.91 | bwd_microstep: 1663.31 | bwd_inner_microstep: 1663.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2927 [2024-06-10 20:00:54,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.49 | bwd_microstep: 1253.14 | bwd_inner_microstep: 1253.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2293 [2024-06-10 20:00:55,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.90 | bwd_microstep: 974.90 | bwd_inner_microstep: 974.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3816 [2024-06-10 20:00:57,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1584.95 | bwd_inner_microstep: 1584.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224 [2024-06-10 20:00:59,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.33 | bwd_microstep: 963.37 | bwd_inner_microstep: 963.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2538 [2024-06-10 20:01:00,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.76 | bwd_microstep: 966.39 | bwd_inner_microstep: 966.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2219 [2024-06-10 20:01:01,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.91 | bwd_microstep: 862.98 | bwd_inner_microstep: 862.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463 
[2024-06-10 20:01:03,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.74 | bwd_microstep: 1184.68 | bwd_inner_microstep: 1184.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 20:01:05,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.27 | bwd_microstep: 1460.12 | bwd_inner_microstep: 1460.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2453 [2024-06-10 20:01:06,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.77 | bwd_microstep: 1047.51 | bwd_inner_microstep: 1047.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590 [2024-06-10 20:01:09,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.18 | bwd_microstep: 1700.69 | bwd_inner_microstep: 1700.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 20:01:16,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 20:01:16,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.07 | bwd_microstep: 6960.70 | bwd_inner_microstep: 1685.88 | bwd_allreduce_microstep: 5274.76 | step_microstep: 37.99 [2024-06-10 20:01:16,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15558.80 | bwd: 46969.49 | bwd_inner: 41693.83 | bwd_allreduce: 5274.99 | step: 39.44 {'loss': 1.1683, 'learning_rate': 1.1598694171729703e-05, 'epoch': 0.65} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 20:01:18,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1462.28 | 
bwd_inner_microstep: 1462.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 20:01:20,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.24 | bwd_microstep: 1474.04 | bwd_inner_microstep: 1474.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3842 [2024-06-10 20:01:22,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.58 | bwd_microstep: 1461.33 | bwd_inner_microstep: 1461.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857 [2024-06-10 20:01:23,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.10 | bwd_microstep: 677.84 | bwd_inner_microstep: 677.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2631 [2024-06-10 20:01:25,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.20 | bwd_microstep: 919.15 | bwd_inner_microstep: 919.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780 [2024-06-10 20:01:27,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.00 | bwd_microstep: 1440.32 | bwd_inner_microstep: 1440.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 20:01:28,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.54 | bwd_microstep: 1284.46 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 20:01:30,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
487.77 | bwd_microstep: 1279.76 | bwd_inner_microstep: 1279.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968 [2024-06-10 20:01:31,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.68 | bwd_microstep: 793.79 | bwd_inner_microstep: 793.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408 [2024-06-10 20:01:33,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.22 | bwd_microstep: 1181.36 | bwd_inner_microstep: 1181.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692 [2024-06-10 20:01:35,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1622.52 | bwd_inner_microstep: 1622.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3419 [2024-06-10 20:01:37,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.72 | bwd_microstep: 1309.87 | bwd_inner_microstep: 1309.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569 [2024-06-10 20:01:39,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1359.83 | bwd_inner_microstep: 1359.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1945 [2024-06-10 20:01:40,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.52 | bwd_microstep: 885.43 | bwd_inner_microstep: 885.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 20:01:42,789] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 591.76 | bwd_microstep: 1595.97 | bwd_inner_microstep: 1595.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3639 [2024-06-10 20:01:45,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.28 | bwd_microstep: 1706.80 | bwd_inner_microstep: 1706.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107 [2024-06-10 20:01:46,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.34 | bwd_microstep: 918.06 | bwd_inner_microstep: 918.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2110 [2024-06-10 20:01:47,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.90 | bwd_microstep: 856.36 | bwd_inner_microstep: 856.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3459 [2024-06-10 20:01:49,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.81 | bwd_microstep: 1569.47 | bwd_inner_microstep: 1569.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2893 [2024-06-10 20:01:51,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.85 | bwd_microstep: 1087.14 | bwd_inner_microstep: 1087.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686 [2024-06-10 20:01:53,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.88 | bwd_microstep: 1430.06 | bwd_inner_microstep: 1430.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 
20:01:55,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.85 | bwd_microstep: 1375.59 | bwd_inner_microstep: 1375.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 20:01:57,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1495.18 | bwd_inner_microstep: 1495.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145 [2024-06-10 20:01:58,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.25 | bwd_microstep: 850.53 | bwd_inner_microstep: 850.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600 [2024-06-10 20:02:00,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.39 | bwd_microstep: 1304.82 | bwd_inner_microstep: 1304.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2183 [2024-06-10 20:02:01,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.23 | bwd_microstep: 857.32 | bwd_inner_microstep: 857.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2285 [2024-06-10 20:02:02,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.01 | bwd_microstep: 784.57 | bwd_inner_microstep: 784.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-10 20:02:04,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.67 | bwd_microstep: 1505.56 | bwd_inner_microstep: 1505.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3814 [2024-06-10 20:02:06,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.35 | bwd_microstep: 1659.32 | bwd_inner_microstep: 1659.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572 [2024-06-10 20:02:09,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.76 | bwd_microstep: 1589.97 | bwd_inner_microstep: 1589.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 20:02:11,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.50 | bwd_microstep: 1471.36 | bwd_inner_microstep: 1471.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 20:02:17,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 20:02:17,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.20 | bwd_microstep: 5540.29 | bwd_inner_microstep: 1437.10 | bwd_allreduce_microstep: 4103.14 | step_microstep: 38.01 [2024-06-10 20:02:17,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15195.53 | bwd: 44750.34 | bwd_inner: 40646.30 | bwd_allreduce: 4103.37 | step: 39.47 {'loss': 1.1936, 'learning_rate': 1.156464714504369e-05, 'epoch': 0.65} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 20:02:19,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.89 | bwd_microstep: 1471.21 | bwd_inner_microstep: 1471.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3863 [2024-06-10 20:02:21,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | 
bwd_microstep: 1360.59 | bwd_inner_microstep: 1360.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484 [2024-06-10 20:02:22,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.47 | bwd_microstep: 1310.49 | bwd_inner_microstep: 1310.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 20:02:24,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1245.79 | bwd_inner_microstep: 1245.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 20:02:26,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1247.95 | bwd_inner_microstep: 1247.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3716 [2024-06-10 20:02:28,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1267.22 | bwd_inner_microstep: 1267.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 20:02:29,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1248.29 | bwd_inner_microstep: 1248.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 20:02:31,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1279.12 | bwd_inner_microstep: 1279.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734 [2024-06-10 20:02:33,564] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 538.11 | bwd_microstep: 1436.76 | bwd_inner_microstep: 1436.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3489 [2024-06-10 20:02:35,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1442.85 | bwd_inner_microstep: 1442.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3387 [2024-06-10 20:02:37,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1290.32 | bwd_inner_microstep: 1290.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 20:02:39,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.44 | bwd_microstep: 1339.46 | bwd_inner_microstep: 1339.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-10 20:02:41,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.59 | bwd_microstep: 1516.44 | bwd_inner_microstep: 1516.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3666 [2024-06-10 20:02:43,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.46 | bwd_microstep: 1820.31 | bwd_inner_microstep: 1820.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-10 20:02:44,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.56 | bwd_microstep: 798.21 | bwd_inner_microstep: 798.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1970 [2024-06-10 20:02:45,936] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.66 | bwd_microstep: 765.43 | bwd_inner_microstep: 765.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912 [2024-06-10 20:02:46,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.39 | bwd_microstep: 685.35 | bwd_inner_microstep: 685.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697 [2024-06-10 20:02:48,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.01 | bwd_microstep: 1526.41 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 20:02:50,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.28 | bwd_microstep: 1284.62 | bwd_inner_microstep: 1284.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2011 [2024-06-10 20:02:51,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.08 | bwd_microstep: 739.04 | bwd_inner_microstep: 739.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3550 [2024-06-10 20:02:53,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.37 | bwd_microstep: 1205.69 | bwd_inner_microstep: 1205.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 20:02:55,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.52 | bwd_microstep: 1643.23 | bwd_inner_microstep: 1643.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3538 [2024-06-10 20:02:57,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1400.58 | bwd_inner_microstep: 1400.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548 [2024-06-10 20:02:59,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.18 | bwd_microstep: 1427.67 | bwd_inner_microstep: 1427.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-10 20:03:01,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1395.63 | bwd_inner_microstep: 1395.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3808 [2024-06-10 20:03:03,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1615.65 | bwd_inner_microstep: 1615.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3909 [2024-06-10 20:03:05,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1394.63 | bwd_inner_microstep: 1394.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560 [2024-06-10 20:03:07,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.64 | bwd_microstep: 1421.79 | bwd_inner_microstep: 1421.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3734 [2024-06-10 20:03:09,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.84 | bwd_microstep: 1562.78 | bwd_inner_microstep: 1562.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3588 [2024-06-10 20:03:11,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.30 | bwd_microstep: 1507.64 | bwd_inner_microstep: 1507.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035 [2024-06-10 20:03:13,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.06 | bwd_microstep: 808.65 | bwd_inner_microstep: 808.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-10 20:03:16,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.18 | optimizer_step: 6.58 [2024-06-10 20:03:16,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.15 | bwd_microstep: 3005.55 | bwd_inner_microstep: 1417.74 | bwd_allreduce_microstep: 1587.76 | step_microstep: 37.78 [2024-06-10 20:03:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15660.89 | bwd: 43465.37 | bwd_inner: 41876.71 | bwd_allreduce: 1587.98 | step: 39.24 {'loss': 1.2184, 'learning_rate': 1.1530629827626583e-05, 'epoch': 0.65} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 20:03:18,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1473.13 | bwd_inner_microstep: 1473.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475 [2024-06-10 20:03:20,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.62 | bwd_microstep: 1341.96 | bwd_inner_microstep: 1341.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4176 [2024-06-10 20:03:22,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 654.47 | bwd_microstep: 1752.12 | bwd_inner_microstep: 1752.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 20:03:25,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1550.08 | bwd_inner_microstep: 1550.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 20:03:26,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.29 | bwd_microstep: 1282.00 | bwd_inner_microstep: 1281.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 20:03:28,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.55 | bwd_microstep: 1150.48 | bwd_inner_microstep: 1150.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 20:03:29,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.64 | bwd_microstep: 793.45 | bwd_inner_microstep: 793.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 20:03:31,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.07 | bwd_microstep: 1386.74 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-10 20:03:33,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.64 | bwd_inner_microstep: 1253.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3496 [2024-06-10 20:03:35,231] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.58 | bwd_microstep: 1510.03 | bwd_inner_microstep: 1510.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 20:03:37,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1484.46 | bwd_inner_microstep: 1484.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092 [2024-06-10 20:03:38,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.18 | bwd_microstep: 920.33 | bwd_inner_microstep: 920.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733 [2024-06-10 20:03:40,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.75 | bwd_microstep: 1624.68 | bwd_inner_microstep: 1624.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1976 [2024-06-10 20:03:41,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.73 | bwd_microstep: 854.14 | bwd_inner_microstep: 854.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 20:03:43,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.86 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3890 [2024-06-10 20:03:45,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1497.11 | bwd_inner_microstep: 1497.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3675 [2024-06-10 20:03:47,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1527.09 | bwd_inner_microstep: 1527.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3638 [2024-06-10 20:03:50,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1568.61 | bwd_inner_microstep: 1568.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3699 [2024-06-10 20:03:52,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.88 | bwd_microstep: 1726.41 | bwd_inner_microstep: 1726.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3628 [2024-06-10 20:03:54,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.11 | bwd_microstep: 1711.97 | bwd_inner_microstep: 1711.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3678 [2024-06-10 20:03:56,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1328.04 | bwd_inner_microstep: 1328.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 20:03:58,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1512.72 | bwd_inner_microstep: 1512.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 20:04:00,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1460.95 | bwd_inner_microstep: 1460.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3613 [2024-06-10 20:04:02,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1412.61 | bwd_inner_microstep: 1412.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-10 20:04:03,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.81 | bwd_microstep: 797.06 | bwd_inner_microstep: 797.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745 [2024-06-10 20:04:05,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.88 | bwd_microstep: 1535.73 | bwd_inner_microstep: 1535.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 917 [2024-06-10 20:04:06,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.47 | bwd_microstep: 374.65 | bwd_inner_microstep: 374.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2034 [2024-06-10 20:04:07,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.98 | bwd_microstep: 744.64 | bwd_inner_microstep: 744.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570 [2024-06-10 20:04:09,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 1493.62 | bwd_inner_microstep: 1493.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820 [2024-06-10 20:04:11,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.45 | bwd_microstep: 1484.57 | bwd_inner_microstep: 1484.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-10 20:04:13,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.20 | bwd_microstep: 1639.31 | bwd_inner_microstep: 1639.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3831 [2024-06-10 20:04:16,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 20:04:16,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.34 | bwd_microstep: 1668.42 | bwd_inner_microstep: 1572.77 | bwd_allreduce_microstep: 95.60 | step_microstep: 37.46 [2024-06-10 20:04:16,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16031.78 | bwd: 43108.22 | bwd_inner: 43011.69 | bwd_allreduce: 95.83 | step: 39.05 65%|██████▍ | 1118/1726 [19:21:47<10:24:43, 61.65s/it] 65%|██████▍ | 1119/1726 [19:22:50<10:27:21, 62.01s/it] 65%|██████▍ | 1120/1726 [19:23:53<10:28:52, 62.26s/it] 65%|██████▍ | 1121/1726 [19:24:53<10:21:48, 61.67s/it] 65%|██████▌ | 1122/1726 [19:25:53<10:14:05, 61.00s/it] {'loss': 1.159, 'learning_rate': 1.1496642339287191e-05, 'epoch': 0.65} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 20:04:17,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1278.26 | bwd_inner_microstep: 1278.02 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 20:04:19,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 473.77 | bwd_microstep: 1245.30 | bwd_inner_microstep: 1245.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471 [2024-06-10 20:04:21,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.84 | bwd_microstep: 1441.78 | bwd_inner_microstep: 1441.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 20:04:23,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.26 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2225 [2024-06-10 20:04:24,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.58 | bwd_microstep: 864.12 | bwd_inner_microstep: 864.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 20:04:26,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1284.67 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 20:04:28,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1248.33 | bwd_inner_microstep: 1248.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 20:04:30,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.59 | bwd_microstep: 1389.09 | bwd_inner_microstep: 1389.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-10 20:04:31,165] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.84 | bwd_microstep: 799.03 | bwd_inner_microstep: 799.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1914 [2024-06-10 20:04:32,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.07 | bwd_microstep: 731.96 | bwd_inner_microstep: 731.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1943 [2024-06-10 20:04:33,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.84 | bwd_microstep: 848.25 | bwd_inner_microstep: 848.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3665 [2024-06-10 20:04:35,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.82 | bwd_microstep: 1580.43 | bwd_inner_microstep: 1580.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663 [2024-06-10 20:04:37,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.60 | bwd_microstep: 1719.81 | bwd_inner_microstep: 1719.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465 [2024-06-10 20:04:39,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1403.50 | bwd_inner_microstep: 1403.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 20:04:41,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.00 | bwd_microstep: 1374.00 | bwd_inner_microstep: 1373.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 
[2024-06-10 20:04:43,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.62 | bwd_microstep: 1252.21 | bwd_inner_microstep: 1252.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454 [2024-06-10 20:04:45,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.85 | bwd_microstep: 1315.58 | bwd_inner_microstep: 1315.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3478 [2024-06-10 20:04:47,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.17 | bwd_microstep: 1436.14 | bwd_inner_microstep: 1436.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-10 20:04:48,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.30 | bwd_microstep: 801.06 | bwd_inner_microstep: 801.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 20:04:50,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.35 | bwd_microstep: 1381.61 | bwd_inner_microstep: 1381.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715 [2024-06-10 20:04:52,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.18 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3824 [2024-06-10 20:04:54,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.84 | bwd_microstep: 1259.02 | bwd_inner_microstep: 1258.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3562 [2024-06-10 20:04:55,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.31 | bwd_microstep: 1403.37 | bwd_inner_microstep: 1403.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1912 [2024-06-10 20:04:57,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.82 | bwd_microstep: 750.33 | bwd_inner_microstep: 750.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 20:04:58,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1398.62 | bwd_inner_microstep: 1398.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825 [2024-06-10 20:05:01,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.69 | bwd_microstep: 1584.53 | bwd_inner_microstep: 1584.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233 [2024-06-10 20:05:02,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.28 | bwd_microstep: 865.39 | bwd_inner_microstep: 865.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 20:05:04,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.59 | bwd_microstep: 1553.57 | bwd_inner_microstep: 1553.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3432 [2024-06-10 20:05:06,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.32 | bwd_microstep: 1397.45 | bwd_inner_microstep: 1397.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 20:05:08,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1251.11 | bwd_inner_microstep: 1251.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811 [2024-06-10 20:05:10,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.54 | bwd_microstep: 1479.67 | bwd_inner_microstep: 1479.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629 [2024-06-10 20:05:19,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.37 | optimizer_step: 6.59 [2024-06-10 20:05:19,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.30 | bwd_microstep: 8316.65 | bwd_inner_microstep: 1824.57 | bwd_allreduce_microstep: 6492.02 | step_microstep: 40.28 [2024-06-10 20:05:19,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15290.42 | bwd: 47438.16 | bwd_inner: 40945.05 | bwd_allreduce: 6492.35 | step: 41.84 {'loss': 1.2558, 'learning_rate': 1.1462684799729272e-05, 'epoch': 0.65} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510 [2024-06-10 20:05:21,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.28 | bwd_microstep: 1581.92 | bwd_inner_microstep: 1581.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 20:05:23,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.79 | bwd_microstep: 1275.66 | bwd_inner_microstep: 1275.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-10 20:05:24,756] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 463.54 | bwd_microstep: 1199.22 | bwd_inner_microstep: 1199.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 20:05:26,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1479.37 | bwd_inner_microstep: 1479.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024 [2024-06-10 20:05:27,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.71 | bwd_microstep: 808.03 | bwd_inner_microstep: 808.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 20:05:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.39 | bwd_microstep: 1277.23 | bwd_inner_microstep: 1277.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4096 [2024-06-10 20:05:31,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.45 | bwd_microstep: 1622.81 | bwd_inner_microstep: 1622.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 768 [2024-06-10 20:05:32,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 120.15 | bwd_microstep: 303.22 | bwd_inner_microstep: 303.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 20:05:34,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.09 | bwd_microstep: 1246.98 | bwd_inner_microstep: 1246.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 20:05:35,990] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.62 | bwd_microstep: 1383.89 | bwd_inner_microstep: 1383.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3726 [2024-06-10 20:05:38,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1557.02 | bwd_inner_microstep: 1557.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-10 20:05:40,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.61 | bwd_microstep: 1417.95 | bwd_inner_microstep: 1417.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3513 [2024-06-10 20:05:42,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.11 | bwd_microstep: 1431.39 | bwd_inner_microstep: 1431.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-10 20:05:43,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.18 | bwd_microstep: 1273.97 | bwd_inner_microstep: 1273.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3442 [2024-06-10 20:05:45,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1297.80 | bwd_inner_microstep: 1297.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-10 20:05:47,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1396.11 | bwd_inner_microstep: 1396.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 2065 [2024-06-10 20:05:48,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.87 | bwd_microstep: 913.95 | bwd_inner_microstep: 913.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 20:05:50,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1396.46 | bwd_inner_microstep: 1396.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 20:05:52,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.97 | bwd_microstep: 1388.32 | bwd_inner_microstep: 1388.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 20:05:54,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1275.86 | bwd_inner_microstep: 1275.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 557 [2024-06-10 20:05:54,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 98.53 | bwd_microstep: 248.52 | bwd_inner_microstep: 248.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3543 [2024-06-10 20:05:56,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.32 | bwd_microstep: 1358.61 | bwd_inner_microstep: 1358.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610 [2024-06-10 20:05:58,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.84 | bwd_microstep: 1338.35 | bwd_inner_microstep: 1338.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3550 [2024-06-10 20:06:00,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.33 | bwd_microstep: 1303.84 | bwd_inner_microstep: 1303.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3558 [2024-06-10 20:06:02,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.06 | bwd_microstep: 1478.63 | bwd_inner_microstep: 1478.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819 [2024-06-10 20:06:04,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.89 | bwd_microstep: 1690.05 | bwd_inner_microstep: 1690.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 20:06:06,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.05 | bwd_microstep: 1476.46 | bwd_inner_microstep: 1476.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3561 [2024-06-10 20:06:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 1664.72 | bwd_inner_microstep: 1664.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 20:06:10,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1375.51 | bwd_inner_microstep: 1375.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3780 [2024-06-10 20:06:12,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.88 | bwd_microstep: 1384.74 | bwd_inner_microstep: 1384.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3405 [2024-06-10 20:06:14,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.14 | bwd_microstep: 1208.99 | bwd_inner_microstep: 1208.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 20:06:18,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.94 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-10 20:06:18,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.37 | bwd_microstep: 3509.87 | bwd_inner_microstep: 1408.28 | bwd_allreduce_microstep: 2101.54 | step_microstep: 39.04 [2024-06-10 20:06:18,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15529.91 | bwd: 43565.45 | bwd_inner: 41463.01 | bwd_allreduce: 2101.76 | step: 40.54 {'loss': 1.1775, 'learning_rate': 1.142875732855111e-05, 'epoch': 0.65} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 20:06:20,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.43 | bwd_microstep: 1338.87 | bwd_inner_microstep: 1338.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 20:06:22,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1272.57 | bwd_inner_microstep: 1272.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473 [2024-06-10 20:06:23,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.91 | bwd_microstep: 1181.74 | bwd_inner_microstep: 1181.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-10 20:06:25,773] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1409.25 | bwd_inner_microstep: 1409.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599 [2024-06-10 20:06:27,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.51 | bwd_microstep: 1501.85 | bwd_inner_microstep: 1501.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 20:06:28,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.47 | bwd_microstep: 677.42 | bwd_inner_microstep: 677.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 20:06:30,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1243.05 | bwd_inner_microstep: 1243.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 20:06:32,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751 [2024-06-10 20:06:34,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1533.46 | bwd_inner_microstep: 1533.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 20:06:36,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.65 | bwd_microstep: 1277.60 | bwd_inner_microstep: 1277.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089 [2024-06-10 
20:06:37,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.68 | bwd_microstep: 853.19 | bwd_inner_microstep: 853.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3835 [2024-06-10 20:06:39,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.93 | bwd_microstep: 1690.15 | bwd_inner_microstep: 1690.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506 [2024-06-10 20:06:41,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1449.14 | bwd_inner_microstep: 1449.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965 [2024-06-10 20:06:42,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.34 | bwd_microstep: 889.38 | bwd_inner_microstep: 889.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2663 [2024-06-10 20:06:44,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.51 | bwd_microstep: 1150.30 | bwd_inner_microstep: 1150.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135 [2024-06-10 20:06:45,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.70 | bwd_microstep: 827.79 | bwd_inner_microstep: 827.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1993 [2024-06-10 20:06:46,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.03 | bwd_microstep: 895.92 | bwd_inner_microstep: 895.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic 
token length: 1935 [2024-06-10 20:06:47,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.60 | bwd_microstep: 696.35 | bwd_inner_microstep: 696.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3644 [2024-06-10 20:06:49,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1510.02 | bwd_inner_microstep: 1509.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031 [2024-06-10 20:06:51,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.59 | bwd_microstep: 806.12 | bwd_inner_microstep: 806.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 20:06:52,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1394.16 | bwd_inner_microstep: 1394.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529 [2024-06-10 20:06:54,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.03 | bwd_microstep: 1196.11 | bwd_inner_microstep: 1196.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 20:06:56,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.12 | bwd_microstep: 1190.92 | bwd_inner_microstep: 1190.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-10 20:06:58,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.43 | bwd_microstep: 1455.73 | bwd_inner_microstep: 1455.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 20:07:00,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1298.96 | bwd_inner_microstep: 1298.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 20:07:02,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.76 | bwd_microstep: 1401.42 | bwd_inner_microstep: 1401.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 20:07:04,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1453.77 | bwd_inner_microstep: 1453.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 20:07:06,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1495.46 | bwd_inner_microstep: 1495.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 20:07:07,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1347.78 | bwd_inner_microstep: 1347.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 20:07:10,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.64 | bwd_microstep: 1647.69 | bwd_inner_microstep: 1647.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 20:07:12,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.58 | bwd_microstep: 1402.73 | bwd_inner_microstep: 1402.70 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 20:07:18,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 20:07:18,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 5667.29 | bwd_inner_microstep: 1417.76 | bwd_allreduce_microstep: 4249.48 | step_microstep: 38.33 [2024-06-10 20:07:18,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15042.39 | bwd: 44440.94 | bwd_inner: 40190.55 | bwd_allreduce: 4249.71 | step: 39.80 {'loss': 1.2329, 'learning_rate': 1.1394860045245084e-05, 'epoch': 0.65} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 20:07:20,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.65 | bwd_microstep: 1469.35 | bwd_inner_microstep: 1469.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3621 [2024-06-10 20:07:22,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1443.84 | bwd_inner_microstep: 1443.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2308 [2024-06-10 20:07:23,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.67 | bwd_microstep: 882.98 | bwd_inner_microstep: 882.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 20:07:25,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1483.04 | bwd_inner_microstep: 1483.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2890 [2024-06-10 20:07:27,362] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.40 | bwd_microstep: 1234.89 | bwd_inner_microstep: 1234.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 20:07:29,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.85 | bwd_microstep: 1242.38 | bwd_inner_microstep: 1242.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 20:07:30,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.35 | bwd_microstep: 790.79 | bwd_inner_microstep: 790.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 20:07:32,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.07 | bwd_microstep: 1387.19 | bwd_inner_microstep: 1387.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1922 [2024-06-10 20:07:33,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.88 | bwd_microstep: 879.13 | bwd_inner_microstep: 879.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3694 [2024-06-10 20:07:35,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.48 | bwd_microstep: 1359.42 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490 [2024-06-10 20:07:37,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1575.09 | bwd_inner_microstep: 1575.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3119 [2024-06-10 20:07:39,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1246.51 | bwd_inner_microstep: 1246.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 20:07:40,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1340.92 | bwd_inner_microstep: 1340.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 20:07:42,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3645 [2024-06-10 20:07:45,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.31 | bwd_microstep: 1639.21 | bwd_inner_microstep: 1639.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-10 20:07:47,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.05 | bwd_microstep: 1408.44 | bwd_inner_microstep: 1408.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2433 [2024-06-10 20:07:48,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.42 | bwd_microstep: 851.36 | bwd_inner_microstep: 851.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 20:07:50,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.40 | bwd_microstep: 1556.53 | bwd_inner_microstep: 1556.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3512 [2024-06-10 20:07:52,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1384.04 | bwd_inner_microstep: 1384.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-10 20:07:53,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.51 | bwd_microstep: 976.02 | bwd_inner_microstep: 975.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 20:07:55,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1254.79 | bwd_inner_microstep: 1254.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432 [2024-06-10 20:07:56,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.19 | bwd_microstep: 1152.74 | bwd_inner_microstep: 1152.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-10 20:07:58,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.93 | bwd_microstep: 1399.23 | bwd_inner_microstep: 1399.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-10 20:08:00,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1319.44 | bwd_inner_microstep: 1319.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561 [2024-06-10 20:08:02,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 1594.27 | bwd_inner_microstep: 1594.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 20:08:04,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.29 | bwd_microstep: 1300.63 | bwd_inner_microstep: 1300.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 20:08:07,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.08 | bwd_microstep: 1657.71 | bwd_inner_microstep: 1657.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753 [2024-06-10 20:08:09,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.99 | bwd_microstep: 1539.26 | bwd_inner_microstep: 1539.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-10 20:08:11,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1642.58 | bwd_inner_microstep: 1642.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 20:08:13,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.02 | bwd_microstep: 1646.55 | bwd_inner_microstep: 1646.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-10 20:08:15,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.07 | bwd_microstep: 1496.46 | bwd_inner_microstep: 1496.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-10 20:08:19,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.03 | optimizer_step: 6.58 
[2024-06-10 20:08:19,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.00 | bwd_microstep: 2894.49 | bwd_inner_microstep: 1521.64 | bwd_allreduce_microstep: 1372.81 | step_microstep: 37.55 [2024-06-10 20:08:19,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16026.66 | bwd: 44441.46 | bwd_inner: 43067.76 | bwd_allreduce: 1373.04 | step: 39.03 {'loss': 1.1706, 'learning_rate': 1.1360993069197241e-05, 'epoch': 0.65} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418 [2024-06-10 20:08:20,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.75 | bwd_microstep: 1270.57 | bwd_inner_microstep: 1270.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 20:08:22,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1345.61 | bwd_inner_microstep: 1345.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3847 [2024-06-10 20:08:24,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1390.67 | bwd_inner_microstep: 1390.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 20:08:26,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1245.65 | bwd_inner_microstep: 1245.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 20:08:28,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1276.98 | bwd_inner_microstep: 1276.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 
[2024-06-10 20:08:30,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.19 | bwd_microstep: 1649.22 | bwd_inner_microstep: 1649.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489 [2024-06-10 20:08:32,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.42 | bwd_microstep: 1187.05 | bwd_inner_microstep: 1187.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 761 [2024-06-10 20:08:32,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.97 | bwd_microstep: 301.58 | bwd_inner_microstep: 301.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722 [2024-06-10 20:08:34,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.53 | bwd_microstep: 1527.60 | bwd_inner_microstep: 1527.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488 [2024-06-10 20:08:36,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1313.55 | bwd_inner_microstep: 1313.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3492 [2024-06-10 20:08:38,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.53 | bwd_microstep: 1441.49 | bwd_inner_microstep: 1441.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113 [2024-06-10 20:08:39,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.88 | bwd_microstep: 825.85 | bwd_inner_microstep: 825.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3414 [2024-06-10 20:08:41,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1340.11 | bwd_inner_microstep: 1340.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 20:08:43,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.00 | bwd_microstep: 1479.29 | bwd_inner_microstep: 1479.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527 [2024-06-10 20:08:45,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.30 | bwd_microstep: 1591.13 | bwd_inner_microstep: 1591.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3657 [2024-06-10 20:08:47,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.22 | bwd_microstep: 1474.91 | bwd_inner_microstep: 1474.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1970 [2024-06-10 20:08:48,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.74 | bwd_microstep: 831.27 | bwd_inner_microstep: 831.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 20:08:50,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1352.91 | bwd_inner_microstep: 1352.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460 [2024-06-10 20:08:52,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.24 | bwd_microstep: 1309.91 | bwd_inner_microstep: 1309.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic 
ViT batch size: 20, images per sample: 5.0, dynamic token length: 3654 [2024-06-10 20:08:54,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1289.89 | bwd_inner_microstep: 1289.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3671 [2024-06-10 20:08:56,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1457.23 | bwd_inner_microstep: 1457.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-10 20:08:58,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.75 | bwd_microstep: 1416.05 | bwd_inner_microstep: 1416.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 20:09:00,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.77 | bwd_microstep: 1419.08 | bwd_inner_microstep: 1419.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3576 [2024-06-10 20:09:02,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.55 | bwd_microstep: 1308.08 | bwd_inner_microstep: 1308.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819 [2024-06-10 20:09:03,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.91 | bwd_microstep: 1357.39 | bwd_inner_microstep: 1357.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3515 [2024-06-10 20:09:05,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.10 | bwd_microstep: 1222.65 | bwd_inner_microstep: 1222.62 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3058 [2024-06-10 20:09:07,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.98 | bwd_microstep: 1234.00 | bwd_inner_microstep: 1233.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740 [2024-06-10 20:09:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1433.75 | bwd_inner_microstep: 1433.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746 [2024-06-10 20:09:11,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.44 | bwd_microstep: 1538.74 | bwd_inner_microstep: 1538.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 20:09:13,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.44 | bwd_microstep: 1650.48 | bwd_inner_microstep: 1650.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2241 [2024-06-10 20:09:14,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.64 | bwd_microstep: 845.64 | bwd_inner_microstep: 845.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400 [2024-06-10 20:09:19,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-10 20:09:19,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.15 | bwd_microstep: 4301.39 | bwd_inner_microstep: 1563.05 | bwd_allreduce_microstep: 2738.28 | step_microstep: 37.74 [2024-06-10 20:09:19,778] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15654.24 | bwd: 44629.74 | bwd_inner: 41890.55 | bwd_allreduce: 2738.51 | step: 39.31 65%|██████▌ | 1123/1726 [19:26:52<10:08:30, 60.55s/it] 65%|██████▌ | 1124/1726 [19:27:55<10:15:05, 61.31s/it] 65%|██████▌ | 1125/1726 [19:28:55<10:08:25, 60.74s/it] 65%|██████▌ | 1126/1726 [19:29:55<10:04:35, 60.46s/it] 65%|██████▌ | 1127/1726 [19:30:55<10:04:37, 60.56s/it] 65%|██████▌ | 1128/172{'loss': 1.2433, 'learning_rate': 1.1327156519686896e-05, 'epoch': 0.65} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401 [2024-06-10 20:09:21,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.02 | bwd_microstep: 1434.85 | bwd_inner_microstep: 1434.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3888 [2024-06-10 20:09:24,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.76 | bwd_microstep: 1677.42 | bwd_inner_microstep: 1677.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 20:09:25,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.11 | bwd_microstep: 1340.81 | bwd_inner_microstep: 1340.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 20:09:28,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.17 | bwd_microstep: 1549.26 | bwd_inner_microstep: 1549.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3458 [2024-06-10 20:09:29,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1375.73 | bwd_inner_microstep: 1375.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2590 [2024-06-10 20:09:31,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.37 | bwd_microstep: 1007.97 | bwd_inner_microstep: 1007.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 20:09:33,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1380.93 | bwd_inner_microstep: 1380.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986 [2024-06-10 20:09:34,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 799.13 | bwd_inner_microstep: 799.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2929 [2024-06-10 20:09:35,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.18 | bwd_microstep: 1093.32 | bwd_inner_microstep: 1093.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2170 [2024-06-10 20:09:37,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.84 | bwd_microstep: 948.68 | bwd_inner_microstep: 948.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 20:09:38,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.12 | bwd_microstep: 1289.46 | bwd_inner_microstep: 1289.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, 
images per sample: 1.0, dynamic token length: 725 [2024-06-10 20:09:39,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.90 | bwd_microstep: 294.30 | bwd_inner_microstep: 294.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942 [2024-06-10 20:09:40,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.08 | bwd_microstep: 825.07 | bwd_inner_microstep: 825.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656 [2024-06-10 20:09:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.43 | bwd_microstep: 1513.71 | bwd_inner_microstep: 1513.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3475 [2024-06-10 20:09:44,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.32 | bwd_microstep: 1577.24 | bwd_inner_microstep: 1577.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424 [2024-06-10 20:09:46,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1439.66 | bwd_inner_microstep: 1439.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930 [2024-06-10 20:09:47,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 699.06 | bwd_inner_microstep: 699.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829 [2024-06-10 20:09:49,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452 [2024-06-10 20:09:51,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.39 | bwd_microstep: 1159.03 | bwd_inner_microstep: 1159.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 20:09:53,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.82 | bwd_microstep: 1454.64 | bwd_inner_microstep: 1454.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2087 [2024-06-10 20:09:54,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.12 | bwd_microstep: 822.25 | bwd_inner_microstep: 822.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 4010 [2024-06-10 20:09:57,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.20 | bwd_microstep: 1920.73 | bwd_inner_microstep: 1920.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3523 [2024-06-10 20:09:59,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.28 | bwd_microstep: 1516.32 | bwd_inner_microstep: 1516.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 636 [2024-06-10 20:09:59,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 104.08 | bwd_microstep: 264.98 | bwd_inner_microstep: 264.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1983 [2024-06-10 20:10:00,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.87 | bwd_microstep: 832.22 | bwd_inner_microstep: 832.19 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-10 20:10:02,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1441.70 | bwd_inner_microstep: 1441.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834 [2024-06-10 20:10:05,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.95 | bwd_microstep: 1756.88 | bwd_inner_microstep: 1756.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3563 [2024-06-10 20:10:07,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.24 | bwd_microstep: 1594.61 | bwd_inner_microstep: 1594.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 20:10:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.54 | bwd_microstep: 1646.31 | bwd_inner_microstep: 1646.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 20:10:11,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.81 | bwd_microstep: 1518.33 | bwd_inner_microstep: 1518.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766 [2024-06-10 20:10:13,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.41 | bwd_microstep: 1541.35 | bwd_inner_microstep: 1541.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-10 20:10:23,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | 
optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-10 20:10:23,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.68 | bwd_microstep: 9093.61 | bwd_inner_microstep: 1099.43 | bwd_allreduce_microstep: 7994.12 | step_microstep: 37.84 [2024-06-10 20:10:23,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14995.16 | bwd: 48299.93 | bwd_inner: 40304.91 | bwd_allreduce: 7994.35 | step: 39.33 {'loss': 1.2375, 'learning_rate': 1.1293350515886203e-05, 'epoch': 0.65} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3458 [2024-06-10 20:10:25,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1423.25 | bwd_inner_microstep: 1423.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 20:10:27,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1398.75 | bwd_inner_microstep: 1398.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3980 [2024-06-10 20:10:29,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.04 | bwd_microstep: 1498.10 | bwd_inner_microstep: 1498.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 20:10:31,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1648.70 | bwd_inner_microstep: 1648.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 4129 [2024-06-10 20:10:33,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.62 | bwd_microstep: 1699.40 | bwd_inner_microstep: 1699.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 3471 [2024-06-10 20:10:35,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.25 | bwd_microstep: 1181.84 | bwd_inner_microstep: 1181.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 20:10:37,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1346.36 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705 [2024-06-10 20:10:39,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 1428.28 | bwd_inner_microstep: 1428.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605 [2024-06-10 20:10:41,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1303.64 | bwd_inner_microstep: 1303.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405 [2024-06-10 20:10:43,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.96 | bwd_microstep: 1307.03 | bwd_inner_microstep: 1307.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3428 [2024-06-10 20:10:45,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1504.09 | bwd_inner_microstep: 1504.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 20:10:46,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.52 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2645 [2024-06-10 20:10:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.06 | bwd_microstep: 1112.29 | bwd_inner_microstep: 1112.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3549 [2024-06-10 20:10:50,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.35 | bwd_microstep: 1436.87 | bwd_inner_microstep: 1436.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638 [2024-06-10 20:10:52,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 1600.42 | bwd_inner_microstep: 1600.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3446 [2024-06-10 20:10:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1407.59 | bwd_inner_microstep: 1407.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3462 [2024-06-10 20:10:56,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.96 | bwd_microstep: 1400.06 | bwd_inner_microstep: 1400.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-10 20:10:57,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.07 | bwd_microstep: 791.67 | bwd_inner_microstep: 791.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102 [2024-06-10 20:10:58,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.23 | bwd_microstep: 821.41 | bwd_inner_microstep: 821.39 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 20:11:00,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1286.05 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 20:11:02,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1373.31 | bwd_inner_microstep: 1373.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585 [2024-06-10 20:11:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.10 | bwd_microstep: 1530.70 | bwd_inner_microstep: 1530.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788 [2024-06-10 20:11:06,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.17 | bwd_microstep: 1652.66 | bwd_inner_microstep: 1652.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 20:11:09,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.01 | bwd_microstep: 1651.98 | bwd_inner_microstep: 1651.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3708 [2024-06-10 20:11:10,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1329.47 | bwd_inner_microstep: 1329.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 20:11:12,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.80 | bwd_microstep: 
1389.02 | bwd_inner_microstep: 1388.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 20:11:14,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1508.58 | bwd_inner_microstep: 1508.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 20:11:17,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1550.25 | bwd_inner_microstep: 1550.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564 [2024-06-10 20:11:18,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1297.24 | bwd_inner_microstep: 1297.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-10 20:11:21,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.16 | bwd_microstep: 1655.81 | bwd_inner_microstep: 1655.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2133 [2024-06-10 20:11:22,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.51 | bwd_microstep: 893.25 | bwd_inner_microstep: 893.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-10 20:11:24,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.64 | optimizer_gradients: 4.14 | optimizer_step: 6.63 [2024-06-10 20:11:24,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.17 | bwd_microstep: 1347.80 | bwd_inner_microstep: 1307.02 | bwd_allreduce_microstep: 40.73 | step_microstep: 39.34 
[2024-06-10 20:11:24,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16458.30 | bwd: 44113.63 | bwd_inner: 44072.00 | bwd_allreduce: 40.95 | step: 40.77 {'loss': 1.2005, 'learning_rate': 1.1259575176859739e-05, 'epoch': 0.65} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1880 [2024-06-10 20:11:25,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.95 | bwd_microstep: 768.41 | bwd_inner_microstep: 768.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752 [2024-06-10 20:11:27,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.30 | bwd_microstep: 1539.10 | bwd_inner_microstep: 1539.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 20:11:29,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1243.14 | bwd_inner_microstep: 1243.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429 [2024-06-10 20:11:30,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.42 | bwd_microstep: 1282.14 | bwd_inner_microstep: 1282.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2340 [2024-06-10 20:11:32,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.24 | bwd_microstep: 889.09 | bwd_inner_microstep: 889.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2061 [2024-06-10 20:11:33,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.32 | bwd_microstep: 815.23 | bwd_inner_microstep: 815.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 34, images per sample: 8.5, dynamic token length: 4066 [2024-06-10 20:11:35,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.09 | bwd_microstep: 1628.46 | bwd_inner_microstep: 1628.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2050 [2024-06-10 20:11:36,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.34 | bwd_microstep: 753.59 | bwd_inner_microstep: 753.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 20:11:38,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.92 | bwd_microstep: 1155.01 | bwd_inner_microstep: 1154.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 20:11:39,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1250.56 | bwd_inner_microstep: 1250.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 20:11:41,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1349.02 | bwd_inner_microstep: 1348.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 20:11:43,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.41 | bwd_microstep: 1248.78 | bwd_inner_microstep: 1248.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737 [2024-06-10 20:11:45,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.64 | bwd_microstep: 1523.49 | bwd_inner_microstep: 1523.46 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2408 [2024-06-10 20:11:47,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.96 | bwd_microstep: 1039.90 | bwd_inner_microstep: 1039.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390 [2024-06-10 20:11:48,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.94 | bwd_microstep: 1241.40 | bwd_inner_microstep: 1241.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3646 [2024-06-10 20:11:51,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.33 | bwd_microstep: 1604.41 | bwd_inner_microstep: 1604.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3611 [2024-06-10 20:11:53,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.31 | bwd_microstep: 1632.79 | bwd_inner_microstep: 1632.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913 [2024-06-10 20:11:54,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.79 | bwd_microstep: 779.69 | bwd_inner_microstep: 779.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3831 [2024-06-10 20:11:56,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.44 | bwd_microstep: 1855.81 | bwd_inner_microstep: 1855.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432 [2024-06-10 20:11:58,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1444.64 | bwd_inner_microstep: 
1444.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3478 [2024-06-10 20:12:00,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1331.10 | bwd_inner_microstep: 1331.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-10 20:12:01,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.79 | bwd_microstep: 797.33 | bwd_inner_microstep: 797.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3492 [2024-06-10 20:12:03,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.12 | bwd_microstep: 1562.20 | bwd_inner_microstep: 1562.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 20:12:05,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.69 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3622 [2024-06-10 20:12:07,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.75 | bwd_microstep: 1537.89 | bwd_inner_microstep: 1537.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769 [2024-06-10 20:12:09,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.40 | bwd_microstep: 1448.64 | bwd_inner_microstep: 1448.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-10 20:12:11,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.68 | 
bwd_microstep: 1492.38 | bwd_inner_microstep: 1492.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816 [2024-06-10 20:12:13,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1358.72 | bwd_inner_microstep: 1358.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2550 [2024-06-10 20:12:15,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.66 | bwd_microstep: 967.64 | bwd_inner_microstep: 967.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042 [2024-06-10 20:12:16,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.15 | bwd_microstep: 812.66 | bwd_inner_microstep: 812.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3814 [2024-06-10 20:12:18,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.11 | bwd_microstep: 1599.63 | bwd_inner_microstep: 1599.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-10 20:12:24,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.10 | optimizer_step: 6.62 [2024-06-10 20:12:24,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.04 | bwd_microstep: 5690.11 | bwd_inner_microstep: 2098.15 | bwd_allreduce_microstep: 3591.90 | step_microstep: 38.04 [2024-06-10 20:12:24,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15267.92 | bwd: 44923.20 | bwd_inner: 41330.37 | bwd_allreduce: 3592.14 | step: 39.56 {'loss': 1.1641, 'learning_rate': 1.122583062156406e-05, 'epoch': 0.66} dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 1957 [2024-06-10 20:12:25,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.34 | bwd_microstep: 784.19 | bwd_inner_microstep: 784.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3933 [2024-06-10 20:12:28,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.97 | bwd_microstep: 1691.97 | bwd_inner_microstep: 1691.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 20:12:29,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.76 | bwd_microstep: 1246.40 | bwd_inner_microstep: 1246.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802 [2024-06-10 20:12:31,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.78 | bwd_microstep: 1446.28 | bwd_inner_microstep: 1446.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 20:12:33,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.05 | bwd_microstep: 1244.12 | bwd_inner_microstep: 1244.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 20:12:35,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768 [2024-06-10 20:12:37,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1339.15 | bwd_inner_microstep: 1339.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1941 [2024-06-10 20:12:38,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.46 | bwd_microstep: 761.45 | bwd_inner_microstep: 761.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446 [2024-06-10 20:12:40,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.23 | bwd_microstep: 1189.36 | bwd_inner_microstep: 1189.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 20:12:42,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.29 | bwd_microstep: 1381.69 | bwd_inner_microstep: 1381.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891 [2024-06-10 20:12:43,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.34 | bwd_microstep: 712.90 | bwd_inner_microstep: 712.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1917 [2024-06-10 20:12:44,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.53 | bwd_microstep: 749.47 | bwd_inner_microstep: 749.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 20:12:46,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.01 | bwd_microstep: 1486.11 | bwd_inner_microstep: 1486.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3719 [2024-06-10 20:12:48,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1563.15 | bwd_inner_microstep: 1563.12 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3659 [2024-06-10 20:12:50,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.58 | bwd_microstep: 1611.11 | bwd_inner_microstep: 1611.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-10 20:12:52,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.65 | bwd_microstep: 1181.15 | bwd_inner_microstep: 1181.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3465 [2024-06-10 20:12:54,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.13 | bwd_microstep: 1542.45 | bwd_inner_microstep: 1542.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676 [2024-06-10 20:12:56,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.30 | bwd_microstep: 1620.62 | bwd_inner_microstep: 1620.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 20:12:58,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1416.21 | bwd_inner_microstep: 1416.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832 [2024-06-10 20:13:00,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.67 | bwd_microstep: 1480.72 | bwd_inner_microstep: 1480.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3480 [2024-06-10 20:13:02,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.15 | bwd_microstep: 
1405.75 | bwd_inner_microstep: 1405.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3831 [2024-06-10 20:13:04,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.62 | bwd_microstep: 1616.51 | bwd_inner_microstep: 1616.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639 [2024-06-10 20:13:06,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.98 | bwd_microstep: 1614.81 | bwd_inner_microstep: 1614.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-10 20:13:08,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1495.14 | bwd_inner_microstep: 1495.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 20:13:10,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1378.11 | bwd_inner_microstep: 1378.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 20:13:13,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.18 | bwd_microstep: 1663.57 | bwd_inner_microstep: 1663.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 20:13:15,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1392.79 | bwd_inner_microstep: 1392.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2289 [2024-06-10 20:13:16,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 332.27 | bwd_microstep: 878.62 | bwd_inner_microstep: 878.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 20:13:18,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1391.49 | bwd_inner_microstep: 1391.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 20:13:20,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.89 | bwd_microstep: 1635.78 | bwd_inner_microstep: 1635.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 20:13:22,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1552.95 | bwd_inner_microstep: 1552.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3581 [2024-06-10 20:13:26,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.58 [2024-06-10 20:13:26,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.13 | bwd_microstep: 3091.53 | bwd_inner_microstep: 1926.42 | bwd_allreduce_microstep: 1165.05 | step_microstep: 39.40 [2024-06-10 20:13:26,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16244.00 | bwd: 44914.45 | bwd_inner: 43748.47 | bwd_allreduce: 1165.28 | step: 40.86 {'loss': 1.2043, 'learning_rate': 1.1192116968847313e-05, 'epoch': 0.66} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4351 [2024-06-10 20:13:28,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.33 | bwd_microstep: 1788.91 | bwd_inner_microstep: 1788.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 20:13:30,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1480.13 | bwd_inner_microstep: 1480.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486 [2024-06-10 20:13:32,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.02 | bwd_microstep: 1344.07 | bwd_inner_microstep: 1344.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 20:13:34,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.54 | bwd_microstep: 1651.68 | bwd_inner_microstep: 1651.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 20:13:36,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1344.90 | bwd_inner_microstep: 1344.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3499 [2024-06-10 20:13:38,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.80 | bwd_microstep: 1335.91 | bwd_inner_microstep: 1335.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469 [2024-06-10 20:13:40,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.63 | bwd_microstep: 1341.52 | bwd_inner_microstep: 1341.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 20:13:42,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.48 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3747 [2024-06-10 20:13:44,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1638.66 | bwd_inner_microstep: 1638.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458 [2024-06-10 20:13:46,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.24 | bwd_microstep: 1211.96 | bwd_inner_microstep: 1211.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739 [2024-06-10 20:13:48,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.78 | bwd_microstep: 1432.08 | bwd_inner_microstep: 1432.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 20:13:50,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1480.45 | bwd_inner_microstep: 1480.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3686 [2024-06-10 20:13:52,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.51 | bwd_microstep: 1720.09 | bwd_inner_microstep: 1720.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043 [2024-06-10 20:13:53,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.65 | bwd_microstep: 809.81 | bwd_inner_microstep: 809.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 20:13:55,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 
1386.14 | bwd_inner_microstep: 1386.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921 [2024-06-10 20:13:56,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.08 | bwd_microstep: 695.38 | bwd_inner_microstep: 695.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537 [2024-06-10 20:13:58,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.26 | bwd_microstep: 1426.35 | bwd_inner_microstep: 1426.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2171 [2024-06-10 20:13:59,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.52 | bwd_microstep: 885.28 | bwd_inner_microstep: 885.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2317 [2024-06-10 20:14:01,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.72 | bwd_microstep: 982.84 | bwd_inner_microstep: 982.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-10 20:14:03,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 1557.47 | bwd_inner_microstep: 1557.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-10 20:14:04,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 975.27 | bwd_inner_microstep: 975.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940 [2024-06-10 20:14:05,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 269.08 | bwd_microstep: 697.39 | bwd_inner_microstep: 697.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3576 [2024-06-10 20:14:07,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.26 | bwd_microstep: 1333.52 | bwd_inner_microstep: 1333.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 20:14:09,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.43 | bwd_microstep: 1662.68 | bwd_inner_microstep: 1662.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-10 20:14:11,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1401.73 | bwd_inner_microstep: 1401.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 20:14:13,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.01 | bwd_microstep: 1284.72 | bwd_inner_microstep: 1284.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804 [2024-06-10 20:14:15,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1449.74 | bwd_inner_microstep: 1449.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3495 [2024-06-10 20:14:17,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.92 | bwd_microstep: 1530.40 | bwd_inner_microstep: 1530.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 20:14:19,486] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1352.83 | bwd_inner_microstep: 1352.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 20:14:21,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.98 | bwd_microstep: 1597.09 | bwd_inner_microstep: 1597.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592 [2024-06-10 20:14:23,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1403.03 | bwd_inner_microstep: 1403.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468 [2024-06-10 20:14:29,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-10 20:14:29,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.96 | bwd_microstep: 4860.43 | bwd_inner_microstep: 1583.92 | bwd_allreduce_microstep: 3276.45 | step_microstep: 37.97 [2024-06-10 20:14:29,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16051.75 | bwd: 46347.86 | bwd_inner: 43070.50 | bwd_allreduce: 3276.69 | step: 39.48 65%|██████▌ | 1128/1726 [19:31:56<10:03:45, 60.58s/it] 65%|██████▌ | 1129/1726 [19:33:00<10:11:49, 61.49s/it] 65%|██████▌ | 1130/1726 [19:34:01<10:09:04, 61.32s/it] 66%|██████▌ | 1131/1726 [19:35:01<10:05:40, 61.08s/it] 66%|██████▌ | 1132/1726 [19:36:03<10:05:54, 61.20s/it] 66%|██████▌ | 1133/1726 [19:37: {'loss': 1.1506, 'learning_rate':
1.1158434337448822e-05, 'epoch': 0.66} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 20:14:31,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1432.47 | bwd_inner_microstep: 1432.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4057 [2024-06-10 20:14:33,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1546.30 | bwd_inner_microstep: 1546.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 20:14:35,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.25 | bwd_microstep: 1481.07 | bwd_inner_microstep: 1481.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 20:14:36,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.66 | bwd_microstep: 1278.80 | bwd_inner_microstep: 1278.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 20:14:38,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1409.44 | bwd_inner_microstep: 1409.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1871 [2024-06-10 20:14:39,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.70 | bwd_microstep: 742.39 | bwd_inner_microstep: 742.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4085 [2024-06-10 20:14:42,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.96 | bwd_microstep: 1588.82 | 
bwd_inner_microstep: 1588.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-10 20:14:43,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.30 | bwd_microstep: 1296.47 | bwd_inner_microstep: 1296.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-10 20:14:45,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.11 | bwd_microstep: 801.94 | bwd_inner_microstep: 801.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1885 [2024-06-10 20:14:46,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.75 | bwd_microstep: 804.69 | bwd_inner_microstep: 804.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3502 [2024-06-10 20:14:48,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.91 | bwd_microstep: 1334.10 | bwd_inner_microstep: 1334.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415 [2024-06-10 20:14:49,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.28 | bwd_microstep: 1405.41 | bwd_inner_microstep: 1405.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3511 [2024-06-10 20:14:52,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.07 | bwd_microstep: 1681.33 | bwd_inner_microstep: 1681.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 20:14:54,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 558.88 | bwd_microstep: 1510.44 | bwd_inner_microstep: 1510.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 20:14:55,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.23 | bwd_microstep: 791.42 | bwd_inner_microstep: 791.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3912 [2024-06-10 20:14:57,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.61 | bwd_microstep: 1685.52 | bwd_inner_microstep: 1685.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021 [2024-06-10 20:14:58,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.51 | bwd_microstep: 713.28 | bwd_inner_microstep: 713.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 20:15:00,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1278.78 | bwd_inner_microstep: 1278.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 20:15:02,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1390.58 | bwd_inner_microstep: 1390.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-10 20:15:03,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 800.88 | bwd_inner_microstep: 800.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3427 [2024-06-10 20:15:05,261] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.33 | bwd_microstep: 1234.03 | bwd_inner_microstep: 1234.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3711 [2024-06-10 20:15:07,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1391.22 | bwd_inner_microstep: 1391.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514 [2024-06-10 20:15:08,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.05 | bwd_microstep: 1195.50 | bwd_inner_microstep: 1195.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 20:15:10,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.05 | bwd_microstep: 1278.35 | bwd_inner_microstep: 1278.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 20:15:12,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.09 | bwd_microstep: 1400.62 | bwd_inner_microstep: 1400.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3827 [2024-06-10 20:15:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.89 | bwd_microstep: 1265.40 | bwd_inner_microstep: 1265.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3779 [2024-06-10 20:15:16,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.56 | bwd_microstep: 1473.70 | bwd_inner_microstep: 1473.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3473 [2024-06-10 20:15:18,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.18 | bwd_microstep: 1282.46 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2657 [2024-06-10 20:15:19,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.06 | bwd_microstep: 1155.96 | bwd_inner_microstep: 1155.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3763 [2024-06-10 20:15:21,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.07 | bwd_microstep: 1477.55 | bwd_inner_microstep: 1477.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3463 [2024-06-10 20:15:23,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.96 | bwd_microstep: 1538.53 | bwd_inner_microstep: 1538.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 20:15:30,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.43 | optimizer_step: 6.60 [2024-06-10 20:15:30,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 5660.66 | bwd_inner_microstep: 1525.98 | bwd_allreduce_microstep: 4134.60 | step_microstep: 39.97 [2024-06-10 20:15:30,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15387.11 | bwd: 45328.13 | bwd_inner: 41192.59 | bwd_allreduce: 4134.85 | step: 41.43 {'loss': 1.1378, 'learning_rate': 1.1124782845998632e-05, 'epoch': 0.66} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 20:15:31,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1343.32 | 
bwd_inner_microstep: 1343.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600 [2024-06-10 20:15:33,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 1435.63 | bwd_inner_microstep: 1435.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-10 20:15:35,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.13 | bwd_microstep: 1476.21 | bwd_inner_microstep: 1476.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864 [2024-06-10 20:15:36,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.07 | bwd_microstep: 704.48 | bwd_inner_microstep: 704.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797 [2024-06-10 20:15:38,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.57 | bwd_microstep: 1444.87 | bwd_inner_microstep: 1444.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 20:15:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1247.87 | bwd_inner_microstep: 1247.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878 [2024-06-10 20:15:41,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.35 | bwd_microstep: 742.49 | bwd_inner_microstep: 742.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-10 20:15:43,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
439.25 | bwd_microstep: 1147.66 | bwd_inner_microstep: 1147.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 20:15:45,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1399.94 | bwd_inner_microstep: 1399.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1427 [2024-06-10 20:15:45,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.54 | bwd_microstep: 535.65 | bwd_inner_microstep: 535.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1897 [2024-06-10 20:15:47,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.55 | bwd_microstep: 746.30 | bwd_inner_microstep: 746.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3407 [2024-06-10 20:15:49,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1437.73 | bwd_inner_microstep: 1437.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656 [2024-06-10 20:15:51,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.08 | bwd_microstep: 1522.13 | bwd_inner_microstep: 1522.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499 [2024-06-10 20:15:53,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.22 | bwd_microstep: 1575.70 | bwd_inner_microstep: 1575.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 20:15:55,132] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 505.54 | bwd_microstep: 1353.54 | bwd_inner_microstep: 1353.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927 [2024-06-10 20:15:56,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.99 | bwd_microstep: 697.98 | bwd_inner_microstep: 697.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-10 20:15:57,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.28 | bwd_microstep: 975.17 | bwd_inner_microstep: 975.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 20:15:59,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.15 | bwd_microstep: 1461.15 | bwd_inner_microstep: 1461.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-10 20:16:00,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.98 | bwd_microstep: 804.20 | bwd_inner_microstep: 804.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711 [2024-06-10 20:16:02,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.41 | bwd_microstep: 1532.70 | bwd_inner_microstep: 1532.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 20:16:04,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.28 | bwd_microstep: 1258.86 | bwd_inner_microstep: 1258.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-10 20:16:06,480] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1480.54 | bwd_inner_microstep: 1480.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3848 [2024-06-10 20:16:08,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1365.55 | bwd_inner_microstep: 1365.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 20:16:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1500.55 | bwd_inner_microstep: 1500.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3469 [2024-06-10 20:16:12,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.34 | bwd_microstep: 1215.62 | bwd_inner_microstep: 1215.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3713 [2024-06-10 20:16:14,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.31 | bwd_microstep: 1359.09 | bwd_inner_microstep: 1359.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3632 [2024-06-10 20:16:16,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1542.51 | bwd_inner_microstep: 1542.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2268 [2024-06-10 20:16:17,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.99 | bwd_microstep: 935.92 | bwd_inner_microstep: 935.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3817 [2024-06-10 20:16:19,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.89 | bwd_microstep: 1645.57 | bwd_inner_microstep: 1645.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802 [2024-06-10 20:16:21,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.00 | bwd_microstep: 1450.57 | bwd_inner_microstep: 1450.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775 [2024-06-10 20:16:24,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.60 | bwd_microstep: 1741.05 | bwd_inner_microstep: 1741.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809 [2024-06-10 20:16:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.36 | optimizer_step: 6.61 [2024-06-10 20:16:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.93 | bwd_microstep: 6565.02 | bwd_inner_microstep: 1788.55 | bwd_allreduce_microstep: 4776.39 | step_microstep: 38.85 [2024-06-10 20:16:31,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15219.80 | bwd: 45645.57 | bwd_inner: 40868.25 | bwd_allreduce: 4776.64 | step: 40.32 {'loss': 1.2201, 'learning_rate': 1.1091162613017113e-05, 'epoch': 0.66} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3555 [2024-06-10 20:16:33,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1420.34 | bwd_inner_microstep: 1420.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2977 [2024-06-10 20:16:34,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.18 | bwd_microstep: 1096.95 
| bwd_inner_microstep: 1096.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470 [2024-06-10 20:16:36,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.82 | bwd_microstep: 1309.86 | bwd_inner_microstep: 1309.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-10 20:16:38,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.87 | bwd_microstep: 1649.93 | bwd_inner_microstep: 1649.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3771 [2024-06-10 20:16:40,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1490.94 | bwd_inner_microstep: 1490.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 4158 [2024-06-10 20:16:42,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.02 | bwd_microstep: 1347.65 | bwd_inner_microstep: 1347.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-10 20:16:43,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.13 | bwd_microstep: 789.56 | bwd_inner_microstep: 789.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768 [2024-06-10 20:16:45,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.73 | bwd_microstep: 1341.62 | bwd_inner_microstep: 1341.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 20:16:47,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 438.55 | bwd_microstep: 1148.26 | bwd_inner_microstep: 1148.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742 [2024-06-10 20:16:49,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1429.57 | bwd_inner_microstep: 1429.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 20:16:51,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1284.69 | bwd_inner_microstep: 1284.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3496 [2024-06-10 20:16:52,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.15 | bwd_microstep: 1330.73 | bwd_inner_microstep: 1330.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-10 20:16:54,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1484.01 | bwd_inner_microstep: 1483.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423 [2024-06-10 20:16:56,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.02 | bwd_microstep: 1443.88 | bwd_inner_microstep: 1443.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3487 [2024-06-10 20:16:58,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.08 | bwd_microstep: 1439.93 | bwd_inner_microstep: 1439.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 20:17:00,985] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.40 | bwd_microstep: 1478.52 | bwd_inner_microstep: 1478.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3624 [2024-06-10 20:17:03,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.47 | bwd_microstep: 1464.08 | bwd_inner_microstep: 1464.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 20:17:04,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1374.29 | bwd_inner_microstep: 1374.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499 [2024-06-10 20:17:06,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.59 | bwd_microstep: 1405.47 | bwd_inner_microstep: 1405.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-10 20:17:08,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1510.35 | bwd_inner_microstep: 1510.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2295 [2024-06-10 20:17:10,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.91 | bwd_microstep: 814.11 | bwd_inner_microstep: 814.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628 [2024-06-10 20:17:11,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.47 | bwd_microstep: 1311.88 | bwd_inner_microstep: 1311.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3708 [2024-06-10 20:17:13,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1434.67 | bwd_inner_microstep: 1434.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2280 [2024-06-10 20:17:14,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.40 | bwd_microstep: 786.06 | bwd_inner_microstep: 786.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 20:17:17,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1556.71 | bwd_inner_microstep: 1556.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 20:17:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1349.20 | bwd_inner_microstep: 1349.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 20:17:20,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.13 | bwd_microstep: 1374.76 | bwd_inner_microstep: 1374.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3669 [2024-06-10 20:17:22,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.86 | bwd_microstep: 1450.64 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-10 20:17:25,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1598.55 | bwd_inner_microstep: 1598.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images 
per sample: 6.25, dynamic token length: 3448 [2024-06-10 20:17:26,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1300.31 | bwd_inner_microstep: 1300.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 20:17:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.13 | bwd_microstep: 1375.47 | bwd_inner_microstep: 1375.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3385 [2024-06-10 20:17:30,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.59 [2024-06-10 20:17:30,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.94 | bwd_microstep: 1535.19 | bwd_inner_microstep: 1230.41 | bwd_allreduce_microstep: 304.73 | step_microstep: 37.53 [2024-06-10 20:17:30,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16049.97 | bwd: 43128.23 | bwd_inner: 42822.61 | bwd_allreduce: 304.95 | step: 39.04 {'loss': 1.2306, 'learning_rate': 1.1057573756914573e-05, 'epoch': 0.66} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465 [2024-06-10 20:17:32,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1570.22 | bwd_inner_microstep: 1570.14 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864 [2024-06-10 20:17:33,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.01 | bwd_microstep: 676.57 | bwd_inner_microstep: 676.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 20:17:35,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
504.87 | bwd_microstep: 1353.80 | bwd_inner_microstep: 1353.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 20:17:37,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1282.69 | bwd_inner_microstep: 1282.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-10 20:17:39,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1437.01 | bwd_inner_microstep: 1436.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3489 [2024-06-10 20:17:41,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.71 | bwd_microstep: 1219.69 | bwd_inner_microstep: 1219.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-10 20:17:43,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.32 | bwd_microstep: 1291.48 | bwd_inner_microstep: 1291.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1965 [2024-06-10 20:17:44,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.09 | bwd_microstep: 732.33 | bwd_inner_microstep: 732.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 20:17:45,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.22 | bwd_microstep: 1381.22 | bwd_inner_microstep: 1381.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2462 [2024-06-10 20:17:47,224] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 352.72 | bwd_microstep: 921.69 | bwd_inner_microstep: 921.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 20:17:49,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.34 | bwd_microstep: 1287.87 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 20:17:50,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.05 | bwd_microstep: 1392.89 | bwd_inner_microstep: 1392.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2893 [2024-06-10 20:17:52,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.47 | bwd_microstep: 1279.57 | bwd_inner_microstep: 1279.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3930 [2024-06-10 20:17:54,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.85 | bwd_microstep: 1619.40 | bwd_inner_microstep: 1619.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3434 [2024-06-10 20:17:56,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.07 | bwd_microstep: 1391.93 | bwd_inner_microstep: 1391.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3443 [2024-06-10 20:17:58,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2936 [2024-06-10 
20:18:00,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.94 | bwd_microstep: 1285.51 | bwd_inner_microstep: 1285.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 20:18:02,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.40 | bwd_microstep: 1343.55 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815 [2024-06-10 20:18:04,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.36 | bwd_microstep: 1578.97 | bwd_inner_microstep: 1578.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3463 [2024-06-10 20:18:06,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.31 | bwd_microstep: 1359.31 | bwd_inner_microstep: 1359.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3608 [2024-06-10 20:18:08,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.19 | bwd_microstep: 1649.50 | bwd_inner_microstep: 1649.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-10 20:18:10,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1510.81 | bwd_inner_microstep: 1510.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509 [2024-06-10 20:18:12,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1580.49 | bwd_inner_microstep: 1580.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 
5.0, dynamic token length: 2071 [2024-06-10 20:18:14,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.34 | bwd_microstep: 879.70 | bwd_inner_microstep: 879.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546 [2024-06-10 20:18:16,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.74 | bwd_microstep: 1327.13 | bwd_inner_microstep: 1327.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 20:18:18,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1498.41 | bwd_inner_microstep: 1498.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 20:18:20,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1459.79 | bwd_inner_microstep: 1459.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 20:18:22,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 20:18:24,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.11 | bwd_microstep: 1499.47 | bwd_inner_microstep: 1499.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-10 20:18:26,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.91 | bwd_microstep: 1415.45 | bwd_inner_microstep: 1415.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 2066 [2024-06-10 20:18:27,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.76 | bwd_microstep: 819.83 | bwd_inner_microstep: 819.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3626 [2024-06-10 20:18:32,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.07 | optimizer_step: 6.63 [2024-06-10 20:18:32,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 4658.55 | bwd_inner_microstep: 1737.10 | bwd_allreduce_microstep: 2921.40 | step_microstep: 37.70 [2024-06-10 20:18:32,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15860.52 | bwd: 45499.95 | bwd_inner: 42577.58 | bwd_allreduce: 2921.67 | step: 39.22 {'loss': 1.1829, 'learning_rate': 1.1024016395990758e-05, 'epoch': 0.66} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399 [2024-06-10 20:18:34,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.91 | bwd_microstep: 1359.31 | bwd_inner_microstep: 1359.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844 [2024-06-10 20:18:36,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1461.35 | bwd_inner_microstep: 1461.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 20:18:38,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.77 | bwd_microstep: 1343.84 | bwd_inner_microstep: 1343.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2900 [2024-06-10 20:18:39,627] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 386.84 | bwd_microstep: 996.90 | bwd_inner_microstep: 996.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-10 20:18:41,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.62 | bwd_microstep: 1288.25 | bwd_inner_microstep: 1288.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-10 20:18:43,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1254.45 | bwd_inner_microstep: 1254.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-10 20:18:44,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.90 | bwd_microstep: 1149.50 | bwd_inner_microstep: 1149.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 20:18:46,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1384.91 | bwd_inner_microstep: 1384.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422 [2024-06-10 20:18:48,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.98 | bwd_microstep: 1185.83 | bwd_inner_microstep: 1185.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 20:18:50,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1341.48 | bwd_inner_microstep: 1341.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3484 [2024-06-10 20:18:52,398] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1622.53 | bwd_inner_microstep: 1622.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3534 [2024-06-10 20:18:54,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.66 | bwd_microstep: 1584.20 | bwd_inner_microstep: 1584.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488 [2024-06-10 20:18:56,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.39 | bwd_microstep: 1442.60 | bwd_inner_microstep: 1442.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437 [2024-06-10 20:18:58,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.76 | bwd_microstep: 1155.07 | bwd_inner_microstep: 1155.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3509 [2024-06-10 20:18:59,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1250.43 | bwd_inner_microstep: 1250.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-10 20:19:01,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.59 | bwd_microstep: 1287.64 | bwd_inner_microstep: 1287.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637 [2024-06-10 20:19:03,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.91 | bwd_microstep: 1613.83 | bwd_inner_microstep: 1613.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3428 [2024-06-10 20:19:05,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1248.37 | bwd_inner_microstep: 1248.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3434 [2024-06-10 20:19:07,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.73 | bwd_microstep: 1188.05 | bwd_inner_microstep: 1188.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-10 20:19:09,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.90 | bwd_microstep: 1502.76 | bwd_inner_microstep: 1502.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 20:19:11,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.05 | bwd_microstep: 1380.80 | bwd_inner_microstep: 1380.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 20:19:13,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.54 | bwd_microstep: 1498.30 | bwd_inner_microstep: 1498.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717 [2024-06-10 20:19:15,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.76 | bwd_microstep: 1535.04 | bwd_inner_microstep: 1535.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634 [2024-06-10 20:19:17,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.34 | bwd_microstep: 1316.67 | bwd_inner_microstep: 1316.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3558 [2024-06-10 20:19:19,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.96 | bwd_microstep: 1402.46 | bwd_inner_microstep: 1402.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-10 20:19:20,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.27 | bwd_microstep: 810.04 | bwd_inner_microstep: 810.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3429 [2024-06-10 20:19:22,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.32 | bwd_microstep: 1203.85 | bwd_inner_microstep: 1203.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3381 [2024-06-10 20:19:23,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.11 | bwd_microstep: 1439.12 | bwd_inner_microstep: 1439.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3565 [2024-06-10 20:19:26,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.53 | bwd_microstep: 1566.07 | bwd_inner_microstep: 1566.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 20:19:28,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.81 | bwd_microstep: 1505.93 | bwd_inner_microstep: 1505.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-10 20:19:30,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.96 | bwd_microstep: 1411.82 | bwd_inner_microstep: 1411.79 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3800 [2024-06-10 20:19:33,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-10 20:19:33,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.36 | bwd_microstep: 2302.94 | bwd_inner_microstep: 1916.53 | bwd_allreduce_microstep: 386.35 | step_microstep: 37.51 [2024-06-10 20:19:33,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16286.87 | bwd: 44034.35 | bwd_inner: 43647.10 | bwd_allreduce: 386.57 | step: 38.97 66%|██████▌ | 1133/1726 [19:37:05<10:09:25, 61.66s/it] 66%|██████▌ | 1134/1726 [19:38:06<10:06:33, 61.47s/it] 66%|██████▌ | 1135/1726 [19:39:08<10:04:41, 61.39s/it] 66%|██████▌ | 1136/1726 [19:40:07<9:58:06, 60.82s/it] 66%|██████▌ | 1137/1726 [19:41:09<9:59:38, 61.08s/it] 66%|██████▌ | 1138/1726 [19:42:09<9:57:21, 60{'loss': 1.2093, 'learning_rate': 1.0990490648434541e-05, 'epoch': 0.66} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3399 [2024-06-10 20:19:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.80 | bwd_microstep: 1371.79 | bwd_inner_microstep: 1371.71 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 20:19:36,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1244.41 | bwd_inner_microstep: 1244.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2352 [2024-06-10 20:19:38,207] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 387.72 | bwd_microstep: 1050.93 | bwd_inner_microstep: 1050.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239 [2024-06-10 20:19:39,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.84 | bwd_microstep: 960.33 | bwd_inner_microstep: 960.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-10 20:19:40,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.94 | bwd_microstep: 707.95 | bwd_inner_microstep: 707.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 20:19:42,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1245.28 | bwd_inner_microstep: 1245.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3714 [2024-06-10 20:19:44,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.50 | bwd_microstep: 1493.13 | bwd_inner_microstep: 1493.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 20:19:46,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1246.18 | bwd_inner_microstep: 1246.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742 [2024-06-10 20:19:48,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.20 | bwd_microstep: 1638.32 | bwd_inner_microstep: 1638.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 20:19:50,017] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.66 | bwd_microstep: 1249.51 | bwd_inner_microstep: 1249.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3425 [2024-06-10 20:19:51,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.89 | bwd_microstep: 1214.39 | bwd_inner_microstep: 1214.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 20:19:53,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.64 | bwd_microstep: 1380.68 | bwd_inner_microstep: 1380.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1956 [2024-06-10 20:19:54,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.16 | bwd_microstep: 891.28 | bwd_inner_microstep: 891.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3908 [2024-06-10 20:19:57,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.27 | bwd_microstep: 1735.93 | bwd_inner_microstep: 1735.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-10 20:19:59,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.00 | bwd_microstep: 1481.47 | bwd_inner_microstep: 1481.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593 [2024-06-10 20:20:01,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.24 | bwd_microstep: 1306.06 | bwd_inner_microstep: 1306.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3825 [2024-06-10 20:20:03,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.39 | bwd_microstep: 1656.72 | bwd_inner_microstep: 1656.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949 [2024-06-10 20:20:04,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.87 | bwd_microstep: 698.83 | bwd_inner_microstep: 698.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3814 [2024-06-10 20:20:06,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.90 | bwd_microstep: 1384.98 | bwd_inner_microstep: 1384.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648 [2024-06-10 20:20:08,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1418.82 | bwd_inner_microstep: 1418.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-10 20:20:10,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1494.02 | bwd_inner_microstep: 1493.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2293 [2024-06-10 20:20:11,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.32 | bwd_microstep: 785.07 | bwd_inner_microstep: 785.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-10 20:20:13,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1254.04 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3475 [2024-06-10 20:20:15,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1380.21 | bwd_inner_microstep: 1380.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3804 [2024-06-10 20:20:17,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.10 | bwd_microstep: 1581.80 | bwd_inner_microstep: 1581.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3548 [2024-06-10 20:20:19,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.26 | bwd_microstep: 1325.95 | bwd_inner_microstep: 1325.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3598 [2024-06-10 20:20:21,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.88 | bwd_microstep: 1533.86 | bwd_inner_microstep: 1533.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 20:20:23,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.86 | bwd_microstep: 1600.18 | bwd_inner_microstep: 1600.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2279 [2024-06-10 20:20:24,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.81 | bwd_microstep: 1074.41 | bwd_inner_microstep: 1074.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2221 [2024-06-10 20:20:26,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.63 | bwd_microstep: 1060.37 | bwd_inner_microstep: 1060.35 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565 [2024-06-10 20:20:28,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.59 | bwd_microstep: 1598.48 | bwd_inner_microstep: 1598.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581 [2024-06-10 20:20:33,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 20:20:33,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.84 | bwd_microstep: 4893.32 | bwd_inner_microstep: 1526.79 | bwd_allreduce_microstep: 3366.48 | step_microstep: 37.90 [2024-06-10 20:20:33,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15503.36 | bwd: 44958.77 | bwd_inner: 41591.32 | bwd_allreduce: 3366.74 | step: 39.43 {'loss': 1.1462, 'learning_rate': 1.095699663232342e-05, 'epoch': 0.66} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-10 20:20:35,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.72 | bwd_microstep: 1296.52 | bwd_inner_microstep: 1296.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3951 [2024-06-10 20:20:37,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.15 | bwd_microstep: 1595.09 | bwd_inner_microstep: 1595.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 20:20:39,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1378.74 | bwd_inner_microstep: 1378.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 20:20:41,755] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.11 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 20:20:43,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1383.87 | bwd_inner_microstep: 1383.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 20:20:45,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.56 | bwd_microstep: 1150.67 | bwd_inner_microstep: 1150.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730 [2024-06-10 20:20:47,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.51 | bwd_microstep: 1634.60 | bwd_inner_microstep: 1634.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-10 20:20:48,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.95 | bwd_microstep: 795.57 | bwd_inner_microstep: 795.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3491 [2024-06-10 20:20:50,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.10 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495 [2024-06-10 20:20:52,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1575.86 | bwd_inner_microstep: 1575.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3482 [2024-06-10 20:20:54,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1477.58 | bwd_inner_microstep: 1477.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 20:20:56,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1447.79 | bwd_inner_microstep: 1447.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 20:20:58,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 804.89 | bwd_inner_microstep: 804.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3627 [2024-06-10 20:21:00,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.99 | bwd_microstep: 1705.32 | bwd_inner_microstep: 1705.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3473 [2024-06-10 20:21:02,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.90 | bwd_microstep: 1437.08 | bwd_inner_microstep: 1437.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 20:21:04,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.50 | bwd_microstep: 1375.67 | bwd_inner_microstep: 1375.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 20:21:06,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 1395.46 | bwd_inner_microstep: 1395.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3483 [2024-06-10 20:21:07,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1289.66 | bwd_inner_microstep: 1289.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-10 20:21:10,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.40 | bwd_microstep: 1621.86 | bwd_inner_microstep: 1621.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 20:21:11,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1286.95 | bwd_inner_microstep: 1286.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2168 [2024-06-10 20:21:13,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.95 | bwd_microstep: 950.53 | bwd_inner_microstep: 950.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-10 20:21:15,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.34 | bwd_microstep: 1492.03 | bwd_inner_microstep: 1492.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 20:21:17,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1280.11 | bwd_inner_microstep: 1280.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 20:21:19,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1392.88 | bwd_inner_microstep: 1392.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3803 [2024-06-10 20:21:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.85 | bwd_microstep: 1600.53 | bwd_inner_microstep: 1600.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 20:21:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 1406.41 | bwd_inner_microstep: 1406.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 20:21:25,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1509.42 | bwd_inner_microstep: 1509.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 20:21:27,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1416.13 | bwd_inner_microstep: 1416.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3612 [2024-06-10 20:21:29,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.75 | bwd_microstep: 1440.79 | bwd_inner_microstep: 1440.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 20:21:31,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1383.17 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743 [2024-06-10 20:21:33,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.44 | bwd_microstep: 1438.31 | bwd_inner_microstep: 1438.28 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068 [2024-06-10 20:21:37,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.14 | optimizer_step: 6.61 [2024-06-10 20:21:37,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.88 | bwd_microstep: 3903.11 | bwd_inner_microstep: 1044.29 | bwd_allreduce_microstep: 2858.77 | step_microstep: 38.83 [2024-06-10 20:21:37,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16369.06 | bwd: 46771.77 | bwd_inner: 43912.10 | bwd_allreduce: 2859.00 | step: 40.31 {'loss': 1.176, 'learning_rate': 1.0923534465623165e-05, 'epoch': 0.66} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 20:21:39,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1469.07 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-10 20:21:41,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.42 | bwd_microstep: 1273.84 | bwd_inner_microstep: 1273.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511 [2024-06-10 20:21:43,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.58 | bwd_microstep: 1345.97 | bwd_inner_microstep: 1345.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-10 20:21:45,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1542.37 | bwd_inner_microstep: 1542.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3741 
[2024-06-10 20:21:47,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1430.26 | bwd_inner_microstep: 1430.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 20:21:49,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 1484.05 | bwd_inner_microstep: 1484.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 20:21:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 793.14 | bwd_inner_microstep: 793.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 20:21:52,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.04 | bwd_microstep: 1287.88 | bwd_inner_microstep: 1287.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3414 [2024-06-10 20:21:53,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.64 | bwd_microstep: 1200.16 | bwd_inner_microstep: 1200.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2097 [2024-06-10 20:21:54,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 759.49 | bwd_inner_microstep: 759.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 20:21:56,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.70 | bwd_microstep: 1258.52 | bwd_inner_microstep: 1258.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 1974 [2024-06-10 20:21:57,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.55 | bwd_microstep: 890.75 | bwd_inner_microstep: 890.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633 [2024-06-10 20:21:59,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1410.36 | bwd_inner_microstep: 1410.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 20:22:01,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1387.07 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671 [2024-06-10 20:22:03,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.78 | bwd_microstep: 1420.46 | bwd_inner_microstep: 1420.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 20:22:05,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1276.89 | bwd_inner_microstep: 1276.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3431 [2024-06-10 20:22:07,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.02 | bwd_microstep: 1309.68 | bwd_inner_microstep: 1309.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3454 [2024-06-10 20:22:08,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1286.46 | bwd_inner_microstep: 1286.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2433 [2024-06-10 20:22:10,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.58 | bwd_microstep: 1043.39 | bwd_inner_microstep: 1043.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2353 [2024-06-10 20:22:11,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.54 | bwd_microstep: 830.33 | bwd_inner_microstep: 830.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2301 [2024-06-10 20:22:12,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.65 | bwd_microstep: 912.16 | bwd_inner_microstep: 912.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620 [2024-06-10 20:22:14,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.55 | bwd_microstep: 1311.96 | bwd_inner_microstep: 1311.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3481 [2024-06-10 20:22:16,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1316.47 | bwd_inner_microstep: 1316.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 20:22:18,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1405.09 | bwd_inner_microstep: 1405.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 20:22:20,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1377.29 | bwd_inner_microstep: 1377.26 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 20:22:22,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.09 | bwd_microstep: 1299.53 | bwd_inner_microstep: 1299.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 20:22:24,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.22 | bwd_microstep: 1477.36 | bwd_inner_microstep: 1477.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 20:22:26,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.39 | bwd_microstep: 1644.11 | bwd_inner_microstep: 1644.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3803 [2024-06-10 20:22:28,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.11 | bwd_microstep: 1723.39 | bwd_inner_microstep: 1723.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 20:22:30,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.25 | bwd_microstep: 1282.52 | bwd_inner_microstep: 1282.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 20:22:32,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.12 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815 [2024-06-10 20:22:39,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | 
optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-10 20:22:39,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.86 | bwd_microstep: 6479.32 | bwd_inner_microstep: 1987.76 | bwd_allreduce_microstep: 4491.51 | step_microstep: 37.95 [2024-06-10 20:22:39,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15664.23 | bwd: 46487.37 | bwd_inner: 41994.95 | bwd_allreduce: 4491.74 | step: 39.38 {'loss': 1.2104, 'learning_rate': 1.089010426618732e-05, 'epoch': 0.66} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 20:22:41,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1331.89 | bwd_inner_microstep: 1331.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2397 [2024-06-10 20:22:42,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.30 | bwd_microstep: 901.33 | bwd_inner_microstep: 901.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2718 [2024-06-10 20:22:44,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.05 | bwd_microstep: 1028.40 | bwd_inner_microstep: 1028.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-10 20:22:46,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.02 | bwd_microstep: 1145.78 | bwd_inner_microstep: 1145.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 20:22:47,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1389.12 | bwd_inner_microstep: 1389.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3420 [2024-06-10 20:22:49,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.63 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 20:22:51,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.36 | bwd_microstep: 1249.64 | bwd_inner_microstep: 1249.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026 [2024-06-10 20:22:52,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.02 | bwd_microstep: 807.88 | bwd_inner_microstep: 807.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703 [2024-06-10 20:22:54,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1628.09 | bwd_inner_microstep: 1628.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3438 [2024-06-10 20:22:56,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.53 | bwd_microstep: 1541.15 | bwd_inner_microstep: 1541.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3675 [2024-06-10 20:22:59,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.01 | bwd_microstep: 1718.63 | bwd_inner_microstep: 1718.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3966 [2024-06-10 20:23:01,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.00 | bwd_microstep: 1691.21 | bwd_inner_microstep: 1691.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673 [2024-06-10 20:23:03,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.54 | bwd_microstep: 1513.36 | bwd_inner_microstep: 1513.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447 [2024-06-10 20:23:05,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.22 | bwd_microstep: 1373.50 | bwd_inner_microstep: 1373.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390 [2024-06-10 20:23:07,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1336.47 | bwd_inner_microstep: 1336.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3224 [2024-06-10 20:23:08,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.54 | bwd_microstep: 1173.37 | bwd_inner_microstep: 1173.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 20:23:10,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1379.21 | bwd_inner_microstep: 1379.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1976 [2024-06-10 20:23:11,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.04 | bwd_microstep: 704.20 | bwd_inner_microstep: 704.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 20:23:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1510.74 | bwd_inner_microstep: 1510.71 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3870 [2024-06-10 20:23:15,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.28 | bwd_microstep: 1463.62 | bwd_inner_microstep: 1463.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 955 [2024-06-10 20:23:16,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.22 | bwd_microstep: 380.09 | bwd_inner_microstep: 380.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 20:23:18,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.24 | bwd_microstep: 1248.40 | bwd_inner_microstep: 1248.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 20:23:20,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.09 | bwd_microstep: 1546.27 | bwd_inner_microstep: 1546.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 20:23:22,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1398.60 | bwd_inner_microstep: 1398.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813 [2024-06-10 20:23:24,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1356.82 | bwd_inner_microstep: 1356.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821 [2024-06-10 20:23:26,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.36 | bwd_microstep: 1418.68 
| bwd_inner_microstep: 1418.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768 [2024-06-10 20:23:28,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.09 | bwd_microstep: 1468.43 | bwd_inner_microstep: 1468.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 20:23:30,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.00 | bwd_microstep: 1454.62 | bwd_inner_microstep: 1454.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643 [2024-06-10 20:23:32,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.04 | bwd_microstep: 1616.05 | bwd_inner_microstep: 1616.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2057 [2024-06-10 20:23:33,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.75 | bwd_microstep: 847.93 | bwd_inner_microstep: 847.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515 [2024-06-10 20:23:35,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.14 | bwd_microstep: 1193.00 | bwd_inner_microstep: 1192.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 20:23:42,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.22 | optimizer_step: 6.58 [2024-06-10 20:23:42,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.52 | bwd_microstep: 7030.13 | bwd_inner_microstep: 1691.40 | bwd_allreduce_microstep: 5338.68 | step_microstep: 37.92 
[2024-06-10 20:23:42,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15558.35 | bwd: 47093.67 | bwd_inner: 41754.09 | bwd_allreduce: 5338.90 | step: 39.36 {'loss': 1.2498, 'learning_rate': 1.0856706151756902e-05, 'epoch': 0.66} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3418 [2024-06-10 20:23:44,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.33 | bwd_microstep: 1273.55 | bwd_inner_microstep: 1273.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 20:23:46,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.57 | bwd_microstep: 1277.47 | bwd_inner_microstep: 1277.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286 [2024-06-10 20:23:47,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.97 | bwd_microstep: 872.88 | bwd_inner_microstep: 872.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 20:23:49,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.78 | bwd_microstep: 1378.47 | bwd_inner_microstep: 1378.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 20:23:51,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.43 | bwd_microstep: 1279.53 | bwd_inner_microstep: 1279.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760 [2024-06-10 20:23:53,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1437.99 | bwd_inner_microstep: 1437.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 2014 [2024-06-10 20:23:54,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.28 | bwd_microstep: 802.24 | bwd_inner_microstep: 802.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2937 [2024-06-10 20:23:55,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.09 | bwd_microstep: 1031.78 | bwd_inner_microstep: 1031.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739 [2024-06-10 20:23:57,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.54 | bwd_microstep: 1428.48 | bwd_inner_microstep: 1428.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 20:23:59,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.07 | bwd_microstep: 1251.87 | bwd_inner_microstep: 1251.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 20:24:01,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1346.87 | bwd_inner_microstep: 1346.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 20:24:03,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1458.05 | bwd_inner_microstep: 1458.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3678 [2024-06-10 20:24:05,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.72 | bwd_microstep: 1616.82 | bwd_inner_microstep: 1616.79 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2797 [2024-06-10 20:24:07,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.80 | bwd_microstep: 1150.16 | bwd_inner_microstep: 1150.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1976 [2024-06-10 20:24:08,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.07 | bwd_microstep: 891.51 | bwd_inner_microstep: 891.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649 [2024-06-10 20:24:10,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.83 | bwd_microstep: 1516.23 | bwd_inner_microstep: 1516.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474 [2024-06-10 20:24:12,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.44 | bwd_microstep: 1581.73 | bwd_inner_microstep: 1581.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3525 [2024-06-10 20:24:14,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1519.37 | bwd_inner_microstep: 1519.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534 [2024-06-10 20:24:16,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.13 | bwd_microstep: 1227.24 | bwd_inner_microstep: 1227.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-10 20:24:18,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 1370.95 | 
bwd_inner_microstep: 1370.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 20:24:20,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.39 | bwd_microstep: 1450.99 | bwd_inner_microstep: 1450.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3400 [2024-06-10 20:24:22,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.41 | bwd_microstep: 1205.01 | bwd_inner_microstep: 1204.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3539 [2024-06-10 20:24:23,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.35 | bwd_microstep: 1197.76 | bwd_inner_microstep: 1197.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3809 [2024-06-10 20:24:26,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.40 | bwd_microstep: 1716.85 | bwd_inner_microstep: 1716.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181 [2024-06-10 20:24:27,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.16 | bwd_microstep: 864.51 | bwd_inner_microstep: 864.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 20:24:29,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.84 | bwd_microstep: 1348.74 | bwd_inner_microstep: 1348.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-10 20:24:31,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 523.62 | bwd_microstep: 1401.06 | bwd_inner_microstep: 1401.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716 [2024-06-10 20:24:33,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.01 | bwd_microstep: 1626.91 | bwd_inner_microstep: 1626.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 20:24:35,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.37 | bwd_microstep: 1651.57 | bwd_inner_microstep: 1651.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569 [2024-06-10 20:24:37,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.00 | bwd_microstep: 1203.98 | bwd_inner_microstep: 1203.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-10 20:24:39,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1499.71 | bwd_inner_microstep: 1499.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2049 [2024-06-10 20:24:46,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.35 | optimizer_step: 6.64 [2024-06-10 20:24:46,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.55 | bwd_microstep: 7063.24 | bwd_inner_microstep: 1078.65 | bwd_allreduce_microstep: 5984.53 | step_microstep: 42.19 [2024-06-10 20:24:46,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15681.59 | bwd: 47943.55 | bwd_inner: 41958.10 | bwd_allreduce: 5984.76 | step: 43.67 .96s/it] 66%|██████▌ | 1138/1726 [19:42:09<9:57:21, 60.96s/it] 66%|██████▌ | 
1139/1726 [19:43:10<9:55:53, 60.91s/it] 66%|██████▌ | 1140/1726 [19:44:14<10:02:25, 61.68s/it] 66%|██████▌ | 1141/1726 [19:45:16<10:03:43, 61.92s/it] 66%|██████▌ | 1142/1726 [19:46:19<10:05:46, 62.24s/it] 66%|██████▌ | 1143/1726 [19:47:23<10:09:45, 62.75s/it] {'loss': 1.163, 'learning_rate': 1.0823340239959883e-05, 'epoch': 0.66} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3475 [2024-06-10 20:24:48,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.14 | bwd_microstep: 1567.51 | bwd_inner_microstep: 1567.36 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 20:24:50,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1252.20 | bwd_inner_microstep: 1252.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2400 [2024-06-10 20:24:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.70 | bwd_microstep: 901.87 | bwd_inner_microstep: 901.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3801 [2024-06-10 20:24:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1480.15 | bwd_inner_microstep: 1480.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 20:24:55,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1381.42 | bwd_inner_microstep: 1381.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238 [2024-06-10 20:24:57,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.74 | bwd_microstep: 961.03 | bwd_inner_microstep: 961.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 20:24:59,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1348.45 | bwd_inner_microstep: 1348.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 20:25:01,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.23 | bwd_microstep: 1388.69 | bwd_inner_microstep: 1388.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 20:25:02,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.62 | bwd_microstep: 793.50 | bwd_inner_microstep: 793.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948 [2024-06-10 20:25:03,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.38 | bwd_microstep: 730.30 | bwd_inner_microstep: 730.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 20:25:05,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.45 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 20:25:06,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.45 | bwd_microstep: 799.61 | bwd_inner_microstep: 799.58 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3444 [2024-06-10 20:25:07,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1205.83 | bwd_inner_microstep: 1205.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3984 [2024-06-10 20:25:10,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.49 | bwd_microstep: 1810.60 | bwd_inner_microstep: 1810.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3652 [2024-06-10 20:25:12,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.81 | bwd_microstep: 1716.76 | bwd_inner_microstep: 1716.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3675 [2024-06-10 20:25:14,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1356.96 | bwd_inner_microstep: 1356.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521 [2024-06-10 20:25:16,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 1322.23 | bwd_inner_microstep: 1322.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 20:25:18,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.47 | bwd_microstep: 1257.92 | bwd_inner_microstep: 1257.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 20:25:20,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.43 | bwd_microstep: 
1409.79 | bwd_inner_microstep: 1409.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3609 [2024-06-10 20:25:22,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.29 | bwd_microstep: 1673.22 | bwd_inner_microstep: 1673.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 20:25:24,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1401.84 | bwd_inner_microstep: 1401.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835 [2024-06-10 20:25:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.46 | bwd_microstep: 1456.51 | bwd_inner_microstep: 1456.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812 [2024-06-10 20:25:28,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.17 | bwd_microstep: 1514.41 | bwd_inner_microstep: 1514.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525 [2024-06-10 20:25:30,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.95 | bwd_microstep: 1328.12 | bwd_inner_microstep: 1328.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555 [2024-06-10 20:25:32,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.07 | bwd_microstep: 1298.66 | bwd_inner_microstep: 1298.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454 [2024-06-10 20:25:33,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 494.14 | bwd_microstep: 1317.82 | bwd_inner_microstep: 1317.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-10 20:25:36,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.51 | bwd_microstep: 1582.10 | bwd_inner_microstep: 1582.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3799 [2024-06-10 20:25:38,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1622.38 | bwd_inner_microstep: 1622.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779 [2024-06-10 20:25:40,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.24 | bwd_microstep: 1445.62 | bwd_inner_microstep: 1445.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3748 [2024-06-10 20:25:42,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.04 | bwd_microstep: 1738.33 | bwd_inner_microstep: 1738.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-10 20:25:44,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1548.13 | bwd_inner_microstep: 1548.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 20:25:46,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.07 | optimizer_gradients: 4.04 | optimizer_step: 6.63 [2024-06-10 20:25:46,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1516.48 | bwd_inner_microstep: 1508.79 | 
bwd_allreduce_microstep: 7.65 | step_microstep: 37.74 [2024-06-10 20:25:46,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16240.64 | bwd: 43512.32 | bwd_inner: 43503.66 | bwd_allreduce: 7.93 | step: 39.26 {'loss': 1.1666, 'learning_rate': 1.0790006648310828e-05, 'epoch': 0.66} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1887 [2024-06-10 20:25:47,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.29 | bwd_microstep: 773.58 | bwd_inner_microstep: 773.50 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 20:25:49,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.10 | bwd_microstep: 1355.63 | bwd_inner_microstep: 1355.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3952 [2024-06-10 20:25:51,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.66 | bwd_microstep: 1501.93 | bwd_inner_microstep: 1501.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 20:25:53,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480 [2024-06-10 20:25:55,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.82 | bwd_microstep: 1186.01 | bwd_inner_microstep: 1185.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-10 20:25:57,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.88 | bwd_microstep: 1301.24 | bwd_inner_microstep: 1301.21 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734 [2024-06-10 20:25:59,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.34 | bwd_microstep: 1635.82 | bwd_inner_microstep: 1635.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4084 [2024-06-10 20:26:01,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.48 | bwd_microstep: 1729.08 | bwd_inner_microstep: 1729.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3712 [2024-06-10 20:26:04,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.91 | bwd_microstep: 1631.23 | bwd_inner_microstep: 1631.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968 [2024-06-10 20:26:05,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.89 | bwd_microstep: 709.06 | bwd_inner_microstep: 709.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3458 [2024-06-10 20:26:06,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1423.70 | bwd_inner_microstep: 1423.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 20:26:08,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1343.45 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397 [2024-06-10 20:26:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.54 | bwd_microstep: 
1439.79 | bwd_inner_microstep: 1439.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3730 [2024-06-10 20:26:13,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.09 | bwd_microstep: 1730.10 | bwd_inner_microstep: 1730.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518 [2024-06-10 20:26:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.92 | bwd_microstep: 1584.45 | bwd_inner_microstep: 1584.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 20:26:17,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.96 | bwd_microstep: 1608.84 | bwd_inner_microstep: 1608.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3751 [2024-06-10 20:26:20,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.20 | bwd_microstep: 1844.23 | bwd_inner_microstep: 1844.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3646 [2024-06-10 20:26:22,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.86 | bwd_microstep: 1542.46 | bwd_inner_microstep: 1542.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2037 [2024-06-10 20:26:23,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.63 | bwd_microstep: 717.94 | bwd_inner_microstep: 717.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 618 [2024-06-10 20:26:23,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 102.56 | bwd_microstep: 260.31 | bwd_inner_microstep: 260.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 20:26:25,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.00 | bwd_microstep: 1461.47 | bwd_inner_microstep: 1461.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 20:26:27,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.65 | bwd_microstep: 1379.89 | bwd_inner_microstep: 1379.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680 [2024-06-10 20:26:29,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.54 | bwd_microstep: 1422.58 | bwd_inner_microstep: 1422.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3446 [2024-06-10 20:26:31,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.69 | bwd_microstep: 1312.54 | bwd_inner_microstep: 1312.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 20:26:33,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3609 [2024-06-10 20:26:35,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.63 | bwd_microstep: 1370.54 | bwd_inner_microstep: 1370.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 20:26:37,247] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1551.16 | bwd_inner_microstep: 1551.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1907 [2024-06-10 20:26:38,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.27 | bwd_microstep: 685.78 | bwd_inner_microstep: 685.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-10 20:26:39,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.10 | bwd_microstep: 1157.01 | bwd_inner_microstep: 1156.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3789 [2024-06-10 20:26:42,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.94 | bwd_microstep: 1718.14 | bwd_inner_microstep: 1718.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438 [2024-06-10 20:26:44,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.38 | bwd_microstep: 1453.79 | bwd_inner_microstep: 1453.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3427 [2024-06-10 20:26:49,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.27 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-10 20:26:49,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 4740.03 | bwd_inner_microstep: 1695.19 | bwd_allreduce_microstep: 3044.79 | step_microstep: 38.40 [2024-06-10 20:26:49,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16028.88 | bwd: 46235.06 | bwd_inner: 43189.30 | bwd_allreduce: 3045.06 | step: 39.84 {'loss': 1.1101, 'learning_rate': 
1.0756705494210489e-05, 'epoch': 0.66} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 20:26:51,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.14 | bwd_microstep: 1341.16 | bwd_inner_microstep: 1341.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3549 [2024-06-10 20:26:53,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1359.37 | bwd_inner_microstep: 1359.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 20:26:55,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3798 [2024-06-10 20:26:57,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.70 | bwd_microstep: 1647.54 | bwd_inner_microstep: 1647.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 20:26:59,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 1285.35 | bwd_inner_microstep: 1285.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 20:27:01,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1384.18 | bwd_inner_microstep: 1384.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 20:27:02,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1247.49 | 
bwd_inner_microstep: 1247.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-10 20:27:04,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.08 | bwd_microstep: 1151.24 | bwd_inner_microstep: 1151.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3577 [2024-06-10 20:27:06,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.54 | bwd_microstep: 1207.79 | bwd_inner_microstep: 1207.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 20:27:08,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.12 | bwd_microstep: 1388.34 | bwd_inner_microstep: 1388.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688 [2024-06-10 20:27:10,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1526.76 | bwd_inner_microstep: 1526.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2187 [2024-06-10 20:27:11,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.52 | bwd_microstep: 859.37 | bwd_inner_microstep: 859.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637 [2024-06-10 20:27:13,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1319.19 | bwd_inner_microstep: 1319.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1990 [2024-06-10 20:27:14,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 333.62 | bwd_microstep: 898.60 | bwd_inner_microstep: 898.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 20:27:16,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.06 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3126 [2024-06-10 20:27:17,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.83 | bwd_microstep: 1062.97 | bwd_inner_microstep: 1062.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3827 [2024-06-10 20:27:20,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.77 | bwd_microstep: 1761.23 | bwd_inner_microstep: 1761.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 20:27:22,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3822 [2024-06-10 20:27:24,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1498.10 | bwd_inner_microstep: 1498.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3536 [2024-06-10 20:27:26,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1420.42 | bwd_inner_microstep: 1420.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 20:27:27,903] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.57 | bwd_microstep: 1256.48 | bwd_inner_microstep: 1256.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-10 20:27:29,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1399.50 | bwd_inner_microstep: 1399.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012 [2024-06-10 20:27:30,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.88 | bwd_microstep: 806.38 | bwd_inner_microstep: 806.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2020 [2024-06-10 20:27:32,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.41 | bwd_microstep: 899.33 | bwd_inner_microstep: 899.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3603 [2024-06-10 20:27:34,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.08 | bwd_microstep: 1565.28 | bwd_inner_microstep: 1565.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 20:27:36,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.23 | bwd_microstep: 1651.64 | bwd_inner_microstep: 1651.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 20:27:38,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.25 | bwd_microstep: 1451.86 | bwd_inner_microstep: 1451.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
2030 [2024-06-10 20:27:39,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.65 | bwd_microstep: 810.06 | bwd_inner_microstep: 810.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816 [2024-06-10 20:27:41,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.24 | bwd_microstep: 1489.27 | bwd_inner_microstep: 1489.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-10 20:27:43,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.94 | bwd_microstep: 1249.95 | bwd_inner_microstep: 1249.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 20:27:45,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.33 | bwd_microstep: 1401.49 | bwd_inner_microstep: 1401.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 20:27:51,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 20:27:51,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 5061.60 | bwd_inner_microstep: 1687.96 | bwd_allreduce_microstep: 3373.58 | step_microstep: 38.19 [2024-06-10 20:27:51,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15751.88 | bwd: 45557.72 | bwd_inner: 42183.22 | bwd_allreduce: 3373.82 | step: 39.66 {'loss': 1.2439, 'learning_rate': 1.0723436894945345e-05, 'epoch': 0.66} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465 [2024-06-10 20:27:52,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.83 | bwd_microstep: 1302.98 | 
bwd_inner_microstep: 1302.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3964 [2024-06-10 20:27:55,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.84 | bwd_microstep: 1596.09 | bwd_inner_microstep: 1596.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3835 [2024-06-10 20:27:57,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.53 | bwd_microstep: 1484.61 | bwd_inner_microstep: 1484.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 20:27:59,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1341.56 | bwd_inner_microstep: 1341.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 20:28:00,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1275.94 | bwd_inner_microstep: 1275.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743 [2024-06-10 20:28:02,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.28 | bwd_microstep: 1438.28 | bwd_inner_microstep: 1438.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2729 [2024-06-10 20:28:04,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.37 | bwd_microstep: 1037.83 | bwd_inner_microstep: 1037.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 20:28:05,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 472.74 | bwd_microstep: 1254.16 | bwd_inner_microstep: 1254.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-10 20:28:07,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1411.97 | bwd_inner_microstep: 1411.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2089 [2024-06-10 20:28:09,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.19 | bwd_microstep: 866.89 | bwd_inner_microstep: 866.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 20:28:10,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1255.35 | bwd_inner_microstep: 1255.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3486 [2024-06-10 20:28:12,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.30 | bwd_microstep: 1404.11 | bwd_inner_microstep: 1404.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 20:28:14,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.98 | bwd_microstep: 1381.86 | bwd_inner_microstep: 1381.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435 [2024-06-10 20:28:16,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.56 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3650 [2024-06-10 20:28:18,872] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.75 | bwd_microstep: 1711.93 | bwd_inner_microstep: 1711.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 20:28:20,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1343.32 | bwd_inner_microstep: 1343.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2078 [2024-06-10 20:28:22,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.33 | bwd_microstep: 1011.11 | bwd_inner_microstep: 1011.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 20:28:23,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.07 | bwd_microstep: 1293.33 | bwd_inner_microstep: 1293.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 20:28:26,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.93 | bwd_microstep: 1659.23 | bwd_inner_microstep: 1659.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3651 [2024-06-10 20:28:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.17 | bwd_microstep: 1446.75 | bwd_inner_microstep: 1446.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717 [2024-06-10 20:28:30,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.13 | bwd_microstep: 1634.47 | bwd_inner_microstep: 1634.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3843 [2024-06-10 20:28:32,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1562.06 | bwd_inner_microstep: 1562.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 923 [2024-06-10 20:28:33,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 156.34 | bwd_microstep: 405.80 | bwd_inner_microstep: 405.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2303 [2024-06-10 20:28:34,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.67 | bwd_microstep: 848.83 | bwd_inner_microstep: 848.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-10 20:28:36,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1496.94 | bwd_inner_microstep: 1496.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3607 [2024-06-10 20:28:38,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1243.39 | bwd_inner_microstep: 1243.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832 [2024-06-10 20:28:40,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.08 | bwd_microstep: 1463.15 | bwd_inner_microstep: 1463.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3598 [2024-06-10 20:28:42,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.49 | bwd_microstep: 1450.81 | bwd_inner_microstep: 1450.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3598 [2024-06-10 20:28:44,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603 [2024-06-10 20:28:46,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.51 | bwd_microstep: 1463.18 | bwd_inner_microstep: 1463.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 20:28:47,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.45 | bwd_microstep: 1348.62 | bwd_inner_microstep: 1348.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-10 20:28:51,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-10 20:28:51,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.68 | bwd_microstep: 2583.20 | bwd_inner_microstep: 1324.83 | bwd_allreduce_microstep: 1258.32 | step_microstep: 37.78 [2024-06-10 20:28:51,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15854.33 | bwd: 43735.90 | bwd_inner: 42476.68 | bwd_allreduce: 1258.55 | step: 39.18 {'loss': 1.2171, 'learning_rate': 1.0690200967687234e-05, 'epoch': 0.66} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 20:28:53,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1475.69 | bwd_inner_microstep: 1475.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3987 [2024-06-10 20:28:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 628.67 | bwd_microstep: 1701.44 | bwd_inner_microstep: 1701.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 20:28:57,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1344.07 | bwd_inner_microstep: 1344.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 20:28:59,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1554.38 | bwd_inner_microstep: 1554.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-10 20:29:01,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1545.40 | bwd_inner_microstep: 1545.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3748 [2024-06-10 20:29:04,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.38 | bwd_microstep: 1845.31 | bwd_inner_microstep: 1845.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 20:29:05,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1252.85 | bwd_inner_microstep: 1252.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1366 [2024-06-10 20:29:06,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 201.66 | bwd_microstep: 519.59 | bwd_inner_microstep: 519.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 20:29:08,460] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1376.25 | bwd_inner_microstep: 1376.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440 [2024-06-10 20:29:10,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1313.34 | bwd_inner_microstep: 1313.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 20:29:12,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1291.82 | bwd_inner_microstep: 1291.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 20:29:13,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1381.71 | bwd_inner_microstep: 1381.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3613 [2024-06-10 20:29:16,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.25 | bwd_microstep: 1560.09 | bwd_inner_microstep: 1560.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 20:29:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.57 | bwd_microstep: 1350.99 | bwd_inner_microstep: 1350.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952 [2024-06-10 20:29:19,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.57 | bwd_microstep: 890.81 | bwd_inner_microstep: 890.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 
3655 [2024-06-10 20:29:21,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.26 | bwd_microstep: 1817.63 | bwd_inner_microstep: 1817.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-10 20:29:23,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1345.00 | bwd_inner_microstep: 1344.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1963 [2024-06-10 20:29:24,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.75 | bwd_microstep: 702.34 | bwd_inner_microstep: 702.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 20:29:26,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1508.65 | bwd_inner_microstep: 1508.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 20:29:28,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1398.19 | bwd_inner_microstep: 1398.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908 [2024-06-10 20:29:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.84 | bwd_microstep: 684.43 | bwd_inner_microstep: 684.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 20:29:31,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.81 | bwd_microstep: 1382.48 | bwd_inner_microstep: 1382.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3455 [2024-06-10 20:29:33,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.60 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3668 [2024-06-10 20:29:34,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.92 | bwd_microstep: 1230.10 | bwd_inner_microstep: 1230.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3474 [2024-06-10 20:29:36,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1230.04 | bwd_inner_microstep: 1230.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679 [2024-06-10 20:29:38,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1423.43 | bwd_inner_microstep: 1423.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 20:29:40,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.23 | bwd_microstep: 1473.28 | bwd_inner_microstep: 1473.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1982 [2024-06-10 20:29:41,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.04 | bwd_microstep: 927.26 | bwd_inner_microstep: 927.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431 [2024-06-10 20:29:43,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.22 | bwd_microstep: 1375.69 | bwd_inner_microstep: 1375.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-10 20:29:45,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1493.06 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 20:29:47,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1528.28 | bwd_inner_microstep: 1528.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3460 [2024-06-10 20:29:53,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.19 | optimizer_step: 6.62 [2024-06-10 20:29:53,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.49 | bwd_microstep: 4567.51 | bwd_inner_microstep: 1696.95 | bwd_allreduce_microstep: 2870.51 | step_microstep: 38.04 [2024-06-10 20:29:53,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15968.36 | bwd: 45779.63 | bwd_inner: 42908.21 | bwd_allreduce: 2870.74 | step: 39.47 66%|██████▌ | 1143/1726 [19:47:23<10:09:45, 62.75s/it] 66%|██████▋ | 1144/1726 [19:48:23<10:00:56, 61.95s/it] 66%|██████▋ | 1145/1726 [19:49:26<10:01:46, 62.15s/it] 66%|██████▋ | 1146/1726 [19:50:27<9:59:15, 61.99s/it] 66%|██████▋ | 1147/1726 [19:51:27<9:52:12, 61.37s/it] 67%|██████▋ | 1148/1726 [19:52:29<9:53:13, 61.58s/it] {'loss': 1.1854, 'learning_rate': 1.0656997829492912e-05, 'epoch': 0.67} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 20:29:55,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
549.23 | bwd_microstep: 1459.90 | bwd_inner_microstep: 1459.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3388 [2024-06-10 20:29:56,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.26 | bwd_microstep: 1140.83 | bwd_inner_microstep: 1140.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 20:29:58,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.01 | bwd_microstep: 1338.28 | bwd_inner_microstep: 1338.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3868 [2024-06-10 20:30:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.90 | bwd_microstep: 1561.27 | bwd_inner_microstep: 1561.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 20:30:02,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1381.88 | bwd_inner_microstep: 1381.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 20:30:04,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.09 | bwd_microstep: 1285.84 | bwd_inner_microstep: 1285.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488 [2024-06-10 20:30:06,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.23 | bwd_microstep: 1187.26 | bwd_inner_microstep: 1187.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739 [2024-06-10 20:30:08,338] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.67 | bwd_microstep: 1634.39 | bwd_inner_microstep: 1634.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 20:30:10,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1247.59 | bwd_inner_microstep: 1247.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 20:30:11,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.12 | bwd_microstep: 1387.99 | bwd_inner_microstep: 1387.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 852 [2024-06-10 20:30:12,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 134.01 | bwd_microstep: 347.91 | bwd_inner_microstep: 347.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 20:30:14,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.55 | bwd_microstep: 1247.55 | bwd_inner_microstep: 1247.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446 [2024-06-10 20:30:15,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1192.79 | bwd_inner_microstep: 1192.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1977 [2024-06-10 20:30:17,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.40 | bwd_microstep: 856.10 | bwd_inner_microstep: 856.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3698 
[2024-06-10 20:30:19,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1653.59 | bwd_inner_microstep: 1653.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 20:30:21,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1377.23 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492 [2024-06-10 20:30:22,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.37 | bwd_microstep: 1190.45 | bwd_inner_microstep: 1190.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 20:30:24,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.01 | bwd_microstep: 1286.30 | bwd_inner_microstep: 1286.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-10 20:30:26,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.25 | bwd_microstep: 1384.10 | bwd_inner_microstep: 1384.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 20:30:28,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1391.91 | bwd_inner_microstep: 1391.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3622 [2024-06-10 20:30:30,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.68 | bwd_microstep: 1371.88 | bwd_inner_microstep: 1371.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3625 [2024-06-10 20:30:32,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1442.45 | bwd_inner_microstep: 1442.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3646 [2024-06-10 20:30:34,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1472.61 | bwd_inner_microstep: 1472.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3491 [2024-06-10 20:30:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.84 | bwd_microstep: 1252.19 | bwd_inner_microstep: 1252.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072 [2024-06-10 20:30:37,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.71 | bwd_microstep: 917.05 | bwd_inner_microstep: 917.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3721 [2024-06-10 20:30:39,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.88 | bwd_microstep: 1583.88 | bwd_inner_microstep: 1583.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 20:30:41,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.35 | bwd_microstep: 1351.87 | bwd_inner_microstep: 1351.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1844 [2024-06-10 20:30:42,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.84 | bwd_microstep: 671.08 | bwd_inner_microstep: 671.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 20:30:44,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1495.31 | bwd_inner_microstep: 1495.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2233 [2024-06-10 20:30:45,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.20 | bwd_microstep: 1060.07 | bwd_inner_microstep: 1060.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418 [2024-06-10 20:30:47,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1448.12 | bwd_inner_microstep: 1448.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-10 20:30:54,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.10 | optimizer_step: 6.58 [2024-06-10 20:30:54,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.63 | bwd_microstep: 5784.29 | bwd_inner_microstep: 911.37 | bwd_allreduce_microstep: 4872.87 | step_microstep: 37.95 [2024-06-10 20:30:54,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15163.78 | bwd: 45403.95 | bwd_inner: 40530.18 | bwd_allreduce: 4873.10 | step: 39.41 {'loss': 1.2743, 'learning_rate': 1.0623827597303679e-05, 'epoch': 0.67} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3457 [2024-06-10 20:30:55,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.22 | bwd_microstep: 1232.77 | bwd_inner_microstep: 1232.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3392 [2024-06-10 20:30:57,547] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 487.96 | bwd_microstep: 1302.73 | bwd_inner_microstep: 1302.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 20:30:59,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.31 | bwd_microstep: 1478.42 | bwd_inner_microstep: 1478.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 20:31:01,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.11 | bwd_microstep: 1380.79 | bwd_inner_microstep: 1380.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 854 [2024-06-10 20:31:01,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.59 | bwd_microstep: 346.98 | bwd_inner_microstep: 346.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499 [2024-06-10 20:31:03,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.44 | bwd_microstep: 1185.64 | bwd_inner_microstep: 1185.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-10 20:31:04,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.27 | bwd_microstep: 789.62 | bwd_inner_microstep: 789.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400 [2024-06-10 20:31:06,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.77 | bwd_microstep: 1179.29 | bwd_inner_microstep: 1179.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-10 20:31:08,018] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.91 | bwd_microstep: 1189.91 | bwd_inner_microstep: 1189.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3971 [2024-06-10 20:31:10,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.04 | bwd_microstep: 1650.72 | bwd_inner_microstep: 1650.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072 [2024-06-10 20:31:11,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.89 | bwd_microstep: 912.74 | bwd_inner_microstep: 912.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3491 [2024-06-10 20:31:13,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.90 | bwd_microstep: 1531.47 | bwd_inner_microstep: 1531.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 20:31:15,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1377.91 | bwd_inner_microstep: 1377.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 20:31:17,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.68 | bwd_microstep: 1381.28 | bwd_inner_microstep: 1381.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 20:31:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.00 | bwd_microstep: 1251.51 | bwd_inner_microstep: 1251.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token 
length: 2112 [2024-06-10 20:31:20,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.37 | bwd_microstep: 982.21 | bwd_inner_microstep: 982.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-10 20:31:22,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.85 | bwd_microstep: 1624.64 | bwd_inner_microstep: 1624.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-10 20:31:24,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.10 | bwd_microstep: 1452.51 | bwd_inner_microstep: 1452.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-10 20:31:26,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.32 | bwd_microstep: 1611.05 | bwd_inner_microstep: 1611.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 20:31:28,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1251.21 | bwd_inner_microstep: 1251.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3822 [2024-06-10 20:31:30,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.04 | bwd_microstep: 1622.60 | bwd_inner_microstep: 1622.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 20:31:32,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1287.79 | bwd_inner_microstep: 1287.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, 
images per sample: 4.5, dynamic token length: 2070 [2024-06-10 20:31:33,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.79 | bwd_microstep: 848.11 | bwd_inner_microstep: 848.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3679 [2024-06-10 20:31:35,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.48 | bwd_microstep: 1326.90 | bwd_inner_microstep: 1326.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 20:31:37,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.56 | bwd_microstep: 1404.95 | bwd_inner_microstep: 1404.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 20:31:39,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1399.17 | bwd_inner_microstep: 1399.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3380 [2024-06-10 20:31:41,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.54 | bwd_microstep: 1304.01 | bwd_inner_microstep: 1303.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3418 [2024-06-10 20:31:43,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.23 | bwd_microstep: 1376.87 | bwd_inner_microstep: 1376.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276 [2024-06-10 20:31:44,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.31 | bwd_microstep: 1004.34 | bwd_inner_microstep: 1004.31 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3768 [2024-06-10 20:31:46,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.23 | bwd_microstep: 1374.50 | bwd_inner_microstep: 1374.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3572 [2024-06-10 20:31:48,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.82 | bwd_microstep: 1646.36 | bwd_inner_microstep: 1646.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3555 [2024-06-10 20:31:55,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.38 | optimizer_step: 6.60 [2024-06-10 20:31:55,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.99 | bwd_microstep: 6173.92 | bwd_inner_microstep: 1376.05 | bwd_allreduce_microstep: 4797.80 | step_microstep: 38.79 [2024-06-10 20:31:55,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.83 | bwd: 45882.92 | bwd_inner: 41084.19 | bwd_allreduce: 4798.05 | step: 40.27 {'loss': 1.203, 'learning_rate': 1.059069038794489e-05, 'epoch': 0.67} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3473 [2024-06-10 20:31:57,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.32 | bwd_microstep: 1432.03 | bwd_inner_microstep: 1432.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128 [2024-06-10 20:31:58,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.73 | bwd_microstep: 924.16 | bwd_inner_microstep: 924.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4510 [2024-06-10 20:32:01,109] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.71 | bwd_microstep: 1640.28 | bwd_inner_microstep: 1640.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3876 [2024-06-10 20:32:03,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.32 | bwd_microstep: 1683.67 | bwd_inner_microstep: 1683.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2404 [2024-06-10 20:32:04,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.99 | bwd_microstep: 1001.96 | bwd_inner_microstep: 1001.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 20:32:06,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.89 | bwd_microstep: 1344.20 | bwd_inner_microstep: 1344.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 20:32:08,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1283.94 | bwd_inner_microstep: 1283.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948 [2024-06-10 20:32:09,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.12 | bwd_microstep: 790.94 | bwd_inner_microstep: 790.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748 [2024-06-10 20:32:11,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.97 | bwd_microstep: 1635.26 | bwd_inner_microstep: 1635.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3439 [2024-06-10 20:32:13,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.79 | bwd_microstep: 1280.68 | bwd_inner_microstep: 1280.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-10 20:32:15,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.61 | bwd_microstep: 1612.56 | bwd_inner_microstep: 1612.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3752 [2024-06-10 20:32:18,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.80 | bwd_microstep: 1680.52 | bwd_inner_microstep: 1680.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-10 20:32:20,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.02 | bwd_microstep: 1629.75 | bwd_inner_microstep: 1629.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 20:32:22,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.14 | bwd_microstep: 1341.48 | bwd_inner_microstep: 1341.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 20:32:24,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1487.56 | bwd_inner_microstep: 1487.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674 [2024-06-10 20:32:26,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.92 | bwd_microstep: 1689.29 | bwd_inner_microstep: 1689.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, 
images per sample: 4.0, dynamic token length: 1973 [2024-06-10 20:32:27,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.53 | bwd_microstep: 793.87 | bwd_inner_microstep: 793.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463 [2024-06-10 20:32:29,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.57 | bwd_microstep: 1181.93 | bwd_inner_microstep: 1181.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 20:32:31,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.29 | bwd_microstep: 1279.60 | bwd_inner_microstep: 1279.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 20:32:33,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.26 | bwd_microstep: 1556.47 | bwd_inner_microstep: 1556.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2135 [2024-06-10 20:32:34,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.04 | bwd_microstep: 736.65 | bwd_inner_microstep: 736.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2000 [2024-06-10 20:32:35,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.33 | bwd_microstep: 737.22 | bwd_inner_microstep: 737.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 20:32:37,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1534.07 | bwd_inner_microstep: 1534.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 20:32:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.04 | bwd_microstep: 1498.31 | bwd_inner_microstep: 1498.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 20:32:41,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2192 [2024-06-10 20:32:42,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.80 | bwd_microstep: 795.60 | bwd_inner_microstep: 795.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 20:32:44,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1256.99 | bwd_inner_microstep: 1256.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 20:32:46,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1656.32 | bwd_inner_microstep: 1656.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2077 [2024-06-10 20:32:47,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.52 | bwd_microstep: 997.78 | bwd_inner_microstep: 997.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-10 20:32:49,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.72 | bwd_microstep: 1600.74 | bwd_inner_microstep: 1600.71 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 20:32:52,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.48 | bwd_microstep: 1652.50 | bwd_inner_microstep: 1652.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2368 [2024-06-10 20:32:59,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.22 | optimizer_step: 6.61 [2024-06-10 20:32:59,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.64 | bwd_microstep: 6859.02 | bwd_inner_microstep: 1277.64 | bwd_allreduce_microstep: 5581.31 | step_microstep: 38.74 [2024-06-10 20:32:59,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15709.37 | bwd: 47878.55 | bwd_inner: 42296.31 | bwd_allreduce: 5581.55 | step: 40.17 {'loss': 1.1962, 'learning_rate': 1.055758631812565e-05, 'epoch': 0.67} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3470 [2024-06-10 20:33:01,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1494.26 | bwd_inner_microstep: 1494.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-10 20:33:03,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.55 | bwd_microstep: 1157.26 | bwd_inner_microstep: 1157.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 20:33:04,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.32 | bwd_microstep: 1275.43 | bwd_inner_microstep: 1275.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 
[2024-06-10 20:33:06,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.74 | bwd_microstep: 1477.48 | bwd_inner_microstep: 1477.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3776 [2024-06-10 20:33:09,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1488.22 | bwd_inner_microstep: 1488.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750 [2024-06-10 20:33:11,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.89 | bwd_microstep: 1432.25 | bwd_inner_microstep: 1432.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-10 20:33:12,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.83 | bwd_microstep: 794.63 | bwd_inner_microstep: 794.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 20:33:14,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.58 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709 [2024-06-10 20:33:16,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.50 | bwd_microstep: 1526.98 | bwd_inner_microstep: 1526.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 20:33:18,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.67 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1973 [2024-06-10 20:33:19,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.94 | bwd_microstep: 798.21 | bwd_inner_microstep: 798.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915 [2024-06-10 20:33:20,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.87 | bwd_microstep: 689.20 | bwd_inner_microstep: 689.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3678 [2024-06-10 20:33:22,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1402.71 | bwd_inner_microstep: 1402.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2774 [2024-06-10 20:33:23,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.47 | bwd_microstep: 1245.99 | bwd_inner_microstep: 1245.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 20:33:25,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1384.69 | bwd_inner_microstep: 1384.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663 [2024-06-10 20:33:28,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.81 | bwd_microstep: 1716.73 | bwd_inner_microstep: 1716.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 20:33:29,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 798.04 | bwd_inner_microstep: 798.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3675 [2024-06-10 20:33:30,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.43 | bwd_microstep: 1230.70 | bwd_inner_microstep: 1230.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-10 20:33:32,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.38 | bwd_microstep: 1295.45 | bwd_inner_microstep: 1295.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3611 [2024-06-10 20:33:34,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.91 | bwd_microstep: 1212.76 | bwd_inner_microstep: 1212.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2144 [2024-06-10 20:33:35,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.01 | bwd_microstep: 834.69 | bwd_inner_microstep: 834.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 20:33:37,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.13 | bwd_microstep: 1288.83 | bwd_inner_microstep: 1288.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-10 20:33:39,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.96 | bwd_microstep: 1637.63 | bwd_inner_microstep: 1637.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 20:33:41,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.42 | bwd_microstep: 1400.43 | bwd_inner_microstep: 1400.40 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-10 20:33:42,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.35 | bwd_microstep: 697.29 | bwd_inner_microstep: 697.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 20:33:44,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.02 | bwd_microstep: 1292.56 | bwd_inner_microstep: 1292.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2627 [2024-06-10 20:33:45,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.84 | bwd_microstep: 1017.02 | bwd_inner_microstep: 1016.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596 [2024-06-10 20:33:47,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.45 | bwd_microstep: 1601.20 | bwd_inner_microstep: 1601.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 20:33:49,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1376.47 | bwd_inner_microstep: 1376.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574 [2024-06-10 20:33:51,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.23 | bwd_microstep: 1554.63 | bwd_inner_microstep: 1554.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3612 [2024-06-10 20:33:54,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.58 | bwd_microstep: 
1604.79 | bwd_inner_microstep: 1604.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3426 [2024-06-10 20:33:58,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.60 [2024-06-10 20:33:58,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.72 | bwd_microstep: 3987.27 | bwd_inner_microstep: 1599.53 | bwd_allreduce_microstep: 2387.69 | step_microstep: 38.07 [2024-06-10 20:33:58,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15325.94 | bwd: 43488.17 | bwd_inner: 41099.56 | bwd_allreduce: 2387.91 | step: 39.59 {'loss': 1.2328, 'learning_rate': 1.0524515504438302e-05, 'epoch': 0.67} dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4539 [2024-06-10 20:34:01,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 711.65 | bwd_microstep: 1930.23 | bwd_inner_microstep: 1930.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941 [2024-06-10 20:34:02,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.21 | bwd_microstep: 725.15 | bwd_inner_microstep: 725.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 20:34:04,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.21 | bwd_microstep: 1481.73 | bwd_inner_microstep: 1481.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 20:34:06,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3773 [2024-06-10 20:34:08,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 859 [2024-06-10 20:34:08,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 136.05 | bwd_microstep: 347.95 | bwd_inner_microstep: 347.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 20:34:10,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.12 | bwd_microstep: 1244.21 | bwd_inner_microstep: 1244.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3697 [2024-06-10 20:34:12,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.48 | bwd_microstep: 1420.90 | bwd_inner_microstep: 1420.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870 [2024-06-10 20:34:13,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.28 | bwd_microstep: 679.57 | bwd_inner_microstep: 679.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511 [2024-06-10 20:34:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.26 | bwd_microstep: 1412.29 | bwd_inner_microstep: 1412.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-10 20:34:17,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.60 | bwd_microstep: 1310.27 | bwd_inner_microstep: 1310.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 24, images per sample: 6.0, dynamic token length: 1964 [2024-06-10 20:34:18,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.19 | bwd_microstep: 920.89 | bwd_inner_microstep: 920.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954 [2024-06-10 20:34:19,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.66 | bwd_microstep: 889.88 | bwd_inner_microstep: 889.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2780 [2024-06-10 20:34:20,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.22 | bwd_microstep: 954.32 | bwd_inner_microstep: 954.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3651 [2024-06-10 20:34:23,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.51 | bwd_microstep: 1517.41 | bwd_inner_microstep: 1517.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 20:34:25,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1490.52 | bwd_inner_microstep: 1490.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707 [2024-06-10 20:34:27,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.36 | bwd_microstep: 1434.66 | bwd_inner_microstep: 1434.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636 [2024-06-10 20:34:29,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.14 | bwd_microstep: 1611.82 | bwd_inner_microstep: 1611.79 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462 [2024-06-10 20:34:31,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.58 | bwd_microstep: 1309.02 | bwd_inner_microstep: 1309.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-10 20:34:32,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.82 | bwd_microstep: 1196.05 | bwd_inner_microstep: 1196.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 20:34:34,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.54 | bwd_microstep: 1276.96 | bwd_inner_microstep: 1276.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 20:34:36,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.10 | bwd_microstep: 1555.52 | bwd_inner_microstep: 1555.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3890 [2024-06-10 20:34:38,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.11 | bwd_microstep: 1489.90 | bwd_inner_microstep: 1489.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-10 20:34:40,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.77 | bwd_microstep: 1599.50 | bwd_inner_microstep: 1599.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815 [2024-06-10 20:34:43,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.22 | bwd_microstep: 1575.27 | bwd_inner_microstep: 
1575.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 20:34:45,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1391.25 | bwd_inner_microstep: 1391.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817 [2024-06-10 20:34:47,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.96 | bwd_microstep: 1484.85 | bwd_inner_microstep: 1484.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 20:34:49,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.84 | bwd_microstep: 1454.26 | bwd_inner_microstep: 1454.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 20:34:51,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1380.19 | bwd_inner_microstep: 1380.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-10 20:34:53,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.94 | bwd_microstep: 1643.04 | bwd_inner_microstep: 1643.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564 [2024-06-10 20:34:55,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.08 | bwd_microstep: 1593.22 | bwd_inner_microstep: 1593.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059 [2024-06-10 20:35:01,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | 
optimizer_gradients: 4.29 | optimizer_step: 6.59 [2024-06-10 20:35:01,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.73 | bwd_microstep: 5494.82 | bwd_inner_microstep: 933.32 | bwd_allreduce_microstep: 4561.44 | step_microstep: 39.07 [2024-06-10 20:35:01,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15708.15 | bwd: 46639.03 | bwd_inner: 42076.68 | bwd_allreduce: 4561.67 | step: 40.53 67%|██████▋ | 1148/1726 [19:52:29<9:53:13, 61.58s/it] 67%|██████▋ | 1149/1726 [19:53:30<9:50:12, 61.37s/it] 67%|██████▋ | 1150/1726 [19:54:32<9:49:42, 61.43s/it] 67%|██████▋ | 1151/1726 [19:55:36<9:55:51, 62.18s/it] 67%|██████▋ | 1152/1726 [19:56:35<9:46:05, 61.26s/it] 67%|██████▋ | 1153/1726 [19:57:38<9:49:06, 61.69s/it] {'loss': 1.1984, 'learning_rate': 1.0491478063358096e-05, 'epoch': 0.67} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3405 [2024-06-10 20:35:03,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.66 | bwd_microstep: 1436.23 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 20:35:05,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.94 | bwd_microstep: 1245.64 | bwd_inner_microstep: 1245.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882 [2024-06-10 20:35:06,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.95 | bwd_microstep: 711.04 | bwd_inner_microstep: 711.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 
[2024-06-10 20:35:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.27 | bwd_microstep: 1350.91 | bwd_inner_microstep: 1350.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2637 [2024-06-10 20:35:09,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.96 | bwd_microstep: 1112.46 | bwd_inner_microstep: 1112.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414 [2024-06-10 20:35:11,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.06 | bwd_microstep: 1209.16 | bwd_inner_microstep: 1209.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2238 [2024-06-10 20:35:12,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.66 | bwd_microstep: 959.56 | bwd_inner_microstep: 959.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-10 20:35:13,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.66 | bwd_microstep: 970.46 | bwd_inner_microstep: 970.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 20:35:15,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.05 | bwd_microstep: 1457.13 | bwd_inner_microstep: 1457.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2210 [2024-06-10 20:35:16,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.85 | bwd_microstep: 891.39 | bwd_inner_microstep: 891.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 4057 [2024-06-10 20:35:19,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.40 | bwd_microstep: 1622.52 | bwd_inner_microstep: 1622.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664 [2024-06-10 20:35:21,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.67 | bwd_microstep: 1418.40 | bwd_inner_microstep: 1418.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3670 [2024-06-10 20:35:23,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1366.92 | bwd_inner_microstep: 1366.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 20:35:24,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487 [2024-06-10 20:35:26,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.02 | bwd_microstep: 1406.26 | bwd_inner_microstep: 1406.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639 [2024-06-10 20:35:28,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1406.73 | bwd_inner_microstep: 1406.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3438 [2024-06-10 20:35:30,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.42 | bwd_microstep: 1215.04 | bwd_inner_microstep: 1215.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 30, images per sample: 7.5, dynamic token length: 3671 [2024-06-10 20:35:32,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1454.42 | bwd_inner_microstep: 1454.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3626 [2024-06-10 20:35:34,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.20 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-10 20:35:36,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.84 | bwd_microstep: 1454.40 | bwd_inner_microstep: 1454.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556 [2024-06-10 20:35:38,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.13 | bwd_microstep: 1299.21 | bwd_inner_microstep: 1299.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 20:35:40,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.43 | bwd_microstep: 1555.06 | bwd_inner_microstep: 1555.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 20:35:42,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1392.69 | bwd_inner_microstep: 1392.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 20:35:44,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1557.62 | bwd_inner_microstep: 1557.59 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150 [2024-06-10 20:35:45,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.30 | bwd_microstep: 948.92 | bwd_inner_microstep: 948.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 20:35:47,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1258.51 | bwd_inner_microstep: 1258.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817 [2024-06-10 20:35:49,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1479.23 | bwd_inner_microstep: 1479.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3721 [2024-06-10 20:35:51,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.81 | bwd_microstep: 1467.82 | bwd_inner_microstep: 1467.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3377 [2024-06-10 20:35:53,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.33 | bwd_microstep: 1365.58 | bwd_inner_microstep: 1365.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3396 [2024-06-10 20:35:55,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.42 | bwd_microstep: 1277.39 | bwd_inner_microstep: 1277.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820 [2024-06-10 20:35:57,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.49 | bwd_microstep: 
1747.99 | bwd_inner_microstep: 1747.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 20:36:01,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.05 | optimizer_step: 6.61 [2024-06-10 20:36:01,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.84 | bwd_microstep: 3197.21 | bwd_inner_microstep: 1664.48 | bwd_allreduce_microstep: 1532.67 | step_microstep: 37.58 [2024-06-10 20:36:01,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15816.53 | bwd: 43945.66 | bwd_inner: 42412.08 | bwd_allreduce: 1532.90 | step: 39.00 {'loss': 1.1819, 'learning_rate': 1.0458474111242723e-05, 'epoch': 0.67} dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2677 [2024-06-10 20:36:02,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.24 | bwd_microstep: 1015.32 | bwd_inner_microstep: 1015.15 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3904 [2024-06-10 20:36:05,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1684.83 | bwd_inner_microstep: 1684.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 20:36:07,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.33 | bwd_microstep: 1484.49 | bwd_inner_microstep: 1484.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3892 [2024-06-10 20:36:09,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1583.81 | bwd_inner_microstep: 1583.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3582 [2024-06-10 20:36:11,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.57 | bwd_microstep: 1402.39 | bwd_inner_microstep: 1402.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 20:36:13,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1386.32 | bwd_inner_microstep: 1386.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937 [2024-06-10 20:36:14,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.55 | bwd_microstep: 759.52 | bwd_inner_microstep: 759.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1926 [2024-06-10 20:36:15,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.24 | bwd_microstep: 725.27 | bwd_inner_microstep: 725.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1896 [2024-06-10 20:36:16,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.61 | bwd_microstep: 776.55 | bwd_inner_microstep: 776.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 20:36:18,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1384.85 | bwd_inner_microstep: 1384.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3670 [2024-06-10 20:36:19,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.12 | bwd_microstep: 1228.96 | bwd_inner_microstep: 1228.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 20:36:22,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.83 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687 [2024-06-10 20:36:24,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.04 | bwd_microstep: 1616.45 | bwd_inner_microstep: 1616.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 20:36:26,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.27 | bwd_microstep: 1343.71 | bwd_inner_microstep: 1343.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2954 [2024-06-10 20:36:27,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.85 | bwd_microstep: 1131.42 | bwd_inner_microstep: 1131.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3837 [2024-06-10 20:36:29,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.20 | bwd_microstep: 1354.48 | bwd_inner_microstep: 1354.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2081 [2024-06-10 20:36:30,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.86 | bwd_microstep: 916.51 | bwd_inner_microstep: 916.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 20:36:32,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.01 | bwd_microstep: 1381.10 | bwd_inner_microstep: 1381.07 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516 [2024-06-10 20:36:34,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1414.65 | bwd_inner_microstep: 1414.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2494 [2024-06-10 20:36:35,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.41 | bwd_microstep: 955.69 | bwd_inner_microstep: 955.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834 [2024-06-10 20:36:38,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.35 | bwd_microstep: 1657.22 | bwd_inner_microstep: 1657.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 20:36:40,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1285.03 | bwd_inner_microstep: 1285.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3681 [2024-06-10 20:36:42,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.57 | bwd_inner_microstep: 1479.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415 [2024-06-10 20:36:44,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.72 | bwd_microstep: 1443.20 | bwd_inner_microstep: 1443.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3815 [2024-06-10 20:36:46,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.33 | bwd_microstep: 1719.60 | 
bwd_inner_microstep: 1719.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 20:36:48,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.76 | bwd_microstep: 1553.40 | bwd_inner_microstep: 1553.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438 [2024-06-10 20:36:50,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.15 | bwd_microstep: 1280.80 | bwd_inner_microstep: 1280.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3728 [2024-06-10 20:36:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.45 | bwd_microstep: 1560.76 | bwd_inner_microstep: 1560.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 20:36:54,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1348.86 | bwd_inner_microstep: 1348.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-10 20:36:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1546.32 | bwd_inner_microstep: 1546.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789 [2024-06-10 20:36:58,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1551.73 | bwd_inner_microstep: 1551.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3782 [2024-06-10 20:37:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.83 | optimizer_gradients: 4.08 | optimizer_step: 6.59 [2024-06-10 20:37:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.99 | bwd_microstep: 5024.77 | bwd_inner_microstep: 1410.28 | bwd_allreduce_microstep: 3614.44 | step_microstep: 37.88 [2024-06-10 20:37:04,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15966.37 | bwd: 46483.80 | bwd_inner: 42868.34 | bwd_allreduce: 3614.73 | step: 39.45 {'loss': 1.2227, 'learning_rate': 1.0425503764331925e-05, 'epoch': 0.67} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-10 20:37:05,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.71 | bwd_microstep: 778.62 | bwd_inner_microstep: 778.56 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 20:37:06,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1243.23 | bwd_inner_microstep: 1243.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-10 20:37:09,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.48 | bwd_microstep: 1455.38 | bwd_inner_microstep: 1455.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 20:37:10,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.76 | bwd_microstep: 1388.03 | bwd_inner_microstep: 1388.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 20:37:12,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.02 | bwd_microstep: 1248.05 | bwd_inner_microstep: 1248.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-10 20:37:13,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.16 | bwd_microstep: 677.83 | bwd_inner_microstep: 677.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3747 [2024-06-10 20:37:15,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.50 | bwd_microstep: 1435.30 | bwd_inner_microstep: 1435.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411 [2024-06-10 20:37:17,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.31 | bwd_microstep: 1149.53 | bwd_inner_microstep: 1149.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 20:37:19,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1382.99 | bwd_inner_microstep: 1382.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942 [2024-06-10 20:37:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.00 | bwd_microstep: 821.79 | bwd_inner_microstep: 821.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3845 [2024-06-10 20:37:22,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1484.35 | bwd_inner_microstep: 1484.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3670 [2024-06-10 20:37:24,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 20:37:26,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510 [2024-06-10 20:37:27,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.67 | bwd_microstep: 1318.02 | bwd_inner_microstep: 1317.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461 [2024-06-10 20:37:29,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1342.87 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3826 [2024-06-10 20:37:31,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.45 | bwd_microstep: 1320.41 | bwd_inner_microstep: 1320.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 20:37:33,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.35 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3952 [2024-06-10 20:37:35,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.35 | bwd_microstep: 1599.25 | bwd_inner_microstep: 1599.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 20:37:37,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1607.89 | 
bwd_inner_microstep: 1607.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 20:37:39,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1278.69 | bwd_inner_microstep: 1278.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1987 [2024-06-10 20:37:40,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.00 | bwd_microstep: 735.66 | bwd_inner_microstep: 735.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454 [2024-06-10 20:37:42,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.57 | bwd_microstep: 1157.98 | bwd_inner_microstep: 1157.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 20:37:44,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.73 | bwd_microstep: 1393.38 | bwd_inner_microstep: 1393.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723 [2024-06-10 20:37:46,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.62 | bwd_microstep: 1436.35 | bwd_inner_microstep: 1436.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972 [2024-06-10 20:37:47,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.03 | bwd_microstep: 795.36 | bwd_inner_microstep: 795.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551 [2024-06-10 20:37:49,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
495.50 | bwd_microstep: 1298.24 | bwd_inner_microstep: 1298.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-10 20:37:50,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.22 | bwd_microstep: 875.67 | bwd_inner_microstep: 875.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558 [2024-06-10 20:37:52,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1429.19 | bwd_inner_microstep: 1429.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3566 [2024-06-10 20:37:54,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.47 | bwd_microstep: 1558.46 | bwd_inner_microstep: 1558.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 20:37:56,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1281.13 | bwd_inner_microstep: 1281.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3795 [2024-06-10 20:37:58,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.45 | bwd_microstep: 1514.07 | bwd_inner_microstep: 1514.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 20:38:06,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.56 [2024-06-10 20:38:06,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 7172.61 | bwd_inner_microstep: 1579.35 | bwd_allreduce_microstep: 5593.20 
| step_microstep: 38.26 [2024-06-10 20:38:06,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15256.79 | bwd: 46290.32 | bwd_inner: 40696.17 | bwd_allreduce: 5593.46 | step: 39.73 {'loss': 1.191, 'learning_rate': 1.0392567138747101e-05, 'epoch': 0.67} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 20:38:07,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.95 | bwd_microstep: 1237.80 | bwd_inner_microstep: 1237.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 20:38:09,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.00 | bwd_microstep: 1276.05 | bwd_inner_microstep: 1276.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2431 [2024-06-10 20:38:10,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.49 | bwd_microstep: 1032.86 | bwd_inner_microstep: 1032.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3892 [2024-06-10 20:38:12,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.17 | bwd_microstep: 1411.78 | bwd_inner_microstep: 1411.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 20:38:14,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1374.95 | bwd_inner_microstep: 1374.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745 [2024-06-10 20:38:16,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.00 | bwd_microstep: 1536.04 | bwd_inner_microstep: 1536.01 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 20:38:18,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1389.20 | bwd_inner_microstep: 1389.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499 [2024-06-10 20:38:20,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.32 | bwd_microstep: 1190.37 | bwd_inner_microstep: 1190.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 20:38:22,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1389.85 | bwd_inner_microstep: 1389.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3482 [2024-06-10 20:38:24,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 1510.07 | bwd_inner_microstep: 1510.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3496 [2024-06-10 20:38:26,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.65 | bwd_microstep: 1365.98 | bwd_inner_microstep: 1365.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 894 [2024-06-10 20:38:26,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.31 | bwd_microstep: 367.68 | bwd_inner_microstep: 367.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1969 [2024-06-10 20:38:27,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.38 | bwd_microstep: 764.57 | bwd_inner_microstep: 764.55 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3694 [2024-06-10 20:38:30,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1557.17 | bwd_inner_microstep: 1557.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3735 [2024-06-10 20:38:32,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.45 | bwd_microstep: 1696.02 | bwd_inner_microstep: 1696.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 20:38:34,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.90 | bwd_microstep: 1479.27 | bwd_inner_microstep: 1479.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4004 [2024-06-10 20:38:36,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.93 | bwd_microstep: 1812.81 | bwd_inner_microstep: 1812.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 20:38:38,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3552 [2024-06-10 20:38:40,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1380.99 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 20:38:42,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 
1403.55 | bwd_inner_microstep: 1403.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 20:38:44,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.81 | bwd_microstep: 1284.49 | bwd_inner_microstep: 1284.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-10 20:38:45,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 974.03 | bwd_inner_microstep: 974.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 20:38:47,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.50 | bwd_microstep: 1283.51 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-10 20:38:48,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.74 | bwd_microstep: 973.62 | bwd_inner_microstep: 973.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 20:38:51,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1655.91 | bwd_inner_microstep: 1655.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 20:38:53,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1393.36 | bwd_inner_microstep: 1393.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-10 20:38:54,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 494.20 | bwd_microstep: 1293.51 | bwd_inner_microstep: 1293.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 20:38:56,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1395.13 | bwd_inner_microstep: 1395.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3438 [2024-06-10 20:38:58,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.79 | bwd_microstep: 1448.11 | bwd_inner_microstep: 1448.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 20:39:00,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.87 | bwd_microstep: 1253.32 | bwd_inner_microstep: 1253.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3611 [2024-06-10 20:39:02,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.94 | bwd_microstep: 1707.42 | bwd_inner_microstep: 1707.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3453 [2024-06-10 20:39:06,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.84 | optimizer_gradients: 4.19 | optimizer_step: 6.59 [2024-06-10 20:39:06,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.60 | bwd_microstep: 2561.30 | bwd_inner_microstep: 1565.45 | bwd_allreduce_microstep: 995.81 | step_microstep: 39.74 [2024-06-10 20:39:06,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15946.78 | bwd: 43795.76 | bwd_inner: 42799.06 | bwd_allreduce: 996.03 | step: 41.23 {'loss': 1.2321, 'learning_rate': 1.035966435049086e-05, 'epoch': 0.67} dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 20:39:08,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.82 | bwd_microstep: 1473.82 | bwd_inner_microstep: 1473.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 20:39:10,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1383.40 | bwd_inner_microstep: 1383.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4505 [2024-06-10 20:39:12,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.15 | bwd_microstep: 1638.02 | bwd_inner_microstep: 1638.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 20:39:14,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.56 | bwd_microstep: 1478.99 | bwd_inner_microstep: 1478.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2248 [2024-06-10 20:39:15,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.15 | bwd_microstep: 964.86 | bwd_inner_microstep: 964.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-10 20:39:17,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.69 | bwd_microstep: 1637.92 | bwd_inner_microstep: 1637.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 20:39:19,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.88 | bwd_microstep: 1339.85 | bwd_inner_microstep: 1339.82 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 20:39:20,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.37 | bwd_microstep: 677.96 | bwd_inner_microstep: 677.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 20:39:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 20:39:23,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.28 | bwd_microstep: 1410.20 | bwd_inner_microstep: 1410.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-10 20:39:25,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1247.97 | bwd_inner_microstep: 1247.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721 [2024-06-10 20:39:27,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.21 | bwd_microstep: 1622.49 | bwd_inner_microstep: 1622.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 20:39:29,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1485.02 | bwd_inner_microstep: 1484.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3506 [2024-06-10 20:39:31,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.89 | bwd_microstep: 1429.10 | bwd_inner_microstep: 
1429.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 20:39:33,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1251.91 | bwd_inner_microstep: 1251.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681 [2024-06-10 20:39:35,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.36 | bwd_microstep: 1421.31 | bwd_inner_microstep: 1421.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-10 20:39:37,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.62 | bwd_microstep: 1480.76 | bwd_inner_microstep: 1480.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1990 [2024-06-10 20:39:38,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.13 | bwd_microstep: 737.01 | bwd_inner_microstep: 736.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827 [2024-06-10 20:39:40,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.90 | bwd_microstep: 1387.78 | bwd_inner_microstep: 1387.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1987 [2024-06-10 20:39:41,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.34 | bwd_microstep: 705.82 | bwd_inner_microstep: 705.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705 [2024-06-10 20:39:43,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.80 | 
bwd_microstep: 1527.55 | bwd_inner_microstep: 1527.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3562 [2024-06-10 20:39:45,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1264.17 | bwd_inner_microstep: 1264.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3708 [2024-06-10 20:39:47,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.57 | bwd_microstep: 1265.08 | bwd_inner_microstep: 1265.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3684 [2024-06-10 20:39:48,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1322.99 | bwd_inner_microstep: 1322.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2005 [2024-06-10 20:39:49,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.44 | bwd_microstep: 771.05 | bwd_inner_microstep: 771.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2963 [2024-06-10 20:39:51,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.15 | bwd_microstep: 1137.98 | bwd_inner_microstep: 1137.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3790 [2024-06-10 20:39:53,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.28 | bwd_microstep: 1455.63 | bwd_inner_microstep: 1455.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3859 [2024-06-10 20:39:55,483] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 530.44 | bwd_microstep: 1399.12 | bwd_inner_microstep: 1399.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3616 [2024-06-10 20:39:57,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1542.46 | bwd_inner_microstep: 1542.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032 [2024-06-10 20:39:58,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.74 | bwd_microstep: 902.76 | bwd_inner_microstep: 902.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774 [2024-06-10 20:40:01,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.94 | bwd_microstep: 1638.16 | bwd_inner_microstep: 1638.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3596 [2024-06-10 20:40:07,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 20:40:07,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.49 | bwd_microstep: 5836.19 | bwd_inner_microstep: 1811.82 | bwd_allreduce_microstep: 4024.32 | step_microstep: 38.13 [2024-06-10 20:40:07,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15494.55 | bwd: 45627.68 | bwd_inner: 41602.46 | bwd_allreduce: 4024.54 | step: 39.63 {'loss': 1.2522, 'learning_rate': 1.0326795515446666e-05, 'epoch': 0.67} 67%|██████▋ | 1153/1726 [19:57:38<9:49:06, 61.69s/it] 67%|██████▋ | 1154/1726 [19:58:38<9:43:30, 61.21s/it] 67%|██████▋ | 1155/1726 [19:59:40<9:46:59, 61.68s/it] 67%|██████▋ | 1156/1726 [20:00:42<9:46:29, 61.74s/it] 67%|██████▋ | 1157/1726 [20:01:42<9:40:43, 61.24s/it] 67%|██████▋ | 1158/1726 [20:02:44<9:40:19, 61.30s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 20:40:09,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1442.73 | bwd_inner_microstep: 1442.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3904 [2024-06-10 20:40:11,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.87 | bwd_microstep: 1581.82 | bwd_inner_microstep: 1581.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3856 [2024-06-10 20:40:14,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.35 | bwd_microstep: 1662.75 | bwd_inner_microstep: 1662.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 20:40:15,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1287.24 | bwd_inner_microstep: 1287.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 20:40:17,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.09 | bwd_microstep: 1383.86 | bwd_inner_microstep: 1383.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 20:40:19,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.95 | bwd_microstep: 1488.35 | bwd_inner_microstep: 1488.32 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 20:40:21,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.74 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 20:40:23,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.13 | bwd_microstep: 1385.49 | bwd_inner_microstep: 1385.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 20:40:25,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1385.68 | bwd_inner_microstep: 1385.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3493 [2024-06-10 20:40:27,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.20 | bwd_microstep: 1219.27 | bwd_inner_microstep: 1219.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 20:40:29,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.27 | bwd_microstep: 1475.97 | bwd_inner_microstep: 1475.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 20:40:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.50 | bwd_microstep: 1256.84 | bwd_inner_microstep: 1256.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683 [2024-06-10 20:40:33,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.75 | bwd_microstep: 1631.52 | bwd_inner_microstep: 
1631.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 20:40:35,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.29 | bwd_microstep: 1478.10 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-10 20:40:37,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.25 | bwd_microstep: 1416.56 | bwd_inner_microstep: 1416.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3678 [2024-06-10 20:40:39,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.09 | bwd_microstep: 1456.11 | bwd_inner_microstep: 1456.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2138 [2024-06-10 20:40:40,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.22 | bwd_microstep: 832.25 | bwd_inner_microstep: 832.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3839 [2024-06-10 20:40:42,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.91 | bwd_microstep: 1464.48 | bwd_inner_microstep: 1464.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1875 [2024-06-10 20:40:43,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.21 | bwd_microstep: 715.19 | bwd_inner_microstep: 715.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130 [2024-06-10 20:40:44,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.16 | 
bwd_microstep: 836.82 | bwd_inner_microstep: 836.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 20:40:46,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.08 | bwd_microstep: 1511.38 | bwd_inner_microstep: 1511.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073 [2024-06-10 20:40:47,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.68 | bwd_microstep: 818.14 | bwd_inner_microstep: 818.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3552 [2024-06-10 20:40:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.32 | bwd_microstep: 1262.06 | bwd_inner_microstep: 1262.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3765 [2024-06-10 20:40:51,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.89 | bwd_microstep: 1476.22 | bwd_inner_microstep: 1476.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473 [2024-06-10 20:40:53,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.17 | bwd_microstep: 1217.87 | bwd_inner_microstep: 1217.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 20:40:55,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.63 | bwd_microstep: 1557.64 | bwd_inner_microstep: 1557.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-10 20:40:57,484] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 587.37 | bwd_microstep: 1597.42 | bwd_inner_microstep: 1597.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3810 [2024-06-10 20:40:59,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1321.76 | bwd_inner_microstep: 1321.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3764 [2024-06-10 20:41:01,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.82 | bwd_microstep: 1474.25 | bwd_inner_microstep: 1474.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 20:41:03,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.06 | bwd_microstep: 1544.05 | bwd_inner_microstep: 1544.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891 [2024-06-10 20:41:04,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.26 | bwd_microstep: 775.20 | bwd_inner_microstep: 775.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3382 [2024-06-10 20:41:09,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.31 | optimizer_step: 6.61 [2024-06-10 20:41:09,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.85 | bwd_microstep: 3987.64 | bwd_inner_microstep: 1518.08 | bwd_allreduce_microstep: 2469.49 | step_microstep: 38.63 [2024-06-10 20:41:09,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15953.36 | bwd: 45228.96 | bwd_inner: 42758.52 | bwd_allreduce: 2469.73 | step: 40.19 {'loss': 1.2004, 'learning_rate': 1.0293960749378384e-05, 'epoch': 
0.67} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421 [2024-06-10 20:41:11,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.12 | bwd_microstep: 1436.36 | bwd_inner_microstep: 1436.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401 [2024-06-10 20:41:12,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.92 | bwd_microstep: 1372.34 | bwd_inner_microstep: 1372.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 20:41:14,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1343.92 | bwd_inner_microstep: 1343.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 20:41:16,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1377.72 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1950 [2024-06-10 20:41:17,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.07 | bwd_microstep: 729.13 | bwd_inner_microstep: 729.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-10 20:41:19,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.55 | bwd_microstep: 1557.07 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 20:41:21,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1246.76 | bwd_inner_microstep: 1246.73 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443 [2024-06-10 20:41:23,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1449.98 | bwd_inner_microstep: 1449.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919 [2024-06-10 20:41:24,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.50 | bwd_microstep: 782.19 | bwd_inner_microstep: 782.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 20:41:26,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1399.68 | bwd_inner_microstep: 1399.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 20:41:28,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1298.16 | bwd_inner_microstep: 1298.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906 [2024-06-10 20:41:29,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.49 | bwd_microstep: 685.29 | bwd_inner_microstep: 685.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3659 [2024-06-10 20:41:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.26 | bwd_microstep: 1354.97 | bwd_inner_microstep: 1354.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2132 [2024-06-10 20:41:32,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.82 | bwd_microstep: 799.79 
| bwd_inner_microstep: 799.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3656 [2024-06-10 20:41:34,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1442.61 | bwd_inner_microstep: 1442.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-10 20:41:36,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1407.37 | bwd_inner_microstep: 1407.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697 [2024-06-10 20:41:38,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.08 | bwd_microstep: 1522.40 | bwd_inner_microstep: 1522.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-10 20:41:40,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.06 | bwd_microstep: 1615.48 | bwd_inner_microstep: 1615.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 20:41:42,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2696 [2024-06-10 20:41:43,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.16 | bwd_microstep: 1035.23 | bwd_inner_microstep: 1035.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2427 [2024-06-10 20:41:45,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 356.50 | bwd_microstep: 942.81 | bwd_inner_microstep: 942.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-10 20:41:47,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1513.31 | bwd_inner_microstep: 1513.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838 [2024-06-10 20:41:49,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.17 | bwd_microstep: 1459.51 | bwd_inner_microstep: 1459.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2426 [2024-06-10 20:41:50,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.22 | bwd_microstep: 939.58 | bwd_inner_microstep: 939.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-10 20:41:51,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.12 | bwd_microstep: 879.42 | bwd_inner_microstep: 879.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 20:41:54,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.77 | bwd_microstep: 1557.00 | bwd_inner_microstep: 1556.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3832 [2024-06-10 20:41:55,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.80 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527 [2024-06-10 20:41:57,931] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1422.30 | bwd_inner_microstep: 1422.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592 [2024-06-10 20:41:59,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1406.52 | bwd_inner_microstep: 1406.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-10 20:42:01,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.99 | bwd_microstep: 974.95 | bwd_inner_microstep: 974.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3812 [2024-06-10 20:42:03,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 20:42:08,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.08 | optimizer_step: 6.60 [2024-06-10 20:42:08,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 4526.83 | bwd_inner_microstep: 1521.13 | bwd_allreduce_microstep: 3005.64 | step_microstep: 37.72 [2024-06-10 20:42:08,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15167.86 | bwd: 43631.16 | bwd_inner: 40624.61 | bwd_allreduce: 3005.87 | step: 39.19 {'loss': 1.1851, 'learning_rate': 1.0261160167929884e-05, 'epoch': 0.67} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 20:42:10,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.14 | bwd_microstep: 1380.39 | bwd_inner_microstep: 1380.36 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1870 [2024-06-10 20:42:11,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.49 | bwd_microstep: 736.74 | bwd_inner_microstep: 736.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3932 [2024-06-10 20:42:13,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.27 | bwd_microstep: 1589.73 | bwd_inner_microstep: 1589.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 20:42:15,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.48 | bwd_microstep: 1474.55 | bwd_inner_microstep: 1474.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-10 20:42:17,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1395.82 | bwd_inner_microstep: 1395.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 20:42:19,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.45 | bwd_microstep: 1650.51 | bwd_inner_microstep: 1650.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 20:42:21,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.41 | bwd_microstep: 1247.19 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 20:42:23,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 
1281.81 | bwd_inner_microstep: 1281.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2890 [2024-06-10 20:42:24,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.30 | bwd_microstep: 1087.26 | bwd_inner_microstep: 1087.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-10 20:42:26,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1392.13 | bwd_inner_microstep: 1392.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-10 20:42:28,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.23 | bwd_microstep: 1527.13 | bwd_inner_microstep: 1527.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677 [2024-06-10 20:42:30,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.11 | bwd_microstep: 1523.93 | bwd_inner_microstep: 1523.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2173 [2024-06-10 20:42:31,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.27 | bwd_microstep: 883.13 | bwd_inner_microstep: 883.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511 [2024-06-10 20:42:34,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.87 | bwd_microstep: 1580.23 | bwd_inner_microstep: 1580.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2710 [2024-06-10 20:42:35,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 421.32 | bwd_microstep: 1128.98 | bwd_inner_microstep: 1128.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3417 [2024-06-10 20:42:37,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1536.93 | bwd_inner_microstep: 1536.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3643 [2024-06-10 20:42:40,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.96 | bwd_microstep: 1679.21 | bwd_inner_microstep: 1679.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 20:42:42,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1482.23 | bwd_inner_microstep: 1482.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2318 [2024-06-10 20:42:43,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.04 | bwd_microstep: 825.05 | bwd_inner_microstep: 825.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3668 [2024-06-10 20:42:45,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.90 | bwd_microstep: 1448.88 | bwd_inner_microstep: 1448.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 20:42:47,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.51 | bwd_microstep: 1286.04 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3433 [2024-06-10 20:42:49,047] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.97 | bwd_microstep: 1439.68 | bwd_inner_microstep: 1439.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600 [2024-06-10 20:42:50,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1307.16 | bwd_inner_microstep: 1307.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-10 20:42:52,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-10 20:42:54,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1397.46 | bwd_inner_microstep: 1397.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3830 [2024-06-10 20:42:56,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.59 | bwd_microstep: 1585.72 | bwd_inner_microstep: 1585.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 604 [2024-06-10 20:42:57,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.30 | bwd_microstep: 257.97 | bwd_inner_microstep: 257.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3569 [2024-06-10 20:42:59,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.05 | bwd_microstep: 1526.89 | bwd_inner_microstep: 1526.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2916 
[2024-06-10 20:43:00,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.02 | bwd_microstep: 1281.59 | bwd_inner_microstep: 1281.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3887 [2024-06-10 20:43:03,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.05 | bwd_microstep: 1788.16 | bwd_inner_microstep: 1788.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764 [2024-06-10 20:43:05,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.87 | bwd_microstep: 1646.48 | bwd_inner_microstep: 1646.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595 [2024-06-10 20:43:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.59 [2024-06-10 20:43:10,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 4008.40 | bwd_inner_microstep: 2107.61 | bwd_allreduce_microstep: 1900.74 | step_microstep: 37.69 [2024-06-10 20:43:10,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16134.67 | bwd: 45631.15 | bwd_inner: 43729.50 | bwd_allreduce: 1900.97 | step: 39.13 {'loss': 1.2438, 'learning_rate': 1.0228393886624639e-05, 'epoch': 0.67} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 20:43:12,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1366.50 | bwd_inner_microstep: 1366.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3982 [2024-06-10 20:43:14,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1502.39 | 
bwd_inner_microstep: 1502.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3396 [2024-06-10 20:43:16,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.91 | bwd_microstep: 1306.53 | bwd_inner_microstep: 1306.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3793 [2024-06-10 20:43:17,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.69 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3874 [2024-06-10 20:43:19,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.00 | bwd_microstep: 1448.30 | bwd_inner_microstep: 1448.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870 [2024-06-10 20:43:20,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.85 | bwd_microstep: 679.39 | bwd_inner_microstep: 679.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 20:43:22,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1407.30 | bwd_inner_microstep: 1407.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 20:43:24,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1249.35 | bwd_inner_microstep: 1249.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 20:43:26,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 489.65 | bwd_microstep: 1282.72 | bwd_inner_microstep: 1282.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 20:43:28,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3729 [2024-06-10 20:43:30,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1367.04 | bwd_inner_microstep: 1367.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415 [2024-06-10 20:43:31,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.01 | bwd_microstep: 1153.95 | bwd_inner_microstep: 1153.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994 [2024-06-10 20:43:32,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.91 | bwd_microstep: 830.91 | bwd_inner_microstep: 830.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2656 [2024-06-10 20:43:34,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.04 | bwd_microstep: 1027.82 | bwd_inner_microstep: 1027.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 20:43:36,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1380.35 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2637 [2024-06-10 20:43:37,664] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.09 | bwd_microstep: 1016.65 | bwd_inner_microstep: 1016.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1926 [2024-06-10 20:43:38,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.13 | bwd_microstep: 726.16 | bwd_inner_microstep: 726.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3665 [2024-06-10 20:43:41,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.97 | bwd_microstep: 1820.89 | bwd_inner_microstep: 1820.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3661 [2024-06-10 20:43:43,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.90 | bwd_microstep: 1819.96 | bwd_inner_microstep: 1819.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-10 20:43:45,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.14 | bwd_microstep: 1436.53 | bwd_inner_microstep: 1436.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 20:43:47,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1379.74 | bwd_inner_microstep: 1379.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-10 20:43:49,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1438.47 | bwd_inner_microstep: 1438.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3468 [2024-06-10 20:43:51,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1279.44 | bwd_inner_microstep: 1279.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3604 [2024-06-10 20:43:53,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.68 | bwd_microstep: 1608.35 | bwd_inner_microstep: 1608.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2225 [2024-06-10 20:43:54,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.67 | bwd_microstep: 862.66 | bwd_inner_microstep: 862.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-10 20:43:55,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.95 | bwd_microstep: 808.61 | bwd_inner_microstep: 808.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-10 20:43:57,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.14 | bwd_microstep: 1158.20 | bwd_inner_microstep: 1158.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080 [2024-06-10 20:43:58,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.73 | bwd_microstep: 821.96 | bwd_inner_microstep: 821.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2037 [2024-06-10 20:43:59,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.55 | bwd_microstep: 903.31 | bwd_inner_microstep: 903.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3590 [2024-06-10 20:44:02,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.87 | bwd_microstep: 1601.06 | bwd_inner_microstep: 1601.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828 [2024-06-10 20:44:04,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.42 | bwd_microstep: 1489.39 | bwd_inner_microstep: 1489.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3581 [2024-06-10 20:44:12,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.34 | optimizer_step: 6.63 [2024-06-10 20:44:12,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 7463.33 | bwd_inner_microstep: 1611.46 | bwd_allreduce_microstep: 5851.80 | step_microstep: 38.87 [2024-06-10 20:44:12,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15129.90 | bwd: 46372.29 | bwd_inner: 40519.57 | bwd_allreduce: 5852.04 | step: 40.35 {'loss': 1.2067, 'learning_rate': 1.0195662020865333e-05, 'epoch': 0.67} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 20:44:14,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1464.36 | bwd_inner_microstep: 1464.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 20:44:16,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1391.81 | bwd_inner_microstep: 1391.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 20:44:17,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
483.21 | bwd_microstep: 1281.10 | bwd_inner_microstep: 1281.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3962 [2024-06-10 20:44:20,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.62 | bwd_microstep: 1593.52 | bwd_inner_microstep: 1593.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 20:44:21,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1249.34 | bwd_inner_microstep: 1249.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802 [2024-06-10 20:44:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.67 | bwd_microstep: 1543.35 | bwd_inner_microstep: 1543.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-10 20:44:25,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.32 | bwd_microstep: 1147.14 | bwd_inner_microstep: 1147.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726 [2024-06-10 20:44:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.79 | bwd_microstep: 1435.00 | bwd_inner_microstep: 1434.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-10 20:44:28,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.13 | bwd_microstep: 800.10 | bwd_inner_microstep: 800.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703 [2024-06-10 20:44:30,832] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 597.96 | bwd_microstep: 1625.73 | bwd_inner_microstep: 1625.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 20:44:32,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1384.70 | bwd_inner_microstep: 1384.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 880 [2024-06-10 20:44:33,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.97 | bwd_microstep: 367.97 | bwd_inner_microstep: 367.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3959 [2024-06-10 20:44:35,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.97 | bwd_microstep: 1627.73 | bwd_inner_microstep: 1627.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1966 [2024-06-10 20:44:36,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.21 | bwd_microstep: 825.27 | bwd_inner_microstep: 825.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 20:44:38,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.00 | bwd_microstep: 1347.05 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2126 [2024-06-10 20:44:39,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.11 | bwd_microstep: 828.56 | bwd_inner_microstep: 828.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 20:44:41,568] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1378.15 | bwd_inner_microstep: 1378.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 20:44:43,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1513.83 | bwd_inner_microstep: 1513.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1989 [2024-06-10 20:44:44,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.52 | bwd_microstep: 769.90 | bwd_inner_microstep: 769.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3453 [2024-06-10 20:44:46,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.26 | bwd_microstep: 1321.80 | bwd_inner_microstep: 1321.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999 [2024-06-10 20:44:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.25 | bwd_microstep: 707.39 | bwd_inner_microstep: 707.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 20:44:49,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1496.93 | bwd_inner_microstep: 1496.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3775 [2024-06-10 20:44:51,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.77 | bwd_microstep: 1345.83 | bwd_inner_microstep: 1345.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3534 [2024-06-10 20:44:53,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1293.87 | bwd_inner_microstep: 1293.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3573 [2024-06-10 20:44:54,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.95 | bwd_microstep: 1206.53 | bwd_inner_microstep: 1206.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-10 20:44:57,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1509.80 | bwd_inner_microstep: 1509.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 20:44:58,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.76 | bwd_microstep: 1401.62 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2196 [2024-06-10 20:45:00,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.32 | bwd_microstep: 797.74 | bwd_inner_microstep: 797.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 20:45:02,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.62 | bwd_microstep: 1645.52 | bwd_inner_microstep: 1645.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760 [2024-06-10 20:45:04,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.27 | bwd_microstep: 1642.15 | bwd_inner_microstep: 1642.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3573 [2024-06-10 20:45:06,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.76 | bwd_microstep: 1297.58 | bwd_inner_microstep: 1297.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3847 [2024-06-10 20:45:15,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.59 [2024-06-10 20:45:15,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.85 | bwd_microstep: 8371.88 | bwd_inner_microstep: 1997.93 | bwd_allreduce_microstep: 6373.90 | step_microstep: 37.85 [2024-06-10 20:45:15,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15381.28 | bwd: 47613.27 | bwd_inner: 41238.47 | bwd_allreduce: 6374.13 | step: 39.37 {'loss': 1.1528, 'learning_rate': 1.0162964685933426e-05, 'epoch': 0.67} 67%|██████▋ | 1158/1726 [20:02:44<9:40:19, 61.30s/it] 67%|██████▋ | 1159/1726 [20:03:45<9:39:55, 61.37s/it] 67%|██████▋ | 1160/1726 [20:04:44<9:32:32, 60.69s/it] 67%|██████▋ | 1161/1726 [20:05:47<9:35:29, 61.11s/it] 67%|██████▋ | 1162/1726 [20:06:48<9:36:28, 61.33s/it] 67%|██████▋ | 1163/1726 [20:07:52<9:41:03, 61.93s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 20:45:17,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.60 | bwd_microstep: 1330.75 | bwd_inner_microstep: 1330.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 20:45:19,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep:
1383.83 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893 [2024-06-10 20:45:21,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.19 | bwd_microstep: 1580.53 | bwd_inner_microstep: 1580.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 20:45:23,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.73 | bwd_microstep: 1342.65 | bwd_inner_microstep: 1342.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 20:45:25,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1487.16 | bwd_inner_microstep: 1487.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 20:45:27,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.48 | bwd_microstep: 1243.01 | bwd_inner_microstep: 1242.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3858 [2024-06-10 20:45:29,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.77 | bwd_microstep: 1659.13 | bwd_inner_microstep: 1659.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 20:45:31,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.61 | bwd_microstep: 1346.67 | bwd_inner_microstep: 1346.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-10 20:45:33,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 526.11 | bwd_microstep: 1408.68 | bwd_inner_microstep: 1408.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-10 20:45:34,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1296.40 | bwd_inner_microstep: 1296.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755 [2024-06-10 20:45:36,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.60 | bwd_microstep: 1467.78 | bwd_inner_microstep: 1467.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 20:45:38,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1391.38 | bwd_inner_microstep: 1391.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3676 [2024-06-10 20:45:41,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.07 | bwd_microstep: 1585.09 | bwd_inner_microstep: 1585.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2151 [2024-06-10 20:45:42,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.12 | bwd_microstep: 1043.27 | bwd_inner_microstep: 1043.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-10 20:45:44,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1440.80 | bwd_inner_microstep: 1440.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2936 [2024-06-10 20:45:46,020] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.96 | bwd_microstep: 1131.07 | bwd_inner_microstep: 1131.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 20:45:47,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530 [2024-06-10 20:45:50,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.44 | bwd_microstep: 1583.16 | bwd_inner_microstep: 1583.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3461 [2024-06-10 20:45:52,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.30 | bwd_microstep: 1608.07 | bwd_inner_microstep: 1608.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2097 [2024-06-10 20:45:53,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.08 | bwd_microstep: 729.30 | bwd_inner_microstep: 729.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605 [2024-06-10 20:45:55,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.77 | bwd_microstep: 1538.32 | bwd_inner_microstep: 1538.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 20:45:57,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1556.25 | bwd_inner_microstep: 1556.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
3537 [2024-06-10 20:45:59,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1227.87 | bwd_inner_microstep: 1227.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472 [2024-06-10 20:46:01,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.15 | bwd_microstep: 1247.52 | bwd_inner_microstep: 1247.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3818 [2024-06-10 20:46:03,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.36 | bwd_microstep: 1489.08 | bwd_inner_microstep: 1489.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3780 [2024-06-10 20:46:05,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.00 | bwd_microstep: 1579.76 | bwd_inner_microstep: 1579.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2168 [2024-06-10 20:46:06,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.97 | bwd_microstep: 804.86 | bwd_inner_microstep: 804.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2035 [2024-06-10 20:46:07,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.59 | bwd_microstep: 807.15 | bwd_inner_microstep: 807.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 20:46:09,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.86 | bwd_microstep: 1553.57 | bwd_inner_microstep: 1553.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per 
sample: 13.0, dynamic token length: 3590 [2024-06-10 20:46:12,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.34 | bwd_microstep: 1807.52 | bwd_inner_microstep: 1807.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3441 [2024-06-10 20:46:14,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.68 | bwd_microstep: 1479.45 | bwd_inner_microstep: 1479.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2207 [2024-06-10 20:46:17,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.08 | optimizer_step: 6.58 [2024-06-10 20:46:17,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.22 | bwd_microstep: 2945.80 | bwd_inner_microstep: 1048.01 | bwd_allreduce_microstep: 1897.74 | step_microstep: 37.72 [2024-06-10 20:46:17,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16233.50 | bwd: 45483.12 | bwd_inner: 43584.47 | bwd_allreduce: 1897.96 | step: 39.19 {'loss': 1.2032, 'learning_rate': 1.0130301996988755e-05, 'epoch': 0.67} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1905 [2024-06-10 20:46:18,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.29 | bwd_microstep: 770.83 | bwd_inner_microstep: 770.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 20:46:20,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1376.88 | bwd_inner_microstep: 1376.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396 [2024-06-10 20:46:22,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 
| bwd_microstep: 1249.90 | bwd_inner_microstep: 1249.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-10 20:46:24,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.69 | bwd_microstep: 1636.71 | bwd_inner_microstep: 1636.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 20:46:26,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1383.04 | bwd_inner_microstep: 1383.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 20:46:28,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-10 20:46:30,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.39 | bwd_microstep: 1631.64 | bwd_inner_microstep: 1631.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2632 [2024-06-10 20:46:31,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.79 | bwd_microstep: 949.72 | bwd_inner_microstep: 949.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2467 [2024-06-10 20:46:33,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.62 | bwd_microstep: 953.72 | bwd_inner_microstep: 953.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422 [2024-06-10 20:46:34,830] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 452.21 | bwd_microstep: 1184.87 | bwd_inner_microstep: 1184.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912 [2024-06-10 20:46:35,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.89 | bwd_microstep: 719.12 | bwd_inner_microstep: 719.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3408 [2024-06-10 20:46:37,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.50 | bwd_microstep: 1371.97 | bwd_inner_microstep: 1371.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3544 [2024-06-10 20:46:39,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.80 | bwd_microstep: 1638.83 | bwd_inner_microstep: 1638.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415 [2024-06-10 20:46:41,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.74 | bwd_microstep: 1438.84 | bwd_inner_microstep: 1438.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3458 [2024-06-10 20:46:43,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.95 | bwd_microstep: 1434.78 | bwd_inner_microstep: 1434.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652 [2024-06-10 20:46:46,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.87 | bwd_microstep: 1614.59 | bwd_inner_microstep: 1614.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2662 [2024-06-10 20:46:47,726] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.46 | bwd_microstep: 1132.83 | bwd_inner_microstep: 1132.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-10 20:46:49,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.93 | bwd_microstep: 1522.76 | bwd_inner_microstep: 1522.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3829 [2024-06-10 20:46:52,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.72 | bwd_microstep: 1691.20 | bwd_inner_microstep: 1691.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3532 [2024-06-10 20:46:54,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.39 | bwd_microstep: 1358.63 | bwd_inner_microstep: 1358.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544 [2024-06-10 20:46:55,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1300.83 | bwd_inner_microstep: 1300.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3530 [2024-06-10 20:46:57,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.30 | bwd_microstep: 1198.01 | bwd_inner_microstep: 1197.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 20:46:59,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1396.33 | bwd_inner_microstep: 1396.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3549 [2024-06-10 20:47:01,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1496.25 | bwd_inner_microstep: 1496.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 20:47:03,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.45 | bwd_microstep: 1508.10 | bwd_inner_microstep: 1508.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-10 20:47:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.75 | bwd_microstep: 1308.71 | bwd_inner_microstep: 1308.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821 [2024-06-10 20:47:07,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.31 | bwd_microstep: 1386.93 | bwd_inner_microstep: 1386.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3722 [2024-06-10 20:47:09,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.52 | bwd_microstep: 1338.85 | bwd_inner_microstep: 1338.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2195 [2024-06-10 20:47:10,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.16 | bwd_microstep: 828.94 | bwd_inner_microstep: 828.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3675 [2024-06-10 20:47:12,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.12 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 2058 [2024-06-10 20:47:13,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.78 | bwd_microstep: 911.48 | bwd_inner_microstep: 911.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-10 20:47:19,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-10 20:47:19,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.27 | bwd_microstep: 4975.04 | bwd_inner_microstep: 1684.76 | bwd_allreduce_microstep: 3290.22 | step_microstep: 37.78 [2024-06-10 20:47:19,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15794.41 | bwd: 45579.74 | bwd_inner: 42288.60 | bwd_allreduce: 3290.45 | step: 39.22 {'loss': 1.1463, 'learning_rate': 1.0097674069069132e-05, 'epoch': 0.67} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562 [2024-06-10 20:47:21,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.83 | bwd_microstep: 1417.92 | bwd_inner_microstep: 1417.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3736 [2024-06-10 20:47:23,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.90 | bwd_microstep: 1732.45 | bwd_inner_microstep: 1732.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 20:47:25,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1391.76 | bwd_inner_microstep: 1391.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 20:47:27,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 552.91 | bwd_microstep: 1475.70 | bwd_inner_microstep: 1475.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-10 20:47:29,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1474.77 | bwd_inner_microstep: 1474.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 20:47:31,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.00 | bwd_microstep: 1249.11 | bwd_inner_microstep: 1249.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 20:47:33,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 20:47:35,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1496.72 | bwd_inner_microstep: 1496.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 20:47:36,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1247.74 | bwd_inner_microstep: 1247.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 20:47:38,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1341.79 | bwd_inner_microstep: 1341.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437 [2024-06-10 20:47:40,443] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.57 | bwd_microstep: 1154.92 | bwd_inner_microstep: 1154.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3440 [2024-06-10 20:47:42,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.55 | bwd_microstep: 1296.88 | bwd_inner_microstep: 1296.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2054 [2024-06-10 20:47:43,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.54 | bwd_microstep: 848.19 | bwd_inner_microstep: 848.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3567 [2024-06-10 20:47:45,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.01 | bwd_microstep: 1562.92 | bwd_inner_microstep: 1562.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492 [2024-06-10 20:47:47,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.24 | bwd_microstep: 1429.25 | bwd_inner_microstep: 1429.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 20:47:49,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1375.44 | bwd_inner_microstep: 1375.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3539 [2024-06-10 20:47:51,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1459.89 | bwd_inner_microstep: 1459.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 
3483 [2024-06-10 20:47:53,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.02 | bwd_microstep: 1433.26 | bwd_inner_microstep: 1433.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 20:47:55,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1384.43 | bwd_inner_microstep: 1384.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-10 20:47:57,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.71 | bwd_microstep: 1552.57 | bwd_inner_microstep: 1552.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 20:47:59,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-10 20:48:01,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1512.16 | bwd_inner_microstep: 1512.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1944 [2024-06-10 20:48:02,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.34 | bwd_microstep: 758.81 | bwd_inner_microstep: 758.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-10 20:48:04,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.41 | bwd_microstep: 1495.53 | bwd_inner_microstep: 1495.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2005 [2024-06-10 20:48:05,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.73 | bwd_microstep: 896.36 | bwd_inner_microstep: 896.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004 [2024-06-10 20:48:07,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.15 | bwd_microstep: 895.34 | bwd_inner_microstep: 895.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-10 20:48:09,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.49 | bwd_microstep: 1749.13 | bwd_inner_microstep: 1749.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829 [2024-06-10 20:48:11,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.82 | bwd_microstep: 1483.68 | bwd_inner_microstep: 1483.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 20:48:13,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.50 | bwd_microstep: 1653.65 | bwd_inner_microstep: 1653.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-10 20:48:15,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.33 | bwd_microstep: 1530.10 | bwd_inner_microstep: 1530.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 20:48:17,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.94 | bwd_microstep: 1390.99 | bwd_inner_microstep: 1390.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466 [2024-06-10 20:48:22,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.18 | optimizer_step: 6.62 [2024-06-10 20:48:22,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.14 | bwd_microstep: 3826.25 | bwd_inner_microstep: 1356.34 | bwd_allreduce_microstep: 2469.86 | step_microstep: 37.85 [2024-06-10 20:48:22,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16350.84 | bwd: 46297.00 | bwd_inner: 43826.24 | bwd_allreduce: 2470.09 | step: 39.25 {'loss': 1.1702, 'learning_rate': 1.006508101708997e-05, 'epoch': 0.68} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 20:48:24,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.65 | bwd_microstep: 1473.67 | bwd_inner_microstep: 1473.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 20:48:26,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.54 | bwd_microstep: 1374.90 | bwd_inner_microstep: 1374.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3854 [2024-06-10 20:48:28,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1465.71 | bwd_inner_microstep: 1465.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 20:48:29,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 1276.94 | bwd_inner_microstep: 1276.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4144 [2024-06-10 20:48:32,186] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 611.62 | bwd_microstep: 1638.44 | bwd_inner_microstep: 1638.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-10 20:48:34,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.02 | bwd_microstep: 1347.74 | bwd_inner_microstep: 1347.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 20:48:35,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.94 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3445 [2024-06-10 20:48:37,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.98 | bwd_microstep: 1377.37 | bwd_inner_microstep: 1377.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897 [2024-06-10 20:48:38,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.66 | bwd_microstep: 777.40 | bwd_inner_microstep: 777.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 20:48:40,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1299.72 | bwd_inner_microstep: 1299.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3496 [2024-06-10 20:48:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.95 | bwd_microstep: 1331.04 | bwd_inner_microstep: 1331.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2931 [2024-06-10 
20:48:44,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.37 | bwd_microstep: 1092.30 | bwd_inner_microstep: 1092.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-10 20:48:46,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.92 | bwd_microstep: 1611.77 | bwd_inner_microstep: 1611.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510 [2024-06-10 20:48:48,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.85 | bwd_microstep: 1582.48 | bwd_inner_microstep: 1582.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975 [2024-06-10 20:48:49,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.86 | bwd_microstep: 891.43 | bwd_inner_microstep: 891.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3472 [2024-06-10 20:48:51,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1360.00 | bwd_inner_microstep: 1359.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519 [2024-06-10 20:48:53,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.23 | bwd_microstep: 1319.31 | bwd_inner_microstep: 1319.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3963 [2024-06-10 20:48:55,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.82 | bwd_microstep: 1510.52 | bwd_inner_microstep: 1510.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3436 [2024-06-10 20:48:57,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1251.78 | bwd_inner_microstep: 1251.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925 [2024-06-10 20:48:58,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.45 | bwd_microstep: 725.66 | bwd_inner_microstep: 725.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-10 20:49:00,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1495.66 | bwd_inner_microstep: 1495.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 20:49:02,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.58 | bwd_microstep: 1498.05 | bwd_inner_microstep: 1498.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-10 20:49:04,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1393.96 | bwd_inner_microstep: 1393.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 20:49:06,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.28 | bwd_microstep: 1405.23 | bwd_inner_microstep: 1405.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814 [2024-06-10 20:49:08,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.87 | bwd_microstep: 1358.16 | bwd_inner_microstep: 1358.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 20:49:10,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.84 | bwd_microstep: 1396.60 | bwd_inner_microstep: 1396.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 20:49:12,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1449.06 | bwd_inner_microstep: 1449.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811 [2024-06-10 20:49:14,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.10 | bwd_microstep: 1481.35 | bwd_inner_microstep: 1481.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 20:49:16,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-10 20:49:18,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.51 | bwd_microstep: 1502.89 | bwd_inner_microstep: 1502.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426 [2024-06-10 20:49:20,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.39 | bwd_microstep: 1545.12 | bwd_inner_microstep: 1545.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 20:49:22,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.03 | optimizer_step: 6.60 [2024-06-10 20:49:22,828] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1839.65 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 448.66 | step_microstep: 37.46 [2024-06-10 20:49:22,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16310.07 | bwd: 44003.37 | bwd_inner: 43553.82 | bwd_allreduce: 448.89 | step: 38.90 {'loss': 1.2102, 'learning_rate': 1.0032522955843822e-05, 'epoch': 0.68} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-10 20:49:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.29 | bwd_microstep: 1332.17 | bwd_inner_microstep: 1332.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3999 [2024-06-10 20:49:26,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1408.63 | bwd_inner_microstep: 1408.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3911 [2024-06-10 20:49:28,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.14 | bwd_microstep: 1588.74 | bwd_inner_microstep: 1588.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4295 [2024-06-10 20:49:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.69 | bwd_microstep: 1779.46 | bwd_inner_microstep: 1779.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 20:49:33,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.08 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 20:49:34,781] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.30 | bwd_microstep: 1282.30 | bwd_inner_microstep: 1282.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 20:49:36,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1395.51 | bwd_inner_microstep: 1395.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 20:49:37,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.32 | bwd_microstep: 789.64 | bwd_inner_microstep: 789.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-10 20:49:39,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.69 | bwd_microstep: 1184.55 | bwd_inner_microstep: 1184.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 20:49:41,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.42 | bwd_microstep: 1249.09 | bwd_inner_microstep: 1249.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584 [2024-06-10 20:49:43,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1396.32 | bwd_inner_microstep: 1396.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682 [2024-06-10 20:49:45,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.78 | bwd_microstep: 1523.00 | bwd_inner_microstep: 1522.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3493 [2024-06-10 20:49:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.06 | bwd_microstep: 1315.23 | bwd_inner_microstep: 1315.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2937 [2024-06-10 20:49:48,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.37 | bwd_microstep: 1130.22 | bwd_inner_microstep: 1130.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 20:49:50,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.40 | bwd_microstep: 1287.45 | bwd_inner_microstep: 1287.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 20:49:52,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1483.20 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 20:49:54,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.32 | bwd_microstep: 1383.58 | bwd_inner_microstep: 1383.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 20:49:56,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1391.95 | bwd_inner_microstep: 1391.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3833 [2024-06-10 20:49:58,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.85 | bwd_microstep: 1465.95 | bwd_inner_microstep: 1465.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3629 [2024-06-10 20:50:00,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1415.17 | bwd_inner_microstep: 1415.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867 [2024-06-10 20:50:02,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.48 | bwd_microstep: 1669.28 | bwd_inner_microstep: 1669.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531 [2024-06-10 20:50:04,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.98 | bwd_microstep: 1353.18 | bwd_inner_microstep: 1353.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 20:50:06,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1283.18 | bwd_inner_microstep: 1283.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 20:50:08,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1498.18 | bwd_inner_microstep: 1498.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-10 20:50:10,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.20 | bwd_microstep: 1396.79 | bwd_inner_microstep: 1396.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3634 [2024-06-10 20:50:12,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.93 | bwd_microstep: 1476.79 | bwd_inner_microstep: 1476.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3457 [2024-06-10 20:50:13,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.30 | bwd_microstep: 1215.17 | bwd_inner_microstep: 1215.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2666 [2024-06-10 20:50:15,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.57 | bwd_microstep: 1120.46 | bwd_inner_microstep: 1120.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 20:50:17,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1481.85 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 20:50:19,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.02 | bwd_microstep: 1650.10 | bwd_inner_microstep: 1650.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2081 [2024-06-10 20:50:21,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.09 | bwd_microstep: 1012.07 | bwd_inner_microstep: 1012.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3454 [2024-06-10 20:50:25,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.10 | optimizer_step: 6.59 [2024-06-10 20:50:25,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 3468.29 | bwd_inner_microstep: 1691.01 | bwd_allreduce_microstep: 1777.23 | step_microstep: 37.94 [2024-06-10 20:50:25,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
16384.80 | bwd: 45678.32 | bwd_inner: 43900.18 | bwd_allreduce: 1777.46 | step: 39.51 {'loss': 1.2357, 'learning_rate': 1.0000000000000006e-05, 'epoch': 0.68} 3/1726 [20:07:52<9:41:03, 61.93s/it] 67%|██████▋ | 1164/1726 [20:08:54<9:40:23, 61.96s/it] 67%|██████▋ | 1165/1726 [20:09:55<9:38:36, 61.88s/it] 68%|██████▊ | 1166/1726 [20:10:58<9:40:39, 62.21s/it] 68%|██████▊ | 1167/1726 [20:11:59<9:35:13, 61.74s/it] 68%|██████▊ | 1168/1726 [20:13:01<9:36:01, 61.94s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1984 [2024-06-10 20:50:26,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.15 | bwd_microstep: 882.26 | bwd_inner_microstep: 882.18 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 20:50:27,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.81 | bwd_microstep: 788.41 | bwd_inner_microstep: 788.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2322 [2024-06-10 20:50:28,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.59 | bwd_microstep: 885.03 | bwd_inner_microstep: 885.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4273 [2024-06-10 20:50:30,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.11 | bwd_microstep: 1599.73 | bwd_inner_microstep: 1599.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740 [2024-06-10 20:50:33,233] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.49 | bwd_microstep: 1634.94 | bwd_inner_microstep: 1634.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4024 [2024-06-10 20:50:35,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.27 | bwd_microstep: 1519.36 | bwd_inner_microstep: 1519.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-10 20:50:36,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 804.14 | bwd_inner_microstep: 804.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 20:50:37,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 791.95 | bwd_inner_microstep: 791.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3697 [2024-06-10 20:50:39,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.24 | bwd_microstep: 1457.89 | bwd_inner_microstep: 1457.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3653 [2024-06-10 20:50:41,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.00 | bwd_microstep: 1440.03 | bwd_inner_microstep: 1440.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672 [2024-06-10 20:50:43,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.84 | bwd_microstep: 1720.61 | bwd_inner_microstep: 1720.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 
2115 [2024-06-10 20:50:45,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.77 | bwd_microstep: 873.88 | bwd_inner_microstep: 873.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3399 [2024-06-10 20:50:47,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.69 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3650 [2024-06-10 20:50:49,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.62 | bwd_microstep: 1719.32 | bwd_inner_microstep: 1719.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 20:50:51,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.02 | bwd_microstep: 1378.86 | bwd_inner_microstep: 1378.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502 [2024-06-10 20:50:53,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.30 | bwd_microstep: 1415.49 | bwd_inner_microstep: 1415.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512 [2024-06-10 20:50:55,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.61 | bwd_microstep: 1318.48 | bwd_inner_microstep: 1318.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 20:50:56,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1282.21 | bwd_inner_microstep: 1282.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3526 [2024-06-10 20:50:58,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1396.82 | bwd_inner_microstep: 1396.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 20:51:00,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 20:51:02,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1490.77 | bwd_inner_microstep: 1490.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-10 20:51:04,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.41 | bwd_microstep: 1297.04 | bwd_inner_microstep: 1297.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610 [2024-06-10 20:51:06,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1341.98 | bwd_inner_microstep: 1341.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271 [2024-06-10 20:51:07,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.60 | bwd_microstep: 877.65 | bwd_inner_microstep: 877.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754 [2024-06-10 20:51:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.17 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 20:51:11,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1379.07 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2569 [2024-06-10 20:51:13,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.57 | bwd_microstep: 975.04 | bwd_inner_microstep: 975.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603 [2024-06-10 20:51:14,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.09 | bwd_microstep: 1430.72 | bwd_inner_microstep: 1430.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3816 [2024-06-10 20:51:17,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.84 | bwd_microstep: 1615.58 | bwd_inner_microstep: 1615.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 20:51:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1649.74 | bwd_inner_microstep: 1649.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3770 [2024-06-10 20:51:21,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.62 | bwd_microstep: 1570.89 | bwd_inner_microstep: 1570.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2259 [2024-06-10 20:51:26,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.07 | optimizer_step: 6.58 [2024-06-10 
20:51:26,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.13 | bwd_microstep: 4646.79 | bwd_inner_microstep: 1134.87 | bwd_allreduce_microstep: 3511.86 | step_microstep: 37.75 [2024-06-10 20:51:26,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15649.09 | bwd: 45506.01 | bwd_inner: 41993.17 | bwd_allreduce: 3512.13 | step: 39.27 {'loss': 1.1798, 'learning_rate': 9.967512264104204e-06, 'epoch': 0.68} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3473 [2024-06-10 20:51:28,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.84 | bwd_microstep: 1563.72 | bwd_inner_microstep: 1563.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3905 [2024-06-10 20:51:30,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1388.13 | bwd_inner_microstep: 1388.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3849 [2024-06-10 20:51:32,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1488.10 | bwd_inner_microstep: 1488.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3855 [2024-06-10 20:51:35,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.56 | bwd_microstep: 1657.35 | bwd_inner_microstep: 1657.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 20:51:36,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.80 | bwd_microstep: 792.43 | bwd_inner_microstep: 792.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 
20:51:38,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1483.97 | bwd_inner_microstep: 1483.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782 [2024-06-10 20:51:40,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.95 | bwd_microstep: 1647.71 | bwd_inner_microstep: 1647.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436 [2024-06-10 20:51:42,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.27 | bwd_microstep: 1284.47 | bwd_inner_microstep: 1284.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1960 [2024-06-10 20:51:43,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.52 | bwd_microstep: 765.71 | bwd_inner_microstep: 765.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2484 [2024-06-10 20:51:44,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.45 | bwd_microstep: 956.02 | bwd_inner_microstep: 956.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457 [2024-06-10 20:51:46,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1403.42 | bwd_inner_microstep: 1403.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3745 [2024-06-10 20:51:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.32 | bwd_microstep: 1566.73 | bwd_inner_microstep: 1566.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2140 [2024-06-10 20:51:50,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.51 | bwd_microstep: 928.65 | bwd_inner_microstep: 928.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 20:51:51,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1339.93 | bwd_inner_microstep: 1339.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2106 [2024-06-10 20:51:53,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.58 | bwd_microstep: 1014.42 | bwd_inner_microstep: 1014.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3845 [2024-06-10 20:51:55,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.25 | bwd_microstep: 1761.38 | bwd_inner_microstep: 1761.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684 [2024-06-10 20:51:57,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1527.15 | bwd_inner_microstep: 1527.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3491 [2024-06-10 20:51:59,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.51 | bwd_microstep: 1320.14 | bwd_inner_microstep: 1320.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675 [2024-06-10 20:52:01,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.84 | bwd_microstep: 1519.00 | bwd_inner_microstep: 1518.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 30, images per sample: 7.5, dynamic token length: 3829 [2024-06-10 20:52:03,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.73 | bwd_microstep: 1486.26 | bwd_inner_microstep: 1486.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 20:52:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1291.86 | bwd_inner_microstep: 1291.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-10 20:52:07,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.23 | bwd_microstep: 1310.21 | bwd_inner_microstep: 1310.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576 [2024-06-10 20:52:09,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.15 | bwd_microstep: 1693.76 | bwd_inner_microstep: 1693.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 20:52:12,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.34 | bwd_microstep: 1646.48 | bwd_inner_microstep: 1646.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-10 20:52:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 20:52:15,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1400.87 | bwd_inner_microstep: 1400.84 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-10 20:52:17,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.84 | bwd_microstep: 1397.32 | bwd_inner_microstep: 1397.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 20:52:19,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.38 | bwd_microstep: 1284.06 | bwd_inner_microstep: 1284.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589 [2024-06-10 20:52:21,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.82 | bwd_microstep: 1606.06 | bwd_inner_microstep: 1606.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745 [2024-06-10 20:52:23,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.93 | bwd_microstep: 1540.68 | bwd_inner_microstep: 1540.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3802 [2024-06-10 20:52:26,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.20 | bwd_microstep: 1718.72 | bwd_inner_microstep: 1718.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499 [2024-06-10 20:52:28,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.17 | optimizer_step: 6.63 [2024-06-10 20:52:28,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.39 | bwd_microstep: 1471.27 | bwd_inner_microstep: 1463.56 | bwd_allreduce_microstep: 7.66 | step_microstep: 37.77 [2024-06-10 20:52:28,352] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd: 16644.62 | bwd: 44651.66 | bwd_inner: 44643.10 | bwd_allreduce: 7.89 | step: 39.17 {'loss': 1.1453, 'learning_rate': 9.935059862578047e-06, 'epoch': 0.68} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931 [2024-06-10 20:52:29,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.09 | bwd_microstep: 880.27 | bwd_inner_microstep: 880.18 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 20:52:31,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.36 | bwd_microstep: 1297.94 | bwd_inner_microstep: 1297.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4313 [2024-06-10 20:52:33,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.52 | bwd_microstep: 1586.15 | bwd_inner_microstep: 1586.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 20:52:35,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1482.38 | bwd_inner_microstep: 1482.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 20:52:37,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1281.74 | bwd_inner_microstep: 1281.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 20:52:38,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 788.75 | bwd_inner_microstep: 788.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 
[2024-06-10 20:52:40,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.49 | bwd_microstep: 1376.29 | bwd_inner_microstep: 1376.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-10 20:52:42,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1250.89 | bwd_inner_microstep: 1250.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4117 [2024-06-10 20:52:44,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.74 | bwd_microstep: 1532.34 | bwd_inner_microstep: 1532.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3548 [2024-06-10 20:52:46,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.03 | bwd_microstep: 1279.69 | bwd_inner_microstep: 1279.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3427 [2024-06-10 20:52:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.78 | bwd_microstep: 1298.15 | bwd_inner_microstep: 1298.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-10 20:52:49,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1431.42 | bwd_inner_microstep: 1431.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2944 [2024-06-10 20:52:51,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 434.80 | bwd_microstep: 1148.24 | bwd_inner_microstep: 1148.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1964 [2024-06-10 20:52:52,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.39 | bwd_microstep: 797.45 | bwd_inner_microstep: 797.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725 [2024-06-10 20:52:54,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.94 | bwd_microstep: 1633.17 | bwd_inner_microstep: 1633.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561 [2024-06-10 20:52:56,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.79 | bwd_microstep: 1330.23 | bwd_inner_microstep: 1330.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3712 [2024-06-10 20:52:58,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.98 | bwd_microstep: 1724.12 | bwd_inner_microstep: 1724.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988 [2024-06-10 20:53:00,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.55 | bwd_microstep: 831.95 | bwd_inner_microstep: 831.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2152 [2024-06-10 20:53:01,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.87 | bwd_microstep: 851.04 | bwd_inner_microstep: 851.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3539 [2024-06-10 20:53:03,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.03 | bwd_microstep: 1661.78 | bwd_inner_microstep: 1661.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3848 [2024-06-10 20:53:05,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.93 | bwd_microstep: 1695.05 | bwd_inner_microstep: 1695.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2918 [2024-06-10 20:53:07,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.36 | bwd_microstep: 1284.85 | bwd_inner_microstep: 1284.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3725 [2024-06-10 20:53:09,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.49 | bwd_microstep: 1585.45 | bwd_inner_microstep: 1585.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1264 [2024-06-10 20:53:10,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 186.07 | bwd_microstep: 487.09 | bwd_inner_microstep: 487.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 20:53:12,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.50 | bwd_microstep: 1511.47 | bwd_inner_microstep: 1511.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3809 [2024-06-10 20:53:14,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.10 | bwd_microstep: 1688.33 | bwd_inner_microstep: 1688.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 20:53:16,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 20:53:18,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 1350.83 | bwd_inner_microstep: 1350.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 20:53:20,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.16 | bwd_microstep: 1398.88 | bwd_inner_microstep: 1398.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-10 20:53:22,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.58 | bwd_microstep: 1563.39 | bwd_inner_microstep: 1563.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-10 20:53:24,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.14 | bwd_microstep: 1604.91 | bwd_inner_microstep: 1604.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3380 [2024-06-10 20:53:28,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.57 [2024-06-10 20:53:28,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 2569.57 | bwd_inner_microstep: 1628.03 | bwd_allreduce_microstep: 941.49 | step_microstep: 37.78 [2024-06-10 20:53:28,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15847.83 | bwd: 43489.89 | bwd_inner: 42547.43 | bwd_allreduce: 941.76 | step: 39.32 {'loss': 1.1812, 'learning_rate': 9.902642909718737e-06, 'epoch': 0.68} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918 
[2024-06-10 20:53:29,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.98 | bwd_microstep: 776.86 | bwd_inner_microstep: 776.78 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4007 [2024-06-10 20:53:31,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.01 | bwd_microstep: 1711.96 | bwd_inner_microstep: 1711.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 20:53:33,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.21 | bwd_microstep: 1483.29 | bwd_inner_microstep: 1483.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 20:53:35,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.48 | bwd_microstep: 1277.08 | bwd_inner_microstep: 1277.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 20:53:37,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437 [2024-06-10 20:53:38,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.93 | bwd_microstep: 1155.92 | bwd_inner_microstep: 1155.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-10 20:53:40,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.08 | bwd_microstep: 1150.44 | bwd_inner_microstep: 1150.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per 
sample: 5.0, dynamic token length: 3486 [2024-06-10 20:53:42,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.15 | bwd_microstep: 1249.90 | bwd_inner_microstep: 1249.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-10 20:53:44,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1452.28 | bwd_inner_microstep: 1452.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3499 [2024-06-10 20:53:46,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.36 | bwd_microstep: 1513.62 | bwd_inner_microstep: 1513.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455 [2024-06-10 20:53:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.35 | bwd_microstep: 1329.07 | bwd_inner_microstep: 1329.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3513 [2024-06-10 20:53:49,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.47 | bwd_microstep: 1252.78 | bwd_inner_microstep: 1252.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3460 [2024-06-10 20:53:51,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.78 | bwd_microstep: 1212.67 | bwd_inner_microstep: 1212.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2412 [2024-06-10 20:53:52,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.71 | bwd_microstep: 971.20 | bwd_inner_microstep: 971.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1992 [2024-06-10 20:53:53,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 287.31 | bwd_microstep: 750.96 | bwd_inner_microstep: 750.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-10 20:53:56,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.66 | bwd_microstep: 1614.05 | bwd_inner_microstep: 1614.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 20:53:58,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.87 | bwd_microstep: 1514.29 | bwd_inner_microstep: 1514.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 20:54:00,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.51 | bwd_microstep: 1512.13 | bwd_inner_microstep: 1512.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3548 [2024-06-10 20:54:02,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.29 | bwd_microstep: 1442.04 | bwd_inner_microstep: 1442.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1985 [2024-06-10 20:54:03,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.96 | bwd_microstep: 767.20 | bwd_inner_microstep: 767.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 641 [2024-06-10 20:54:03,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 108.71 | bwd_microstep: 272.89 | bwd_inner_microstep: 272.86 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 20:54:05,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.75 | bwd_microstep: 1498.97 | bwd_inner_microstep: 1498.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2436 [2024-06-10 20:54:07,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.30 | bwd_microstep: 976.96 | bwd_inner_microstep: 976.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 20:54:09,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1378.82 | bwd_inner_microstep: 1378.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609 [2024-06-10 20:54:11,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.45 | bwd_microstep: 1603.55 | bwd_inner_microstep: 1603.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3732 [2024-06-10 20:54:13,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.34 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 20:54:15,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1556.77 | bwd_inner_microstep: 1556.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 20:54:16,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 
791.77 | bwd_inner_microstep: 791.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3437 [2024-06-10 20:54:18,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.32 | bwd_microstep: 1281.47 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547 [2024-06-10 20:54:20,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.31 | bwd_microstep: 1457.07 | bwd_inner_microstep: 1457.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3609 [2024-06-10 20:54:22,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 1457.91 | bwd_inner_microstep: 1457.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-10 20:54:30,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-10 20:54:30,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.99 | bwd_microstep: 7328.73 | bwd_inner_microstep: 1749.62 | bwd_allreduce_microstep: 5579.06 | step_microstep: 37.86 [2024-06-10 20:54:30,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15214.85 | bwd: 46570.74 | bwd_inner: 40990.72 | bwd_allreduce: 5579.33 | step: 39.50 {'loss': 1.155, 'learning_rate': 9.870261519698612e-06, 'epoch': 0.68} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396 [2024-06-10 20:54:31,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.78 | bwd_microstep: 1328.20 | bwd_inner_microstep: 1328.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 
4.0, dynamic token length: 3387 [2024-06-10 20:54:33,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.04 | bwd_microstep: 1143.05 | bwd_inner_microstep: 1143.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832 [2024-06-10 20:54:35,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 1479.05 | bwd_inner_microstep: 1479.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 20:54:37,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.15 | bwd_microstep: 1143.00 | bwd_inner_microstep: 1142.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3423 [2024-06-10 20:54:38,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.25 | bwd_microstep: 1210.82 | bwd_inner_microstep: 1210.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566 [2024-06-10 20:54:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.77 | bwd_microstep: 1203.02 | bwd_inner_microstep: 1202.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 20:54:42,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1382.60 | bwd_inner_microstep: 1382.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 20:54:43,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 791.64 | bwd_inner_microstep: 791.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 20:54:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.09 | bwd_microstep: 1384.90 | bwd_inner_microstep: 1384.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533 [2024-06-10 20:54:47,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.79 | bwd_microstep: 1230.97 | bwd_inner_microstep: 1230.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435 [2024-06-10 20:54:48,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.78 | bwd_microstep: 1282.40 | bwd_inner_microstep: 1282.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3659 [2024-06-10 20:54:51,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.18 | bwd_microstep: 1475.44 | bwd_inner_microstep: 1475.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3954 [2024-06-10 20:54:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.47 | bwd_microstep: 1433.67 | bwd_inner_microstep: 1433.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-10 20:54:54,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1396.71 | bwd_inner_microstep: 1396.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 20:54:56,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1399.29 | bwd_inner_microstep: 1399.26 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 20:54:58,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.98 | bwd_microstep: 1452.66 | bwd_inner_microstep: 1452.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 20:55:00,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1386.49 | bwd_inner_microstep: 1386.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436 [2024-06-10 20:55:02,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1155.24 | bwd_inner_microstep: 1155.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-10 20:55:03,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.09 | bwd_microstep: 798.30 | bwd_inner_microstep: 798.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 20:55:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1387.76 | bwd_inner_microstep: 1387.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 20:55:07,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.11 | bwd_microstep: 1459.40 | bwd_inner_microstep: 1459.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-10 20:55:09,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1610.80 | 
bwd_inner_microstep: 1610.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908 [2024-06-10 20:55:10,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.18 | bwd_microstep: 685.42 | bwd_inner_microstep: 685.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 20:55:12,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.46 | bwd_microstep: 1281.88 | bwd_inner_microstep: 1281.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 20:55:14,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.89 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3598 [2024-06-10 20:55:16,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.86 | bwd_microstep: 1637.88 | bwd_inner_microstep: 1637.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 20:55:18,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.21 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1256.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 20:55:20,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.25 | bwd_microstep: 1496.12 | bwd_inner_microstep: 1496.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1859 [2024-06-10 20:55:21,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 270.40 | bwd_microstep: 706.38 | bwd_inner_microstep: 706.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089 [2024-06-10 20:55:22,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.50 | bwd_microstep: 852.38 | bwd_inner_microstep: 852.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3746 [2024-06-10 20:55:24,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.00 | bwd_microstep: 1682.44 | bwd_inner_microstep: 1682.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3773 [2024-06-10 20:55:30,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-10 20:55:30,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.33 | bwd_microstep: 4694.04 | bwd_inner_microstep: 2391.07 | bwd_allreduce_microstep: 2302.91 | step_microstep: 37.69 [2024-06-10 20:55:30,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15494.60 | bwd: 44113.59 | bwd_inner: 41809.77 | bwd_allreduce: 2303.14 | step: 39.19 {'loss': 1.2294, 'learning_rate': 9.837915806564753e-06, 'epoch': 0.68} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-10 20:55:32,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1471.32 | bwd_inner_microstep: 1471.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 20:55:33,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1276.53 | bwd_inner_microstep: 1276.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 20:55:35,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1343.16 | bwd_inner_microstep: 1343.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849 [2024-06-10 20:55:38,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.85 | bwd_microstep: 1660.18 | bwd_inner_microstep: 1660.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 20:55:39,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1394.41 | bwd_inner_microstep: 1394.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-10 20:55:40,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.36 | bwd_microstep: 678.06 | bwd_inner_microstep: 678.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-10 20:55:42,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.24 | bwd_microstep: 1396.72 | bwd_inner_microstep: 1396.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 20:55:44,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1381.55 | bwd_inner_microstep: 1381.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 20:55:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1388.96 | bwd_inner_microstep: 1388.94 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 20:55:48,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1294.45 | bwd_inner_microstep: 1294.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 20:55:50,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1391.62 | bwd_inner_microstep: 1391.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3513 [2024-06-10 20:55:52,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.02 | bwd_microstep: 1549.77 | bwd_inner_microstep: 1549.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3503 [2024-06-10 20:55:54,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.57 | bwd_microstep: 1553.39 | bwd_inner_microstep: 1553.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2656 [2024-06-10 20:55:56,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.14 | bwd_microstep: 1019.30 | bwd_inner_microstep: 1019.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637 [2024-06-10 20:55:58,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.62 | bwd_microstep: 1600.98 | bwd_inner_microstep: 1600.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 20:56:00,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1342.89 | 
bwd_inner_microstep: 1342.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3637 [2024-06-10 20:56:02,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.36 | bwd_microstep: 1436.21 | bwd_inner_microstep: 1436.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 20:56:03,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1259.49 | bwd_inner_microstep: 1259.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-10 20:56:05,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.79 | bwd_microstep: 1486.80 | bwd_inner_microstep: 1486.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-10 20:56:07,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.45 | bwd_microstep: 1159.64 | bwd_inner_microstep: 1159.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-10 20:56:09,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1516.11 | bwd_inner_microstep: 1516.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-10 20:56:11,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.29 | bwd_microstep: 1254.33 | bwd_inner_microstep: 1254.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495 [2024-06-10 20:56:13,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 578.07 | bwd_microstep: 1577.85 | bwd_inner_microstep: 1577.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 [2024-06-10 20:56:15,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1429.45 | bwd_inner_microstep: 1429.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547 [2024-06-10 20:56:17,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.16 | bwd_microstep: 1590.07 | bwd_inner_microstep: 1590.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821 [2024-06-10 20:56:19,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.03 | bwd_microstep: 1587.00 | bwd_inner_microstep: 1586.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 20:56:21,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1517.03 | bwd_inner_microstep: 1517.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 20:56:23,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.64 | bwd_microstep: 1470.80 | bwd_inner_microstep: 1470.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 20:56:25,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.85 | bwd_microstep: 1279.52 | bwd_inner_microstep: 1279.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 20:56:27,510] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.72 | bwd_microstep: 1286.90 | bwd_inner_microstep: 1286.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 20:56:29,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1509.05 | bwd_inner_microstep: 1509.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 20:56:31,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-10 20:56:31,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.18 | bwd_microstep: 1319.28 | bwd_inner_microstep: 1310.43 | bwd_allreduce_microstep: 8.81 | step_microstep: 39.48 [2024-06-10 20:56:31,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16588.57 | bwd: 44422.85 | bwd_inner: 44413.14 | bwd_allreduce: 9.04 | step: 40.94 68%|██████▊ | 1169/1726 [20:14:03<9:33:45, 61.80s/it] 68%|██████▊ | 1170/1726 [20:15:05<9:32:14, 61.75s/it] 68%|██████▊ | 1171/1726 [20:16:04<9:25:27, 61.13s/it] 68%|██████▊ | 1172/1726 [20:17:06<9:27:11, 61.43s/it] 68%|██████▊ | 1173/1726 [20:18:06<9:22:02, 60.98s/it] {'loss': 1.2026, 'learning_rate': 9.805605884238587e-06, 'epoch': 0.68} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456 [2024-06-10 20:56:33,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1547.40 | bwd_inner_microstep: 1547.20 | bwd_allreduce_microstep: 0.09 |
step_microstep: 0.15 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3973 [2024-06-10 20:56:35,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.94 | bwd_microstep: 1704.37 | bwd_inner_microstep: 1704.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3798 [2024-06-10 20:56:37,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1479.20 | bwd_inner_microstep: 1479.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 20:56:39,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1248.87 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406 [2024-06-10 20:56:41,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.93 | bwd_microstep: 1278.60 | bwd_inner_microstep: 1278.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529 [2024-06-10 20:56:43,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.74 | bwd_microstep: 1195.69 | bwd_inner_microstep: 1195.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 20:56:44,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1255.75 | bwd_inner_microstep: 1255.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944 [2024-06-10 20:56:45,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.28 | bwd_microstep: 699.41 | bwd_inner_microstep: 
699.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-10 20:56:47,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.87 | bwd_microstep: 1403.24 | bwd_inner_microstep: 1403.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678 [2024-06-10 20:56:49,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.57 | bwd_microstep: 1522.08 | bwd_inner_microstep: 1522.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 20:56:51,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.67 | bwd_microstep: 1523.58 | bwd_inner_microstep: 1523.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 20:56:53,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.85 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 20:56:55,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.74 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1910 [2024-06-10 20:56:56,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.43 | bwd_microstep: 751.29 | bwd_inner_microstep: 751.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1967 [2024-06-10 20:56:57,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.51 | 
bwd_microstep: 891.75 | bwd_inner_microstep: 891.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 20:56:59,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.02 | bwd_microstep: 1387.97 | bwd_inner_microstep: 1387.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497 [2024-06-10 20:57:01,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.72 | bwd_microstep: 1409.05 | bwd_inner_microstep: 1409.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 20:57:03,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.93 | bwd_microstep: 1335.97 | bwd_inner_microstep: 1335.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107 [2024-06-10 20:57:04,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.55 | bwd_microstep: 921.73 | bwd_inner_microstep: 921.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1975 [2024-06-10 20:57:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.56 | bwd_microstep: 769.28 | bwd_inner_microstep: 769.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812 [2024-06-10 20:57:08,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.81 | bwd_microstep: 1585.18 | bwd_inner_microstep: 1585.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-10 20:57:10,222] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 592.37 | bwd_microstep: 1594.63 | bwd_inner_microstep: 1594.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3106 [2024-06-10 20:57:11,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1249.78 | bwd_inner_microstep: 1249.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3268 [2024-06-10 20:57:13,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.01 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 20:57:15,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.23 | bwd_microstep: 1491.86 | bwd_inner_microstep: 1491.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3577 [2024-06-10 20:57:17,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.49 | bwd_microstep: 1527.10 | bwd_inner_microstep: 1527.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3814 [2024-06-10 20:57:19,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.58 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-10 20:57:21,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.16 | bwd_microstep: 1423.48 | bwd_inner_microstep: 1423.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 20:57:23,801] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.19 | bwd_microstep: 1498.86 | bwd_inner_microstep: 1498.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178 [2024-06-10 20:57:25,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.09 | bwd_microstep: 954.04 | bwd_inner_microstep: 954.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-10 20:57:26,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.42 | bwd_microstep: 1158.62 | bwd_inner_microstep: 1158.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3717 [2024-06-10 20:57:31,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.29 | optimizer_step: 6.60 [2024-06-10 20:57:31,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.14 | bwd_microstep: 3844.94 | bwd_inner_microstep: 1856.74 | bwd_allreduce_microstep: 1988.14 | step_microstep: 39.03 [2024-06-10 20:57:31,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15596.93 | bwd: 43847.45 | bwd_inner: 41858.24 | bwd_allreduce: 1988.46 | step: 40.60 {'loss': 1.1539, 'learning_rate': 9.77333186651551e-06, 'epoch': 0.68} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 20:57:33,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1269.82 | bwd_inner_microstep: 1269.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3946 [2024-06-10 20:57:35,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.36 | bwd_microstep: 1525.81 | bwd_inner_microstep: 1525.78 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 20:57:37,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1549.96 | bwd_inner_microstep: 1549.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-10 20:57:39,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1296.38 | bwd_inner_microstep: 1296.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782 [2024-06-10 20:57:41,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.06 | bwd_microstep: 1476.77 | bwd_inner_microstep: 1476.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742 [2024-06-10 20:57:43,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.55 | bwd_microstep: 1632.40 | bwd_inner_microstep: 1632.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 20:57:44,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.72 | bwd_microstep: 792.59 | bwd_inner_microstep: 792.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432 [2024-06-10 20:57:46,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.57 | bwd_microstep: 1153.37 | bwd_inner_microstep: 1153.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 20:57:47,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 
1282.36 | bwd_inner_microstep: 1282.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 20:57:49,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1384.88 | bwd_inner_microstep: 1384.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 20:57:50,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3978 [2024-06-10 20:57:53,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.45 | bwd_microstep: 1605.17 | bwd_inner_microstep: 1605.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 20:57:54,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.88 | bwd_microstep: 1378.67 | bwd_inner_microstep: 1378.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988 [2024-06-10 20:57:56,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.22 | bwd_microstep: 895.55 | bwd_inner_microstep: 895.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3911 [2024-06-10 20:57:58,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.55 | bwd_microstep: 1732.71 | bwd_inner_microstep: 1732.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3508 [2024-06-10 20:58:00,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 530.74 | bwd_microstep: 1428.57 | bwd_inner_microstep: 1428.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3534 [2024-06-10 20:58:02,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1228.81 | bwd_inner_microstep: 1228.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3622 [2024-06-10 20:58:04,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.13 | bwd_microstep: 1314.41 | bwd_inner_microstep: 1314.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 20:58:06,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.56 | bwd_microstep: 1500.33 | bwd_inner_microstep: 1500.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 20:58:08,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1491.65 | bwd_inner_microstep: 1491.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610 [2024-06-10 20:58:10,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.12 | bwd_microstep: 1605.98 | bwd_inner_microstep: 1605.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3637 [2024-06-10 20:58:12,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.99 | bwd_microstep: 1313.12 | bwd_inner_microstep: 1313.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 20:58:14,001] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1294.15 | bwd_inner_microstep: 1294.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899 [2024-06-10 20:58:16,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1587.88 | bwd_inner_microstep: 1587.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 20:58:18,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.00 | bwd_microstep: 1555.10 | bwd_inner_microstep: 1555.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 20:58:20,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1396.39 | bwd_inner_microstep: 1396.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2141 [2024-06-10 20:58:21,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.03 | bwd_microstep: 893.87 | bwd_inner_microstep: 893.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3432 [2024-06-10 20:58:23,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.01 | bwd_microstep: 1315.69 | bwd_inner_microstep: 1315.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3802 [2024-06-10 20:58:25,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1418.05 | bwd_inner_microstep: 1418.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 
3590 [2024-06-10 20:58:27,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.09 | bwd_microstep: 1700.95 | bwd_inner_microstep: 1700.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3444 [2024-06-10 20:58:29,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.43 | bwd_microstep: 1513.66 | bwd_inner_microstep: 1513.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3743 [2024-06-10 20:58:31,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.66 [2024-06-10 20:58:31,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.73 | bwd_microstep: 1654.18 | bwd_inner_microstep: 1645.93 | bwd_allreduce_microstep: 8.19 | step_microstep: 39.37 [2024-06-10 20:58:31,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16444.26 | bwd: 43979.69 | bwd_inner: 43970.59 | bwd_allreduce: 8.42 | step: 40.87 {'loss': 1.1792, 'learning_rate': 9.74109386706443e-06, 'epoch': 0.68} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3063 [2024-06-10 20:58:33,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.30 | bwd_microstep: 1170.67 | bwd_inner_microstep: 1170.44 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.22 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3986 [2024-06-10 20:58:36,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.16 | bwd_microstep: 1746.05 | bwd_inner_microstep: 1746.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801 [2024-06-10 20:58:38,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.09 | bwd_microstep: 1647.77 | 
bwd_inner_microstep: 1647.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-10 20:58:40,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.62 | bwd_microstep: 1297.73 | bwd_inner_microstep: 1297.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 20:58:42,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.46 | bwd_microstep: 1397.23 | bwd_inner_microstep: 1397.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878 [2024-06-10 20:58:42,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.32 | bwd_microstep: 678.38 | bwd_inner_microstep: 678.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 20:58:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.27 | bwd_microstep: 1284.71 | bwd_inner_microstep: 1284.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 20:58:46,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1242.85 | bwd_inner_microstep: 1242.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 20:58:48,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.82 | bwd_microstep: 1387.83 | bwd_inner_microstep: 1387.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688 [2024-06-10 20:58:50,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 565.96 | bwd_microstep: 1522.73 | bwd_inner_microstep: 1522.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 20:58:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.39 | bwd_microstep: 1150.62 | bwd_inner_microstep: 1150.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-10 20:58:53,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.29 | bwd_microstep: 1276.54 | bwd_inner_microstep: 1276.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2928 [2024-06-10 20:58:55,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.30 | bwd_microstep: 1094.42 | bwd_inner_microstep: 1094.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676 [2024-06-10 20:58:57,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.17 | bwd_microstep: 1548.76 | bwd_inner_microstep: 1548.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3507 [2024-06-10 20:58:59,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.74 | bwd_microstep: 1444.21 | bwd_inner_microstep: 1444.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 20:59:00,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.53 | bwd_microstep: 799.64 | bwd_inner_microstep: 799.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 20:59:02,418] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.23 | bwd_microstep: 1293.06 | bwd_inner_microstep: 1293.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-10 20:59:04,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1286.20 | bwd_inner_microstep: 1286.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 20:59:06,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1348.41 | bwd_inner_microstep: 1348.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2968 [2024-06-10 20:59:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.82 | bwd_microstep: 1201.12 | bwd_inner_microstep: 1201.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 20:59:09,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.30 | bwd_microstep: 1499.89 | bwd_inner_microstep: 1499.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 20:59:11,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1449.01 | bwd_inner_microstep: 1448.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587 [2024-06-10 20:59:13,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.67 | bwd_microstep: 1503.80 | bwd_inner_microstep: 1503.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 
3832 [2024-06-10 20:59:16,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.20 | bwd_microstep: 1583.89 | bwd_inner_microstep: 1583.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-10 20:59:17,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.40 | bwd_microstep: 1251.49 | bwd_inner_microstep: 1251.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 20:59:19,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.16 | bwd_microstep: 1282.74 | bwd_inner_microstep: 1282.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3592 [2024-06-10 20:59:21,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1241.62 | bwd_inner_microstep: 1241.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3749 [2024-06-10 20:59:23,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3559 [2024-06-10 20:59:25,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1423.43 | bwd_inner_microstep: 1423.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3541 [2024-06-10 20:59:27,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1462.81 | bwd_inner_microstep: 1462.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images 
per sample: 11.5, dynamic token length: 3776 [2024-06-10 20:59:29,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.77 | bwd_microstep: 1743.87 | bwd_inner_microstep: 1743.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3575 [2024-06-10 20:59:33,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.12 | optimizer_step: 6.59 [2024-06-10 20:59:33,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.24 | bwd_microstep: 2835.41 | bwd_inner_microstep: 1723.07 | bwd_allreduce_microstep: 1112.28 | step_microstep: 38.87 [2024-06-10 20:59:33,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16215.92 | bwd: 44501.24 | bwd_inner: 43387.86 | bwd_allreduce: 1112.62 | step: 40.59 {'loss': 1.2482, 'learning_rate': 9.70889199942743e-06, 'epoch': 0.68} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 20:59:34,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1242.49 | bwd_inner_microstep: 1242.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4056 [2024-06-10 20:59:37,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.18 | bwd_microstep: 1616.29 | bwd_inner_microstep: 1616.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 20:59:38,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1375.92 | bwd_inner_microstep: 1375.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834 [2024-06-10 20:59:41,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
575.37 | bwd_microstep: 1550.84 | bwd_inner_microstep: 1550.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-10 20:59:42,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.20 | bwd_microstep: 1148.95 | bwd_inner_microstep: 1148.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 20:59:44,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.18 | bwd_microstep: 1285.13 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2212 [2024-06-10 20:59:45,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.58 | bwd_microstep: 957.73 | bwd_inner_microstep: 957.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 20:59:47,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1252.46 | bwd_inner_microstep: 1252.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-10 20:59:49,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.60 | bwd_microstep: 1159.97 | bwd_inner_microstep: 1159.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 20:59:51,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.43 | bwd_microstep: 1386.95 | bwd_inner_microstep: 1386.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1991 [2024-06-10 20:59:51,995] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 274.18 | bwd_microstep: 707.96 | bwd_inner_microstep: 707.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1359 [2024-06-10 20:59:52,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.61 | bwd_microstep: 521.70 | bwd_inner_microstep: 521.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986 [2024-06-10 20:59:53,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.70 | bwd_microstep: 806.38 | bwd_inner_microstep: 806.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 20:59:55,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1389.56 | bwd_inner_microstep: 1389.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 20:59:57,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1392.45 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 20:59:59,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1374.82 | bwd_inner_microstep: 1374.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 21:00:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1399.52 | bwd_inner_microstep: 1399.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 21:00:03,476] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932 [2024-06-10 21:00:04,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.61 | bwd_microstep: 700.47 | bwd_inner_microstep: 700.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 21:00:06,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.07 | bwd_microstep: 1652.30 | bwd_inner_microstep: 1652.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 21:00:08,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.50 | bwd_microstep: 1558.79 | bwd_inner_microstep: 1558.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 21:00:10,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.20 | bwd_microstep: 1480.32 | bwd_inner_microstep: 1480.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 21:00:12,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1508.49 | bwd_inner_microstep: 1508.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 21:00:15,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1497.00 | bwd_inner_microstep: 1496.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 2091 [2024-06-10 21:00:16,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.22 | bwd_microstep: 823.41 | bwd_inner_microstep: 823.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605 [2024-06-10 21:00:18,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.05 | bwd_microstep: 1435.15 | bwd_inner_microstep: 1435.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3787 [2024-06-10 21:00:20,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.05 | bwd_microstep: 1695.96 | bwd_inner_microstep: 1695.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2189 [2024-06-10 21:00:21,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.67 | bwd_microstep: 908.33 | bwd_inner_microstep: 908.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1908 [2024-06-10 21:00:22,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.18 | bwd_microstep: 750.49 | bwd_inner_microstep: 750.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 21:00:25,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1607.08 | bwd_inner_microstep: 1607.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592 [2024-06-10 21:00:27,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.09 | bwd_microstep: 1703.71 | bwd_inner_microstep: 1703.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3466 [2024-06-10 21:00:34,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-10 21:00:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 6092.78 | bwd_inner_microstep: 1542.70 | bwd_allreduce_microstep: 4550.01 | step_microstep: 38.55 [2024-06-10 21:00:34,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15240.43 | bwd: 45382.13 | bwd_inner: 40831.19 | bwd_allreduce: 4550.25 | step: 40.08 {'loss': 1.1832, 'learning_rate': 9.676726377019296e-06, 'epoch': 0.68} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 21:00:35,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1339.01 | bwd_inner_microstep: 1338.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2966 [2024-06-10 21:00:37,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.03 | bwd_microstep: 1141.31 | bwd_inner_microstep: 1141.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4384 [2024-06-10 21:00:39,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 668.23 | bwd_microstep: 1808.82 | bwd_inner_microstep: 1808.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420 [2024-06-10 21:00:41,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.09 | bwd_microstep: 1279.27 | bwd_inner_microstep: 1279.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 21:00:43,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 481.61 | bwd_microstep: 1280.93 | bwd_inner_microstep: 1280.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3738 [2024-06-10 21:00:45,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.95 | bwd_microstep: 1334.11 | bwd_inner_microstep: 1334.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 21:00:47,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1383.40 | bwd_inner_microstep: 1383.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 21:00:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.75 | bwd_microstep: 1281.15 | bwd_inner_microstep: 1281.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3708 [2024-06-10 21:00:50,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1329.95 | bwd_inner_microstep: 1329.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-10 21:00:52,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.71 | bwd_microstep: 1159.87 | bwd_inner_microstep: 1159.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448 [2024-06-10 21:00:54,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1157.19 | bwd_inner_microstep: 1157.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 21:00:55,853] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.29 | bwd_microstep: 1283.63 | bwd_inner_microstep: 1283.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 21:00:57,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.35 | bwd_microstep: 1254.47 | bwd_inner_microstep: 1254.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 21:00:59,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.26 | bwd_microstep: 1382.57 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 21:01:01,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.78 | bwd_microstep: 1383.63 | bwd_inner_microstep: 1383.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3505 [2024-06-10 21:01:03,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.38 | bwd_microstep: 1551.67 | bwd_inner_microstep: 1551.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3606 [2024-06-10 21:01:05,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1481.21 | bwd_inner_microstep: 1481.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-10 21:01:07,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1417.61 | bwd_inner_microstep: 1417.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token 
length: 3635 [2024-06-10 21:01:10,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.15 | bwd_microstep: 1813.55 | bwd_inner_microstep: 1813.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829 [2024-06-10 21:01:12,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.71 | bwd_microstep: 1756.92 | bwd_inner_microstep: 1756.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-10 21:01:13,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.81 | bwd_microstep: 698.90 | bwd_inner_microstep: 698.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2373 [2024-06-10 21:01:14,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.10 | bwd_microstep: 901.46 | bwd_inner_microstep: 901.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3455 [2024-06-10 21:01:16,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.09 | bwd_microstep: 1192.29 | bwd_inner_microstep: 1192.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 21:01:18,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1253.24 | bwd_inner_microstep: 1253.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3795 [2024-06-10 21:01:19,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.70 | bwd_microstep: 1354.93 | bwd_inner_microstep: 1354.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, 
images per sample: 6.0, dynamic token length: 2522 [2024-06-10 21:01:21,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.88 | bwd_microstep: 1059.66 | bwd_inner_microstep: 1059.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3485 [2024-06-10 21:01:23,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1335.84 | bwd_inner_microstep: 1335.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 21:01:25,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1398.09 | bwd_inner_microstep: 1398.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3552 [2024-06-10 21:01:26,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1282.15 | bwd_inner_microstep: 1282.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 21:01:28,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.70 | bwd_microstep: 1406.44 | bwd_inner_microstep: 1406.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 21:01:31,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.45 | bwd_microstep: 1595.58 | bwd_inner_microstep: 1595.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564 [2024-06-10 21:01:35,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-10 21:01:35,769] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.31 | bwd_microstep: 4015.07 | bwd_inner_microstep: 1798.60 | bwd_allreduce_microstep: 2216.41 | step_microstep: 39.38 [2024-06-10 21:01:35,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16095.83 | bwd: 45313.95 | bwd_inner: 43096.64 | bwd_allreduce: 2216.64 | step: 41.17 68%|██████▊ | 1174/1726 [20:19:08<9:22:03, 61.09s/it] 68%|██████▊ | 1175/1726 [20:20:07<9:17:25, 60.70s/it] 68%|██████▊ | 1176/1726 [20:21:08<9:16:35, 60.72s/it] 68%|██████▊ | 1177/1726 [20:22:09<9:16:30, 60.82s/it] 68%|██████▊ | 1178/1726 [20:23:10<9:15:51, 60.86s/it] 68%|██████▊ | 1179 {'loss': 1.1488, 'learning_rate': 9.644597113127206e-06, 'epoch': 0.68} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 21:01:37,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.55 | bwd_microstep: 1331.73 | bwd_inner_microstep: 1331.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908 [2024-06-10 21:01:39,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1586.63 | bwd_inner_microstep: 1586.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 21:01:41,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1550.69 | bwd_inner_microstep: 1550.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.37 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4160 [2024-06-10 21:01:44,215] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 610.81 | bwd_microstep: 1641.57 | bwd_inner_microstep: 1641.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 21:01:46,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1354.26 | bwd_inner_microstep: 1354.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759 [2024-06-10 21:01:48,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.93 | bwd_microstep: 1538.50 | bwd_inner_microstep: 1538.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749 [2024-06-10 21:01:50,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1536.56 | bwd_inner_microstep: 1536.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929 [2024-06-10 21:01:51,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.90 | bwd_microstep: 727.23 | bwd_inner_microstep: 727.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442 [2024-06-10 21:01:53,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.04 | bwd_microstep: 1450.59 | bwd_inner_microstep: 1450.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2131 [2024-06-10 21:01:54,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.57 | bwd_microstep: 863.85 | bwd_inner_microstep: 863.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-10 21:01:56,587] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1486.39 | bwd_inner_microstep: 1486.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 21:01:58,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 21:01:59,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.31 | bwd_microstep: 794.95 | bwd_inner_microstep: 794.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691 [2024-06-10 21:02:01,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.50 | bwd_microstep: 1629.54 | bwd_inner_microstep: 1629.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841 [2024-06-10 21:02:03,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.27 | bwd_microstep: 1464.42 | bwd_inner_microstep: 1464.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3645 [2024-06-10 21:02:06,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.59 | bwd_microstep: 1560.78 | bwd_inner_microstep: 1560.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 21:02:07,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1380.31 | bwd_inner_microstep: 1380.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 
2416 [2024-06-10 21:02:09,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.55 | bwd_microstep: 842.84 | bwd_inner_microstep: 842.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 21:02:10,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1283.34 | bwd_inner_microstep: 1283.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 21:02:12,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1284.38 | bwd_inner_microstep: 1284.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438 [2024-06-10 21:02:14,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.31 | bwd_microstep: 1284.77 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 21:02:16,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.22 | bwd_microstep: 1252.18 | bwd_inner_microstep: 1252.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 21:02:18,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1402.92 | bwd_inner_microstep: 1402.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615 [2024-06-10 21:02:20,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1412.67 | bwd_inner_microstep: 1412.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3546 [2024-06-10 21:02:21,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 21:02:23,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1382.27 | bwd_inner_microstep: 1382.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3561 [2024-06-10 21:02:25,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1365.71 | bwd_inner_microstep: 1365.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 21:02:27,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.49 | bwd_microstep: 1341.38 | bwd_inner_microstep: 1341.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596 [2024-06-10 21:02:29,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.10 | bwd_microstep: 1305.94 | bwd_inner_microstep: 1305.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3412 [2024-06-10 21:02:31,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.20 | bwd_microstep: 1309.84 | bwd_inner_microstep: 1309.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2046 [2024-06-10 21:02:32,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.67 | bwd_microstep: 953.78 | bwd_inner_microstep: 953.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 21:02:35,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-10 21:02:35,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 2158.09 | bwd_inner_microstep: 1815.43 | bwd_allreduce_microstep: 342.60 | step_microstep: 38.88 [2024-06-10 21:02:35,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15975.29 | bwd: 43253.73 | bwd_inner: 42910.18 | bwd_allreduce: 342.83 | step: 40.80 {'loss': 1.1653, 'learning_rate': 9.612504320910249e-06, 'epoch': 0.68} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 21:02:37,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.24 | bwd_microstep: 1471.83 | bwd_inner_microstep: 1471.61 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.16 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2339 [2024-06-10 21:02:38,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.75 | bwd_microstep: 949.92 | bwd_inner_microstep: 949.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 21:02:40,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.54 | bwd_microstep: 1340.79 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430 [2024-06-10 21:02:42,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.91 | bwd_microstep: 1180.94 | bwd_inner_microstep: 1180.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4088 [2024-06-10 21:02:44,443] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 604.44 | bwd_microstep: 1625.23 | bwd_inner_microstep: 1625.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-10 21:02:46,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.84 | bwd_microstep: 1149.25 | bwd_inner_microstep: 1149.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 21:02:47,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1248.45 | bwd_inner_microstep: 1248.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 21:02:49,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1344.75 | bwd_inner_microstep: 1344.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3496 [2024-06-10 21:02:51,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.14 | bwd_microstep: 1415.50 | bwd_inner_microstep: 1415.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583 [2024-06-10 21:02:53,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1453.09 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3708 [2024-06-10 21:02:55,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.42 | bwd_microstep: 1694.69 | bwd_inner_microstep: 1694.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3674 [2024-06-10 21:02:58,090] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.96 | bwd_microstep: 1584.58 | bwd_inner_microstep: 1584.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 21:03:00,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.76 | bwd_microstep: 1483.45 | bwd_inner_microstep: 1483.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 21:03:01,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1286.53 | bwd_inner_microstep: 1286.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 21:03:03,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1248.93 | bwd_inner_microstep: 1248.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 21:03:05,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.96 | bwd_microstep: 1285.38 | bwd_inner_microstep: 1285.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806 [2024-06-10 21:03:07,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.63 | bwd_microstep: 1754.26 | bwd_inner_microstep: 1754.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3643 [2024-06-10 21:03:09,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1446.16 | bwd_inner_microstep: 1446.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3529 [2024-06-10 21:03:11,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1492.73 | bwd_inner_microstep: 1492.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2296 [2024-06-10 21:03:13,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.26 | bwd_microstep: 977.66 | bwd_inner_microstep: 977.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 21:03:15,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1293.95 | bwd_inner_microstep: 1293.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3658 [2024-06-10 21:03:17,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.51 | bwd_microstep: 1521.53 | bwd_inner_microstep: 1521.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 21:03:19,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1511.00 | bwd_inner_microstep: 1510.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 21:03:21,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1411.79 | bwd_inner_microstep: 1411.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2192 [2024-06-10 21:03:22,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.10 | bwd_microstep: 771.17 | bwd_inner_microstep: 771.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3518 [2024-06-10 21:03:24,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1289.74 | bwd_inner_microstep: 1289.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681 [2024-06-10 21:03:26,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1529.15 | bwd_inner_microstep: 1529.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 21:03:28,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566 [2024-06-10 21:03:30,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1596.90 | bwd_inner_microstep: 1596.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751 [2024-06-10 21:03:32,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.37 | bwd_microstep: 1543.73 | bwd_inner_microstep: 1543.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474 [2024-06-10 21:03:34,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.62 | bwd_microstep: 1310.23 | bwd_inner_microstep: 1310.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 21:03:36,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.27 | optimizer_step: 6.65 [2024-06-10 21:03:36,381] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.78 | bwd_microstep: 1585.16 | bwd_inner_microstep: 1577.27 | bwd_allreduce_microstep: 7.83 | step_microstep: 38.69 [2024-06-10 21:03:36,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16499.18 | bwd: 44179.88 | bwd_inner: 44170.98 | bwd_allreduce: 8.15 | step: 40.27 {'loss': 1.2132, 'learning_rate': 9.580448113399069e-06, 'epoch': 0.68} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992 [2024-06-10 21:03:37,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 899.53 | bwd_inner_microstep: 899.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 21:03:39,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.73 | bwd_microstep: 1374.96 | bwd_inner_microstep: 1374.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394 [2024-06-10 21:03:41,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.12 | bwd_microstep: 1243.76 | bwd_inner_microstep: 1243.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482 [2024-06-10 21:03:43,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.53 | bwd_microstep: 1313.83 | bwd_inner_microstep: 1313.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 21:03:45,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.11 | bwd_microstep: 1482.02 | bwd_inner_microstep: 1481.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 21:03:47,195] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1492.56 | bwd_inner_microstep: 1492.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-10 21:03:48,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1280.10 | bwd_inner_microstep: 1280.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-10 21:03:51,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 1484.47 | bwd_inner_microstep: 1484.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 21:03:53,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.54 | bwd_microstep: 1479.83 | bwd_inner_microstep: 1479.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 21:03:54,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1388.77 | bwd_inner_microstep: 1388.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507 [2024-06-10 21:03:56,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1315.77 | bwd_inner_microstep: 1315.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3657 [2024-06-10 21:03:58,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.56 | bwd_microstep: 1322.06 | bwd_inner_microstep: 1322.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3685 [2024-06-10 21:04:00,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.45 | bwd_microstep: 1618.06 | bwd_inner_microstep: 1618.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 819 [2024-06-10 21:04:01,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 131.30 | bwd_microstep: 341.46 | bwd_inner_microstep: 341.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 21:04:03,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1350.72 | bwd_inner_microstep: 1350.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2887 [2024-06-10 21:04:04,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.97 | bwd_microstep: 1184.76 | bwd_inner_microstep: 1184.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 21:04:06,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1377.32 | bwd_inner_microstep: 1377.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 21:04:08,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.70 | bwd_microstep: 1475.97 | bwd_inner_microstep: 1475.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 21:04:10,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.10 | bwd_microstep: 1389.00 | bwd_inner_microstep: 1388.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per 
sample: 5.0, dynamic token length: 3481 [2024-06-10 21:04:12,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.13 | bwd_microstep: 1243.04 | bwd_inner_microstep: 1243.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 21:04:14,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.71 | bwd_microstep: 1293.93 | bwd_inner_microstep: 1293.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 21:04:16,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.29 | bwd_microstep: 1558.41 | bwd_inner_microstep: 1558.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2252 [2024-06-10 21:04:17,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.99 | bwd_microstep: 872.97 | bwd_inner_microstep: 872.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 21:04:19,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1399.92 | bwd_inner_microstep: 1399.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-10 21:04:21,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.73 | bwd_microstep: 1411.70 | bwd_inner_microstep: 1411.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 21:04:23,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1398.87 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 21:04:25,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1461.77 | bwd_inner_microstep: 1461.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 21:04:27,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.81 | bwd_microstep: 1385.27 | bwd_inner_microstep: 1385.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500 [2024-06-10 21:04:29,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.41 | bwd_microstep: 1322.26 | bwd_inner_microstep: 1322.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822 [2024-06-10 21:04:31,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.09 | bwd_microstep: 1493.01 | bwd_inner_microstep: 1492.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3811 [2024-06-10 21:04:33,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.63 | bwd_microstep: 1856.40 | bwd_inner_microstep: 1856.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-10 21:04:38,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.26 | optimizer_step: 6.58 [2024-06-10 21:04:38,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 3730.16 | bwd_inner_microstep: 1698.38 | bwd_allreduce_microstep: 2031.72 | step_microstep: 38.92 [2024-06-10 21:04:38,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
16142.78 | bwd: 45242.68 | bwd_inner: 43210.01 | bwd_allreduce: 2031.97 | step: 40.40 {'loss': 1.2194, 'learning_rate': 9.54842860349548e-06, 'epoch': 0.68} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 21:04:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.41 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.31 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 21:04:42,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.87 | bwd_microstep: 1486.80 | bwd_inner_microstep: 1486.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 21:04:43,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.39 | bwd_microstep: 1406.05 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3865 [2024-06-10 21:04:46,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.94 | bwd_microstep: 1523.99 | bwd_inner_microstep: 1523.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878 [2024-06-10 21:04:47,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.13 | bwd_microstep: 679.52 | bwd_inner_microstep: 679.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 21:04:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3778 [2024-06-10 
21:04:50,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.62 | bwd_microstep: 1347.19 | bwd_inner_microstep: 1347.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 21:04:52,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.01 | bwd_microstep: 1148.47 | bwd_inner_microstep: 1148.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716 [2024-06-10 21:04:54,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.70 | bwd_microstep: 1532.21 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 21:04:56,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1250.36 | bwd_inner_microstep: 1250.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 21:04:57,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1249.77 | bwd_inner_microstep: 1249.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069 [2024-06-10 21:04:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.62 | bwd_microstep: 821.25 | bwd_inner_microstep: 821.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1963 [2024-06-10 21:05:00,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.25 | bwd_microstep: 856.88 | bwd_inner_microstep: 856.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, 
dynamic token length: 3515 [2024-06-10 21:05:02,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.46 | bwd_microstep: 1456.08 | bwd_inner_microstep: 1456.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661 [2024-06-10 21:05:04,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.51 | bwd_microstep: 1519.60 | bwd_inner_microstep: 1519.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 21:05:06,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.53 | bwd_microstep: 1405.05 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960 [2024-06-10 21:05:07,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.05 | bwd_microstep: 891.12 | bwd_inner_microstep: 891.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 21:05:09,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1504.02 | bwd_inner_microstep: 1503.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 21:05:11,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.60 | bwd_microstep: 1392.91 | bwd_inner_microstep: 1392.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2730 [2024-06-10 21:05:12,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.82 | bwd_microstep: 1042.70 | bwd_inner_microstep: 1042.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-10 21:05:15,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.53 | bwd_microstep: 1652.47 | bwd_inner_microstep: 1652.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-10 21:05:17,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.02 | bwd_microstep: 1434.94 | bwd_inner_microstep: 1434.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 21:05:18,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.60 | bwd_microstep: 800.96 | bwd_inner_microstep: 800.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2168 [2024-06-10 21:05:19,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.64 | bwd_microstep: 951.78 | bwd_inner_microstep: 951.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3685 [2024-06-10 21:05:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.24 | bwd_microstep: 1359.18 | bwd_inner_microstep: 1359.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 21:05:23,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1399.76 | bwd_inner_microstep: 1399.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1384 [2024-06-10 21:05:24,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 233.20 | bwd_microstep: 620.90 | bwd_inner_microstep: 620.88 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3716 [2024-06-10 21:05:26,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.21 | bwd_microstep: 1729.83 | bwd_inner_microstep: 1729.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055 [2024-06-10 21:05:27,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.75 | bwd_microstep: 912.68 | bwd_inner_microstep: 912.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 21:05:29,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.15 | bwd_microstep: 1252.68 | bwd_inner_microstep: 1252.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-10 21:05:31,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1403.22 | bwd_inner_microstep: 1403.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3580 [2024-06-10 21:05:36,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-10 21:05:36,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.27 | bwd_microstep: 4567.55 | bwd_inner_microstep: 1724.96 | bwd_allreduce_microstep: 2842.54 | step_microstep: 37.85 [2024-06-10 21:05:36,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15065.25 | bwd: 43224.64 | bwd_inner: 40381.07 | bwd_allreduce: 2842.84 | step: 40.63 {'loss': 1.1996, 'learning_rate': 9.516445903972005e-06, 'epoch': 0.69} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 21:05:37,856] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.36 | bwd_microstep: 796.80 | bwd_inner_microstep: 796.67 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2639 [2024-06-10 21:05:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.05 | bwd_microstep: 1017.79 | bwd_inner_microstep: 1017.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889 [2024-06-10 21:05:41,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.38 | bwd_microstep: 1583.52 | bwd_inner_microstep: 1583.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 21:05:43,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.02 | bwd_microstep: 1281.48 | bwd_inner_microstep: 1281.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3758 [2024-06-10 21:05:45,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1303.86 | bwd_inner_microstep: 1303.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3586 [2024-06-10 21:05:47,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.34 | bwd_microstep: 1435.23 | bwd_inner_microstep: 1435.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-10 21:05:49,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1546.11 | bwd_inner_microstep: 1546.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
3413 [2024-06-10 21:05:50,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.81 | bwd_microstep: 1151.50 | bwd_inner_microstep: 1151.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-10 21:05:52,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.78 | bwd_microstep: 1292.54 | bwd_inner_microstep: 1292.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2640 [2024-06-10 21:05:53,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.10 | bwd_microstep: 1022.07 | bwd_inner_microstep: 1022.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 21:05:55,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1248.69 | bwd_inner_microstep: 1248.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 21:05:57,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1497.49 | bwd_inner_microstep: 1497.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-10 21:05:59,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.99 | bwd_microstep: 1510.53 | bwd_inner_microstep: 1510.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 21:06:01,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.25 | bwd_microstep: 1482.85 | bwd_inner_microstep: 1482.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images 
per sample: 7.5, dynamic token length: 3511 [2024-06-10 21:06:03,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.12 | bwd_microstep: 1418.00 | bwd_inner_microstep: 1417.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-10 21:06:05,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1406.75 | bwd_inner_microstep: 1406.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-10 21:06:07,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1297.90 | bwd_inner_microstep: 1297.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 21:06:09,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1253.94 | bwd_inner_microstep: 1253.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655 [2024-06-10 21:06:11,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.81 | bwd_microstep: 1520.48 | bwd_inner_microstep: 1520.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-10 21:06:12,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.34 | bwd_microstep: 981.70 | bwd_inner_microstep: 981.53 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 21:06:14,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1396.56 | bwd_inner_microstep: 1396.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3447 [2024-06-10 21:06:16,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1192.07 | bwd_inner_microstep: 1192.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 21:06:18,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.63 | bwd_microstep: 1394.86 | bwd_inner_microstep: 1394.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 21:06:19,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.23 | bwd_microstep: 805.39 | bwd_inner_microstep: 805.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 21:06:21,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.78 | bwd_microstep: 1311.43 | bwd_inner_microstep: 1311.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2943 [2024-06-10 21:06:22,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.84 | bwd_microstep: 1162.04 | bwd_inner_microstep: 1162.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-10 21:06:24,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.61 | bwd_microstep: 1497.28 | bwd_inner_microstep: 1497.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3823 [2024-06-10 21:06:27,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.22 | bwd_microstep: 1684.84 | bwd_inner_microstep: 1684.82 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927 [2024-06-10 21:06:28,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.38 | bwd_microstep: 789.86 | bwd_inner_microstep: 789.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2605 [2024-06-10 21:06:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.86 | bwd_microstep: 1026.97 | bwd_inner_microstep: 1026.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 21:06:31,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.51 | bwd_microstep: 1498.57 | bwd_inner_microstep: 1498.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3574 [2024-06-10 21:06:36,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.28 | optimizer_step: 6.60 [2024-06-10 21:06:36,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.21 | bwd_microstep: 4599.53 | bwd_inner_microstep: 1708.81 | bwd_allreduce_microstep: 2890.67 | step_microstep: 39.09 [2024-06-10 21:06:36,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15470.56 | bwd: 44408.68 | bwd_inner: 41516.87 | bwd_allreduce: 2891.02 | step: 40.67 /1726 [20:24:12<9:17:19, 61.13s/it] 68%|██████▊ | 1179/1726 [20:24:12<9:17:19, 61.13s/it] 68%|██████▊ | 1180/1726 [20:25:12<9:12:05, 60.67s/it] 68%|██████▊ | 1180/1726 [20:25:12<9:12:05, 60.67s/it] 68%|██████▊ | 1181/1726 [20:26:13<9:12:02, 60.78s/it] 68%|██████▊ | 1181/1726 [20:26:13<9:12:02, 60.78s/it] 68%|██████▊ | 1182/1726 [20:27:14<9:13:37, 61.06s/it] 68%|██████▊ | 1182/1726 [20:27:14<9:13:37, 61.06s/it] 69%|██████▊ | 1183/1726 
[20:28:13<9:06:01, 60.33s/it] 69%|██████▊ | 1183/1726 [20:28:13<9:06:01, 60.33s/it] 69%|██████▊ | 1184/1726 [20:29:13<9:04:41,{'loss': 1.1969, 'learning_rate': 9.484500127471562e-06, 'epoch': 0.69} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3460 [2024-06-10 21:06:38,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1421.80 | bwd_inner_microstep: 1421.72 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 21:06:40,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.03 | bwd_microstep: 1278.67 | bwd_inner_microstep: 1278.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 21:06:42,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1280.99 | bwd_inner_microstep: 1280.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1344 [2024-06-10 21:06:43,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 197.85 | bwd_microstep: 514.02 | bwd_inner_microstep: 513.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3777 [2024-06-10 21:06:45,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.37 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-10 21:06:46,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.68 | bwd_microstep: 1284.09 | bwd_inner_microstep: 1284.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3480 [2024-06-10 21:06:48,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.57 | bwd_microstep: 1387.55 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 21:06:50,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.85 | bwd_microstep: 1632.76 | bwd_inner_microstep: 1632.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 21:06:52,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.90 | bwd_microstep: 1386.74 | bwd_inner_microstep: 1386.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3696 [2024-06-10 21:06:54,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1327.30 | bwd_inner_microstep: 1327.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2078 [2024-06-10 21:06:55,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.71 | bwd_microstep: 849.61 | bwd_inner_microstep: 849.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 21:06:57,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.30 | bwd_microstep: 1380.98 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665 [2024-06-10 21:07:00,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.19 | bwd_microstep: 1716.74 | bwd_inner_microstep: 1716.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-10 21:07:01,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.75 | bwd_microstep: 1313.46 | bwd_inner_microstep: 1313.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 21:07:03,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.01 | bwd_microstep: 1392.10 | bwd_inner_microstep: 1392.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690 [2024-06-10 21:07:05,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1332.72 | bwd_inner_microstep: 1332.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3637 [2024-06-10 21:07:08,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.85 | bwd_microstep: 1659.42 | bwd_inner_microstep: 1659.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3451 [2024-06-10 21:07:09,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.32 | bwd_microstep: 1317.75 | bwd_inner_microstep: 1317.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876 [2024-06-10 21:07:12,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.50 | bwd_microstep: 1588.62 | bwd_inner_microstep: 1588.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-10 21:07:13,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.41 | bwd_microstep: 696.02 | bwd_inner_microstep: 696.00 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-10 21:07:14,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 807.99 | bwd_inner_microstep: 807.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-10 21:07:16,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1488.43 | bwd_inner_microstep: 1488.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-10 21:07:17,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.73 | bwd_microstep: 1299.34 | bwd_inner_microstep: 1299.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 21:07:19,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1456.98 | bwd_inner_microstep: 1456.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-10 21:07:21,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.10 | bwd_microstep: 1396.46 | bwd_inner_microstep: 1396.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3725 [2024-06-10 21:07:24,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1561.07 | bwd_inner_microstep: 1561.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3819 [2024-06-10 21:07:26,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.36 | bwd_microstep: 1821.93 | 
bwd_inner_microstep: 1821.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3812 [2024-06-10 21:07:28,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.59 | bwd_microstep: 1619.54 | bwd_inner_microstep: 1619.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3699 [2024-06-10 21:07:30,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.36 | bwd_microstep: 1450.61 | bwd_inner_microstep: 1450.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-10 21:07:32,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1339.45 | bwd_inner_microstep: 1339.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470 [2024-06-10 21:07:34,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1315.10 | bwd_inner_microstep: 1315.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 21:07:37,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.23 | optimizer_step: 6.59 [2024-06-10 21:07:37,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.00 | bwd_microstep: 2415.67 | bwd_inner_microstep: 1729.70 | bwd_allreduce_microstep: 685.92 | step_microstep: 39.06 [2024-06-10 21:07:37,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.29 | bwd: 44045.94 | bwd_inner: 43359.04 | bwd_allreduce: 686.19 | step: 40.68 {'loss': 1.1522, 'learning_rate': 9.452591386506999e-06, 'epoch': 0.69} dynamic ViT batch size: 34, images per sample: 8.5, dynamic 
token length: 3666 [2024-06-10 21:07:39,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.85 | bwd_microstep: 1523.61 | bwd_inner_microstep: 1523.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3910 [2024-06-10 21:07:41,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.95 | bwd_microstep: 1692.24 | bwd_inner_microstep: 1692.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3469 [2024-06-10 21:07:43,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.94 | bwd_microstep: 1181.03 | bwd_inner_microstep: 1181.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 21:07:45,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1479.26 | bwd_inner_microstep: 1479.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4318 [2024-06-10 21:07:48,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.08 | bwd_microstep: 1782.64 | bwd_inner_microstep: 1782.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394 [2024-06-10 21:07:49,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.67 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-10 21:07:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1492.10 | bwd_inner_microstep: 1492.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 10, images per sample: 2.5, dynamic token length: 2257 [2024-06-10 21:07:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 779.16 | bwd_inner_microstep: 779.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052 [2024-06-10 21:07:54,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.21 | bwd_microstep: 814.94 | bwd_inner_microstep: 814.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3627 [2024-06-10 21:07:55,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.55 | bwd_microstep: 1250.94 | bwd_inner_microstep: 1250.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436 [2024-06-10 21:07:57,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.77 | bwd_microstep: 1280.55 | bwd_inner_microstep: 1280.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3405 [2024-06-10 21:07:59,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.88 | bwd_microstep: 1207.26 | bwd_inner_microstep: 1207.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 21:08:01,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1479.68 | bwd_inner_microstep: 1479.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 21:08:03,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1515.96 | bwd_inner_microstep: 1515.93 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3520 [2024-06-10 21:08:05,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1446.24 | bwd_inner_microstep: 1446.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3501 [2024-06-10 21:08:07,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.36 | bwd_microstep: 1187.57 | bwd_inner_microstep: 1187.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514 [2024-06-10 21:08:08,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1416.27 | bwd_inner_microstep: 1416.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 919 [2024-06-10 21:08:09,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.98 | bwd_microstep: 375.03 | bwd_inner_microstep: 375.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664 [2024-06-10 21:08:11,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.49 | bwd_microstep: 1420.29 | bwd_inner_microstep: 1420.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-10 21:08:12,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.29 | bwd_microstep: 802.16 | bwd_inner_microstep: 802.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 21:08:14,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1491.89 | bwd_inner_microstep: 1491.86 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3483 [2024-06-10 21:08:16,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.40 | bwd_microstep: 1347.07 | bwd_inner_microstep: 1347.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-10 21:08:18,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1496.69 | bwd_inner_microstep: 1496.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 21:08:20,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1509.23 | bwd_inner_microstep: 1509.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 21:08:21,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.15 | bwd_microstep: 698.38 | bwd_inner_microstep: 698.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 21:08:23,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1350.44 | bwd_inner_microstep: 1350.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-10 21:08:25,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.30 | bwd_microstep: 1621.30 | bwd_inner_microstep: 1621.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624 [2024-06-10 21:08:27,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.75 | bwd_microstep: 
1605.25 | bwd_inner_microstep: 1605.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3803 [2024-06-10 21:08:30,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.76 | bwd_microstep: 1535.69 | bwd_inner_microstep: 1535.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735 [2024-06-10 21:08:32,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.97 | bwd_microstep: 1639.11 | bwd_inner_microstep: 1639.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3772 [2024-06-10 21:08:34,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.30 | bwd_microstep: 1579.07 | bwd_inner_microstep: 1579.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273 [2024-06-10 21:08:36,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 21:08:36,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.58 | bwd_microstep: 2128.11 | bwd_inner_microstep: 993.03 | bwd_allreduce_microstep: 1135.04 | step_microstep: 37.66 [2024-06-10 21:08:36,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15747.60 | bwd: 43374.18 | bwd_inner: 42238.24 | bwd_allreduce: 1135.27 | step: 39.09 {'loss': 1.2125, 'learning_rate': 9.420719793460758e-06, 'epoch': 0.69} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 21:08:38,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1368.46 | bwd_inner_microstep: 1368.33 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3864 [2024-06-10 21:08:40,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.42 | bwd_microstep: 1466.20 | bwd_inner_microstep: 1466.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-10 21:08:43,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.57 | bwd_microstep: 1550.94 | bwd_inner_microstep: 1550.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 21:08:44,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.23 | bwd_microstep: 1341.67 | bwd_inner_microstep: 1341.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3509 [2024-06-10 21:08:46,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.38 | bwd_microstep: 1191.90 | bwd_inner_microstep: 1191.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4020 [2024-06-10 21:08:48,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.54 | bwd_microstep: 1614.06 | bwd_inner_microstep: 1614.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 21:08:50,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1283.19 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3538 [2024-06-10 21:08:52,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1296.84 | bwd_inner_microstep: 1296.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-10 21:08:53,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 801.16 | bwd_inner_microstep: 801.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459 [2024-06-10 21:08:55,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.94 | bwd_microstep: 1310.78 | bwd_inner_microstep: 1310.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3499 [2024-06-10 21:08:57,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.17 | bwd_microstep: 1333.70 | bwd_inner_microstep: 1333.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3595 [2024-06-10 21:08:59,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1371.42 | bwd_inner_microstep: 1371.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 21:09:01,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1514.79 | bwd_inner_microstep: 1514.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 21:09:03,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-10 21:09:05,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1589.88 | bwd_inner_microstep: 1589.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524 [2024-06-10 21:09:07,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.14 | bwd_microstep: 1582.13 | bwd_inner_microstep: 1582.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401 [2024-06-10 21:09:09,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1437.38 | bwd_inner_microstep: 1437.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3518 [2024-06-10 21:09:11,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1253.55 | bwd_inner_microstep: 1253.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-10 21:09:12,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.38 | bwd_microstep: 802.72 | bwd_inner_microstep: 802.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-10 21:09:14,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.59 | bwd_microstep: 1628.59 | bwd_inner_microstep: 1628.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492 [2024-06-10 21:09:16,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.73 | bwd_microstep: 1190.04 | bwd_inner_microstep: 1190.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551 [2024-06-10 21:09:17,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.06 | bwd_microstep: 
1201.22 | bwd_inner_microstep: 1201.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714 [2024-06-10 21:09:20,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1621.08 | bwd_inner_microstep: 1621.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-10 21:09:22,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 2290.28 | bwd_inner_microstep: 2290.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809 [2024-06-10 21:09:24,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.03 | bwd_microstep: 1487.48 | bwd_inner_microstep: 1487.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 21:09:27,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.85 | bwd_microstep: 1657.22 | bwd_inner_microstep: 1657.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 21:09:29,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1500.00 | bwd_inner_microstep: 1499.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 21:09:31,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1398.97 | bwd_inner_microstep: 1398.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-10 21:09:32,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 364.99 | bwd_microstep: 974.80 | bwd_inner_microstep: 974.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584 [2024-06-10 21:09:34,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 1527.69 | bwd_inner_microstep: 1527.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3589 [2024-06-10 21:09:37,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.88 | bwd_microstep: 1699.76 | bwd_inner_microstep: 1699.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461 [2024-06-10 21:09:39,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.11 | optimizer_gradients: 4.10 | optimizer_step: 6.61 [2024-06-10 21:09:39,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.72 | bwd_microstep: 1441.03 | bwd_inner_microstep: 1433.34 | bwd_allreduce_microstep: 7.65 | step_microstep: 39.21 [2024-06-10 21:09:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16587.56 | bwd: 45133.25 | bwd_inner: 45124.61 | bwd_allreduce: 7.92 | step: 40.75 {'loss': 1.1793, 'learning_rate': 9.388885460584392e-06, 'epoch': 0.69} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 21:09:40,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.88 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 21:09:42,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1244.04 | bwd_inner_microstep: 1244.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 2331 [2024-06-10 21:09:43,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.52 | bwd_microstep: 886.66 | bwd_inner_microstep: 886.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 21:09:45,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.33 | bwd_microstep: 1556.18 | bwd_inner_microstep: 1556.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-10 21:09:47,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1245.35 | bwd_inner_microstep: 1245.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-10 21:09:49,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1249.57 | bwd_inner_microstep: 1249.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 21:09:51,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.49 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 21:09:53,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-10 21:09:55,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1421.39 | bwd_inner_microstep: 1421.37 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689 [2024-06-10 21:09:57,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1624.90 | bwd_inner_microstep: 1624.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-10 21:09:59,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1282.10 | bwd_inner_microstep: 1282.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 21:10:01,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 1513.83 | bwd_inner_microstep: 1513.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 21:10:02,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1247.94 | bwd_inner_microstep: 1247.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972 [2024-06-10 21:10:03,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.92 | bwd_microstep: 796.22 | bwd_inner_microstep: 796.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-10 21:10:06,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.17 | bwd_microstep: 1489.48 | bwd_inner_microstep: 1489.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3843 [2024-06-10 21:10:08,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.43 | bwd_microstep: 1792.13 | bwd_inner_microstep: 
1792.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431 [2024-06-10 21:10:10,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.81 | bwd_microstep: 1373.62 | bwd_inner_microstep: 1373.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 21:10:12,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.95 | bwd_microstep: 1551.02 | bwd_inner_microstep: 1550.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802 [2024-06-10 21:10:14,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.05 | bwd_microstep: 1553.93 | bwd_inner_microstep: 1553.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3618 [2024-06-10 21:10:16,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.67 | bwd_microstep: 1342.53 | bwd_inner_microstep: 1342.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 21:10:18,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.58 | bwd_microstep: 1382.17 | bwd_inner_microstep: 1382.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3689 [2024-06-10 21:10:20,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.62 | bwd_microstep: 1591.07 | bwd_inner_microstep: 1591.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3543 [2024-06-10 21:10:22,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.48 | 
bwd_microstep: 1554.79 | bwd_inner_microstep: 1554.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2134 [2024-06-10 21:10:23,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 767.87 | bwd_inner_microstep: 767.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3604 [2024-06-10 21:10:26,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.90 | bwd_microstep: 1648.37 | bwd_inner_microstep: 1648.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 21:10:28,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1398.24 | bwd_inner_microstep: 1398.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-10 21:10:30,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1424.93 | bwd_inner_microstep: 1424.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2018 [2024-06-10 21:10:31,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.85 | bwd_microstep: 902.45 | bwd_inner_microstep: 902.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3723 [2024-06-10 21:10:33,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.78 | bwd_microstep: 1732.43 | bwd_inner_microstep: 1732.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574 [2024-06-10 21:10:35,432] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 496.62 | bwd_microstep: 1302.25 | bwd_inner_microstep: 1302.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580 [2024-06-10 21:10:37,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1300.71 | bwd_inner_microstep: 1300.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2441 [2024-06-10 21:10:40,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.04 | optimizer_step: 6.61 [2024-06-10 21:10:40,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.28 | bwd_microstep: 3072.55 | bwd_inner_microstep: 1189.50 | bwd_allreduce_microstep: 1883.00 | step_microstep: 37.78 [2024-06-10 21:10:40,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16161.63 | bwd: 45201.58 | bwd_inner: 43317.69 | bwd_allreduce: 1883.23 | step: 39.29 {'loss': 1.1949, 'learning_rate': 9.35708849999828e-06, 'epoch': 0.69} dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3403 [2024-06-10 21:10:42,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.32 | bwd_microstep: 1451.94 | bwd_inner_microstep: 1451.85 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 21:10:44,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.24 | bwd_microstep: 1383.15 | bwd_inner_microstep: 1383.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824 [2024-06-10 21:10:46,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.39 | bwd_microstep: 1355.22 | bwd_inner_microstep: 1355.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-10 21:10:48,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.93 | bwd_microstep: 1396.30 | bwd_inner_microstep: 1396.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 21:10:50,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946 [2024-06-10 21:10:51,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.70 | bwd_microstep: 727.36 | bwd_inner_microstep: 727.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 21:10:53,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.27 | bwd_microstep: 1254.23 | bwd_inner_microstep: 1254.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2401 [2024-06-10 21:10:54,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.97 | bwd_microstep: 937.25 | bwd_inner_microstep: 937.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3448 [2024-06-10 21:10:56,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.52 | bwd_microstep: 1184.82 | bwd_inner_microstep: 1184.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 21:10:57,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.76 | bwd_microstep: 799.64 | bwd_inner_microstep: 799.61 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487 [2024-06-10 21:10:59,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.56 | bwd_microstep: 1315.44 | bwd_inner_microstep: 1315.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2132 [2024-06-10 21:11:00,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.43 | bwd_microstep: 888.95 | bwd_inner_microstep: 888.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1972 [2024-06-10 21:11:01,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.49 | bwd_microstep: 703.36 | bwd_inner_microstep: 703.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1846 [2024-06-10 21:11:02,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 258.17 | bwd_microstep: 671.46 | bwd_inner_microstep: 671.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 21:11:03,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1296.01 | bwd_inner_microstep: 1295.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515 [2024-06-10 21:11:05,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.66 | bwd_microstep: 1324.27 | bwd_inner_microstep: 1324.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3433 [2024-06-10 21:11:07,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.76 | bwd_microstep: 1309.09 | 
bwd_inner_microstep: 1309.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3714 [2024-06-10 21:11:09,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1428.46 | bwd_inner_microstep: 1428.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-10 21:11:11,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-10 21:11:13,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.20 | bwd_microstep: 1631.92 | bwd_inner_microstep: 1631.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3681 [2024-06-10 21:11:15,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.93 | bwd_microstep: 1379.04 | bwd_inner_microstep: 1379.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-10 21:11:17,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.71 | bwd_microstep: 1290.77 | bwd_inner_microstep: 1290.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2443 [2024-06-10 21:11:19,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.47 | bwd_microstep: 1046.41 | bwd_inner_microstep: 1046.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-10 21:11:21,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 559.17 | bwd_microstep: 1504.41 | bwd_inner_microstep: 1504.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-10 21:11:23,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.34 | bwd_microstep: 1498.14 | bwd_inner_microstep: 1498.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-10 21:11:24,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.24 | bwd_microstep: 811.27 | bwd_inner_microstep: 811.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3596 [2024-06-10 21:11:26,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.63 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3640 [2024-06-10 21:11:28,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 1531.60 | bwd_inner_microstep: 1531.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044 [2024-06-10 21:11:29,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.98 | bwd_microstep: 810.57 | bwd_inner_microstep: 810.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-10 21:11:31,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.42 | bwd_microstep: 1452.25 | bwd_inner_microstep: 1452.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812 [2024-06-10 21:11:33,571] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.52 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1587.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 [2024-06-10 21:11:42,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.31 | optimizer_step: 6.60 [2024-06-10 21:11:42,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.75 | bwd_microstep: 8564.95 | bwd_inner_microstep: 1935.21 | bwd_allreduce_microstep: 6629.67 | step_microstep: 39.94 [2024-06-10 21:11:42,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14917.83 | bwd: 46750.89 | bwd_inner: 40120.21 | bwd_allreduce: 6629.96 | step: 41.57 69%|██████▊ | 1184/1726 [20:29:13<9:04:41, 60.30s/it] 69%|██████▊ | 1185/1726 [20:30:14<9:04:24, 60.38s/it] 69%|██████▊ | 1186/1726 [20:31:13<9:00:54, 60.10s/it] 69%|██████▉ | 1187/1726 [20:32:15<9:05:12, 60.69s/it] 69%|██████▉ | 1188/1726 [20:33:17<9:06:55, 61.00s/it] 69%|██████▉ | 1189/1726 [20:34:19<9:08:35, 61.30s/it] {'loss': 1.1688, 'learning_rate': 9.325329023691137e-06, 'epoch': 0.69} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 21:11:44,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.63 | bwd_microstep: 1230.40 | bwd_inner_microstep: 1230.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4436 [2024-06-10 21:11:46,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.83 | bwd_microstep: 1816.96 | bwd_inner_microstep: 1816.93 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3857 [2024-06-10 21:11:49,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.33 | bwd_microstep: 1490.18 | bwd_inner_microstep: 1490.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 21:11:50,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1377.12 | bwd_inner_microstep: 1377.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 21:11:52,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1381.48 | bwd_inner_microstep: 1381.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740 [2024-06-10 21:11:54,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.81 | bwd_microstep: 1533.98 | bwd_inner_microstep: 1533.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 21:11:56,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1284.30 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 21:11:58,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.09 | bwd_microstep: 1148.00 | bwd_inner_microstep: 1147.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418 [2024-06-10 21:11:59,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.94 | bwd_microstep: 1151.13 | bwd_inner_microstep: 
1151.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3436 [2024-06-10 21:12:01,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.56 | bwd_microstep: 1185.68 | bwd_inner_microstep: 1185.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2182 [2024-06-10 21:12:02,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.91 | bwd_microstep: 860.12 | bwd_inner_microstep: 860.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 21:12:04,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.22 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-10 21:12:06,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.01 | bwd_microstep: 1531.83 | bwd_inner_microstep: 1531.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-10 21:12:07,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 795.13 | bwd_inner_microstep: 795.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508 [2024-06-10 21:12:09,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.07 | bwd_microstep: 1408.50 | bwd_inner_microstep: 1408.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665 [2024-06-10 21:12:11,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.61 | 
bwd_microstep: 1717.83 | bwd_inner_microstep: 1717.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 21:12:14,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1513.01 | bwd_inner_microstep: 1512.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3623 [2024-06-10 21:12:16,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.14 | bwd_microstep: 1705.04 | bwd_inner_microstep: 1705.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3619 [2024-06-10 21:12:18,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.84 | bwd_microstep: 1707.38 | bwd_inner_microstep: 1707.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821 [2024-06-10 21:12:20,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.75 | bwd_microstep: 1580.92 | bwd_inner_microstep: 1580.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829 [2024-06-10 21:12:23,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.53 | bwd_microstep: 1755.71 | bwd_inner_microstep: 1755.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-10 21:12:25,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.46 | bwd_microstep: 1292.66 | bwd_inner_microstep: 1292.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-10 21:12:27,069] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 525.34 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 21:12:28,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286 [2024-06-10 21:12:30,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.50 | bwd_microstep: 877.57 | bwd_inner_microstep: 877.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3085 [2024-06-10 21:12:32,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.71 | bwd_microstep: 1332.16 | bwd_inner_microstep: 1332.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3604 [2024-06-10 21:12:34,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.99 | bwd_microstep: 1534.93 | bwd_inner_microstep: 1534.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 21:12:36,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1399.97 | bwd_inner_microstep: 1399.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 21:12:37,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1280.52 | bwd_inner_microstep: 1280.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2018 [2024-06-10 
21:12:38,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.12 | bwd_microstep: 714.04 | bwd_inner_microstep: 714.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3729 [2024-06-10 21:12:40,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.68 | bwd_microstep: 1443.64 | bwd_inner_microstep: 1443.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 21:12:45,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.22 | optimizer_step: 6.61 [2024-06-10 21:12:45,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.39 | bwd_microstep: 3715.08 | bwd_inner_microstep: 1664.10 | bwd_allreduce_microstep: 2050.93 | step_microstep: 38.03 [2024-06-10 21:12:45,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16281.61 | bwd: 45796.09 | bwd_inner: 43744.25 | bwd_allreduce: 2051.16 | step: 39.65 {'loss': 1.2349, 'learning_rate': 9.293607143519685e-06, 'epoch': 0.69} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 21:12:46,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.81 | bwd_microstep: 1275.63 | bwd_inner_microstep: 1275.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475 [2024-06-10 21:12:48,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.10 | bwd_microstep: 1344.09 | bwd_inner_microstep: 1344.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3943 [2024-06-10 21:12:51,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.74 | bwd_microstep: 1691.88 | bwd_inner_microstep: 
1691.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-10 21:12:53,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1482.72 | bwd_inner_microstep: 1482.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1942 [2024-06-10 21:12:54,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.58 | bwd_microstep: 726.18 | bwd_inner_microstep: 726.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2271 [2024-06-10 21:12:55,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.13 | bwd_microstep: 934.92 | bwd_inner_microstep: 934.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 21:12:57,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1384.98 | bwd_inner_microstep: 1384.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3612 [2024-06-10 21:12:59,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.27 | bwd_microstep: 1214.29 | bwd_inner_microstep: 1214.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2565 [2024-06-10 21:13:00,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.72 | bwd_microstep: 975.75 | bwd_inner_microstep: 975.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436 [2024-06-10 21:13:02,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 
1310.97 | bwd_inner_microstep: 1310.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2339 [2024-06-10 21:13:03,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.73 | bwd_microstep: 894.74 | bwd_inner_microstep: 894.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 21:13:05,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.10 | bwd_microstep: 1485.33 | bwd_inner_microstep: 1485.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 21:13:07,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.77 | bwd_microstep: 1339.80 | bwd_inner_microstep: 1339.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3388 [2024-06-10 21:13:09,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.31 | bwd_microstep: 1339.10 | bwd_inner_microstep: 1339.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3560 [2024-06-10 21:13:11,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.71 | bwd_microstep: 1340.46 | bwd_inner_microstep: 1340.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3684 [2024-06-10 21:13:13,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.97 | bwd_microstep: 1721.05 | bwd_inner_microstep: 1721.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 21:13:15,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 504.08 | bwd_microstep: 1350.10 | bwd_inner_microstep: 1350.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3696 [2024-06-10 21:13:17,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.21 | bwd_microstep: 1492.83 | bwd_inner_microstep: 1492.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 21:13:19,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745 [2024-06-10 21:13:21,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1541.81 | bwd_inner_microstep: 1541.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1407 [2024-06-10 21:13:22,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 215.15 | bwd_microstep: 560.42 | bwd_inner_microstep: 560.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3425 [2024-06-10 21:13:23,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3688 [2024-06-10 21:13:25,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1329.29 | bwd_inner_microstep: 1329.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719 [2024-06-10 21:13:27,787] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.75 | bwd_microstep: 1440.62 | bwd_inner_microstep: 1440.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-10 21:13:29,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.88 | bwd_microstep: 1301.78 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3491 [2024-06-10 21:13:31,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.84 | bwd_microstep: 1220.39 | bwd_inner_microstep: 1220.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045 [2024-06-10 21:13:32,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.25 | bwd_microstep: 809.68 | bwd_inner_microstep: 809.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-10 21:13:34,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 1509.92 | bwd_inner_microstep: 1509.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3817 [2024-06-10 21:13:36,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.06 | bwd_microstep: 1615.40 | bwd_inner_microstep: 1615.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3435 [2024-06-10 21:13:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.87 | bwd_microstep: 1224.23 | bwd_inner_microstep: 1224.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 
3573 [2024-06-10 21:13:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.57 | bwd_microstep: 1566.31 | bwd_inner_microstep: 1566.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807 [2024-06-10 21:13:44,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.26 | optimizer_step: 6.59 [2024-06-10 21:13:44,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.70 | bwd_microstep: 3502.19 | bwd_inner_microstep: 1676.00 | bwd_allreduce_microstep: 1826.11 | step_microstep: 39.84 [2024-06-10 21:13:44,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15579.79 | bwd: 43585.05 | bwd_inner: 41758.01 | bwd_allreduce: 1826.35 | step: 41.45 {'loss': 1.1595, 'learning_rate': 9.261922971208217e-06, 'epoch': 0.69} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 21:13:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.32 | bwd_microstep: 773.88 | bwd_inner_microstep: 773.74 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 21:13:46,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.75 | bwd_microstep: 791.32 | bwd_inner_microstep: 791.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 21:13:48,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.42 | bwd_microstep: 1341.85 | bwd_inner_microstep: 1341.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 21:13:50,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1246.26 | 
bwd_inner_microstep: 1246.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-10 21:13:52,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.21 | bwd_microstep: 1653.20 | bwd_inner_microstep: 1653.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844 [2024-06-10 21:13:54,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.90 | bwd_microstep: 1560.85 | bwd_inner_microstep: 1560.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695 [2024-06-10 21:13:57,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.62 | bwd_microstep: 1625.99 | bwd_inner_microstep: 1625.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 21:13:58,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1254.76 | bwd_inner_microstep: 1254.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-10 21:14:00,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.74 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495 [2024-06-10 21:14:02,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1443.98 | bwd_inner_microstep: 1443.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590 [2024-06-10 21:14:04,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 589.04 | bwd_microstep: 1606.14 | bwd_inner_microstep: 1606.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954 [2024-06-10 21:14:05,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.07 | bwd_microstep: 889.55 | bwd_inner_microstep: 889.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-10 21:14:08,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.90 | bwd_microstep: 1586.89 | bwd_inner_microstep: 1586.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3431 [2024-06-10 21:14:09,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.60 | bwd_microstep: 1189.78 | bwd_inner_microstep: 1189.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 21:14:12,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.64 | bwd_microstep: 1658.57 | bwd_inner_microstep: 1658.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558 [2024-06-10 21:14:13,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.17 | bwd_microstep: 1300.88 | bwd_inner_microstep: 1300.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3850 [2024-06-10 21:14:16,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.20 | bwd_microstep: 1564.05 | bwd_inner_microstep: 1564.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-10 21:14:18,145] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.71 | bwd_microstep: 1516.09 | bwd_inner_microstep: 1516.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 21:14:20,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1396.97 | bwd_inner_microstep: 1396.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636 [2024-06-10 21:14:22,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.51 | bwd_microstep: 1615.25 | bwd_inner_microstep: 1615.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-10 21:14:24,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.85 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 21:14:26,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1378.89 | bwd_inner_microstep: 1378.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859 [2024-06-10 21:14:28,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.10 | bwd_microstep: 1568.52 | bwd_inner_microstep: 1568.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 21:14:30,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.69 | bwd_microstep: 1281.55 | bwd_inner_microstep: 1281.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3809 [2024-06-10 21:14:32,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.23 | bwd_microstep: 1477.05 | bwd_inner_microstep: 1477.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440 [2024-06-10 21:14:34,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1408.10 | bwd_inner_microstep: 1408.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3602 [2024-06-10 21:14:35,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1305.59 | bwd_inner_microstep: 1305.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3455 [2024-06-10 21:14:37,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.48 | bwd_microstep: 1415.46 | bwd_inner_microstep: 1415.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3778 [2024-06-10 21:14:40,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.24 | bwd_microstep: 1748.43 | bwd_inner_microstep: 1748.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-10 21:14:42,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.58 | bwd_microstep: 1636.68 | bwd_inner_microstep: 1636.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 21:14:44,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1352.15 | bwd_inner_microstep: 1352.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3736 [2024-06-10 21:14:47,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-10 21:14:47,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.75 | bwd_microstep: 2422.84 | bwd_inner_microstep: 1834.97 | bwd_allreduce_microstep: 587.82 | step_microstep: 37.88 [2024-06-10 21:14:47,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16735.46 | bwd: 45691.87 | bwd_inner: 45103.05 | bwd_allreduce: 588.10 | step: 39.42 {'loss': 1.2084, 'learning_rate': 9.230276618348224e-06, 'epoch': 0.69} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3440 [2024-06-10 21:14:49,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.41 | bwd_microstep: 1378.76 | bwd_inner_microstep: 1378.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1904 [2024-06-10 21:14:50,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 810.76 | bwd_inner_microstep: 810.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3485 [2024-06-10 21:14:52,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1234.30 | bwd_inner_microstep: 1234.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904 [2024-06-10 21:14:53,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.01 | bwd_microstep: 684.09 | bwd_inner_microstep: 684.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799 [2024-06-10 21:14:55,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.22 
| bwd_microstep: 1549.24 | bwd_inner_microstep: 1549.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 21:14:57,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1490.94 | bwd_inner_microstep: 1490.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-10 21:14:59,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.31 | bwd_microstep: 1251.04 | bwd_inner_microstep: 1251.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1979 [2024-06-10 21:15:00,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.18 | bwd_microstep: 705.54 | bwd_inner_microstep: 705.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-10 21:15:02,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1413.23 | bwd_inner_microstep: 1413.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437 [2024-06-10 21:15:03,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.21 | bwd_microstep: 1159.34 | bwd_inner_microstep: 1159.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-10 21:15:05,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.34 | bwd_microstep: 1533.70 | bwd_inner_microstep: 1533.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 21:15:07,634] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1385.73 | bwd_inner_microstep: 1385.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 21:15:09,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1256.75 | bwd_inner_microstep: 1256.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-10 21:15:11,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3380 [2024-06-10 21:15:13,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1337.62 | bwd_inner_microstep: 1337.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385 [2024-06-10 21:15:14,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.15 | bwd_microstep: 1242.52 | bwd_inner_microstep: 1242.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-10 21:15:16,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2625 [2024-06-10 21:15:18,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.34 | bwd_microstep: 984.99 | bwd_inner_microstep: 984.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 21:15:20,335] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.21 | bwd_microstep: 1491.67 | bwd_inner_microstep: 1491.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-10 21:15:22,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1394.70 | bwd_inner_microstep: 1394.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1936 [2024-06-10 21:15:23,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.66 | bwd_microstep: 758.50 | bwd_inner_microstep: 758.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397 [2024-06-10 21:15:25,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1439.96 | bwd_inner_microstep: 1439.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446 [2024-06-10 21:15:26,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.39 | bwd_microstep: 1216.98 | bwd_inner_microstep: 1216.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3572 [2024-06-10 21:15:28,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.11 | bwd_microstep: 1425.71 | bwd_inner_microstep: 1425.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452 [2024-06-10 21:15:30,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.25 | bwd_microstep: 1191.04 | bwd_inner_microstep: 1191.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 2108 [2024-06-10 21:15:31,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.65 | bwd_microstep: 822.98 | bwd_inner_microstep: 822.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3439 [2024-06-10 21:15:33,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.78 | bwd_microstep: 1212.04 | bwd_inner_microstep: 1212.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 21:15:35,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1404.38 | bwd_inner_microstep: 1404.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 21:15:37,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1458.77 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594 [2024-06-10 21:15:39,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1507.11 | bwd_inner_microstep: 1507.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-10 21:15:40,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.49 | bwd_microstep: 975.89 | bwd_inner_microstep: 975.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 21:15:51,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.36 | optimizer_step: 6.59 [2024-06-10 21:15:51,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 521.26 | bwd_microstep: 9910.92 | bwd_inner_microstep: 1547.59 | bwd_allreduce_microstep: 8363.25 | step_microstep: 39.44 [2024-06-10 21:15:51,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15023.82 | bwd: 48501.30 | bwd_inner: 40137.12 | bwd_allreduce: 8363.49 | step: 40.95 {'loss': 1.2062, 'learning_rate': 9.198668196397995e-06, 'epoch': 0.69} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1876 [2024-06-10 21:15:52,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.54 | bwd_microstep: 763.72 | bwd_inner_microstep: 763.55 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 21:15:54,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1367.32 | bwd_inner_microstep: 1367.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-10 21:15:56,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 1547.17 | bwd_inner_microstep: 1547.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797 [2024-06-10 21:15:58,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.13 | bwd_microstep: 1442.05 | bwd_inner_microstep: 1442.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 21:16:00,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.78 | bwd_microstep: 1387.32 | bwd_inner_microstep: 1387.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1923 [2024-06-10 21:16:01,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 278.20 | bwd_microstep: 725.82 | bwd_inner_microstep: 725.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 21:16:03,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1341.16 | bwd_inner_microstep: 1341.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 21:16:04,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.37 | bwd_microstep: 1284.16 | bwd_inner_microstep: 1284.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421 [2024-06-10 21:16:06,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.65 | bwd_microstep: 1150.88 | bwd_inner_microstep: 1150.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 21:16:08,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.92 | bwd_microstep: 1504.65 | bwd_inner_microstep: 1504.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3496 [2024-06-10 21:16:10,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.82 | bwd_microstep: 1349.87 | bwd_inner_microstep: 1349.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 21:16:12,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 1338.15 | bwd_inner_microstep: 1338.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3387 [2024-06-10 21:16:14,300] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1430.25 | bwd_inner_microstep: 1430.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3789 [2024-06-10 21:16:16,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.90 | bwd_microstep: 1577.31 | bwd_inner_microstep: 1577.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2072 [2024-06-10 21:16:17,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.81 | bwd_microstep: 817.08 | bwd_inner_microstep: 817.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654 [2024-06-10 21:16:19,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.47 | bwd_microstep: 1618.95 | bwd_inner_microstep: 1618.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 21:16:22,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.58 | bwd_microstep: 1653.88 | bwd_inner_microstep: 1653.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3522 [2024-06-10 21:16:23,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1355.30 | bwd_inner_microstep: 1355.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 21:16:25,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1280.10 | bwd_inner_microstep: 1280.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2050 [2024-06-10 21:16:27,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.79 | bwd_microstep: 910.38 | bwd_inner_microstep: 910.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2278 [2024-06-10 21:16:28,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.10 | bwd_microstep: 811.92 | bwd_inner_microstep: 811.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564 [2024-06-10 21:16:30,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1499.38 | bwd_inner_microstep: 1499.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2638 [2024-06-10 21:16:31,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 426.09 | bwd_microstep: 1148.77 | bwd_inner_microstep: 1148.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718 [2024-06-10 21:16:33,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.53 | bwd_microstep: 1434.92 | bwd_inner_microstep: 1434.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3462 [2024-06-10 21:16:35,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1326.46 | bwd_inner_microstep: 1326.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2278 [2024-06-10 21:16:37,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.18 | bwd_microstep: 1070.80 | bwd_inner_microstep: 1070.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3616 [2024-06-10 21:16:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1410.58 | bwd_inner_microstep: 1410.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-10 21:16:41,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.23 | bwd_microstep: 1479.56 | bwd_inner_microstep: 1479.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3550 [2024-06-10 21:16:42,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1360.51 | bwd_inner_microstep: 1360.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2958 [2024-06-10 21:16:44,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.31 | bwd_microstep: 1016.74 | bwd_inner_microstep: 1016.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1906 [2024-06-10 21:16:45,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.11 | bwd_microstep: 778.20 | bwd_inner_microstep: 778.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3842 [2024-06-10 21:16:52,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.11 | optimizer_step: 6.58 [2024-06-10 21:16:52,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.39 | bwd_microstep: 5918.62 | bwd_inner_microstep: 2329.57 | bwd_allreduce_microstep: 3589.01 | step_microstep: 37.79 [2024-06-10 21:16:52,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15341.30 | bwd: 45102.02 | bwd_inner: 41511.99 | 
bwd_allreduce: 3589.31 | step: 39.34 69%|██████▉ | 1189/1726 [20:34:19<9:08:35, 61.30s/it] 69%|██████▉ | 1190/1726 [20:35:21<9:10:35, 61.63s/it] 69%|██████▉ | 1191/1726 [20:36:21<9:03:51, 60.99s/it] 69%|██████▉ | 1192/1726 [20:37:24<9:07:35, 61.53s/it] 69%|██████▉ | 1193/1726 [20:38:28<9:12:46, 62.23s/it] 69%|██████▉ | 1194/1726 [20:39:28<9:07:54, 61.79s/it] {'loss': 1.1968, 'learning_rate': 9.167097816682218e-06, 'epoch': 0.69} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 21:16:53,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.84 | bwd_microstep: 1331.64 | bwd_inner_microstep: 1331.45 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2452 [2024-06-10 21:16:55,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.80 | bwd_microstep: 948.03 | bwd_inner_microstep: 948.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 21:16:57,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.43 | bwd_microstep: 1376.34 | bwd_inner_microstep: 1376.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 21:16:59,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.01 | bwd_microstep: 1375.38 | bwd_inner_microstep: 1375.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3498 [2024-06-10 21:17:00,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.35 | 
bwd_microstep: 1218.29 | bwd_inner_microstep: 1218.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2259 [2024-06-10 21:17:02,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.38 | bwd_microstep: 929.94 | bwd_inner_microstep: 929.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 21:17:03,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-10 21:17:05,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.65 | bwd_microstep: 1349.32 | bwd_inner_microstep: 1349.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740 [2024-06-10 21:17:07,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.01 | bwd_microstep: 1533.33 | bwd_inner_microstep: 1533.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 21:17:09,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1393.21 | bwd_inner_microstep: 1393.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2197 [2024-06-10 21:17:10,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.69 | bwd_microstep: 860.97 | bwd_inner_microstep: 860.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3383 [2024-06-10 21:17:12,587] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 468.83 | bwd_microstep: 1241.19 | bwd_inner_microstep: 1241.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3523 [2024-06-10 21:17:14,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.26 | bwd_microstep: 1414.66 | bwd_inner_microstep: 1414.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2886 [2024-06-10 21:17:16,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.13 | bwd_microstep: 1087.21 | bwd_inner_microstep: 1087.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644 [2024-06-10 21:17:18,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.71 | bwd_microstep: 1605.09 | bwd_inner_microstep: 1605.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2381 [2024-06-10 21:17:19,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.31 | bwd_microstep: 1030.96 | bwd_inner_microstep: 1030.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-10 21:17:21,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.35 | bwd_microstep: 1407.06 | bwd_inner_microstep: 1407.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477 [2024-06-10 21:17:23,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.15 | bwd_microstep: 1186.10 | bwd_inner_microstep: 1186.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 21:17:25,078] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1282.66 | bwd_inner_microstep: 1282.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 21:17:27,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.28 | bwd_microstep: 1389.96 | bwd_inner_microstep: 1389.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 21:17:29,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1555.93 | bwd_inner_microstep: 1555.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 21:17:30,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.51 | bwd_microstep: 1288.75 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 21:17:33,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.67 | bwd_microstep: 1557.08 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3739 [2024-06-10 21:17:34,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.03 | bwd_microstep: 1339.29 | bwd_inner_microstep: 1339.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-10 21:17:36,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1298.63 | bwd_inner_microstep: 1298.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3448 [2024-06-10 21:17:38,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.10 | bwd_microstep: 1159.25 | bwd_inner_microstep: 1159.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3728 [2024-06-10 21:17:40,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.98 | bwd_microstep: 1336.70 | bwd_inner_microstep: 1336.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542 [2024-06-10 21:17:42,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1327.80 | bwd_inner_microstep: 1327.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-10 21:17:43,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.45 | bwd_microstep: 789.99 | bwd_inner_microstep: 789.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-10 21:17:44,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.35 | bwd_microstep: 791.01 | bwd_inner_microstep: 790.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928 [2024-06-10 21:17:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.47 | bwd_microstep: 730.90 | bwd_inner_microstep: 730.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 21:17:51,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-10 21:17:51,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 504.08 | bwd_microstep: 5279.71 | bwd_inner_microstep: 1525.33 | bwd_allreduce_microstep: 3754.33 | step_microstep: 38.20 [2024-06-10 21:17:51,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14986.39 | bwd: 43665.79 | bwd_inner: 39910.43 | bwd_allreduce: 3754.63 | step: 39.80 {'loss': 1.1994, 'learning_rate': 9.135565590391633e-06, 'epoch': 0.69} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 21:17:53,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.84 | bwd_microstep: 1472.57 | bwd_inner_microstep: 1472.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880 [2024-06-10 21:17:55,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.22 | bwd_microstep: 1678.23 | bwd_inner_microstep: 1678.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 21:17:57,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1372.25 | bwd_inner_microstep: 1372.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775 [2024-06-10 21:17:59,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1438.19 | bwd_inner_microstep: 1438.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2636 [2024-06-10 21:18:00,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.01 | bwd_microstep: 1016.66 | bwd_inner_microstep: 1016.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 21:18:02,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 521.24 | bwd_microstep: 1383.41 | bwd_inner_microstep: 1383.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602 [2024-06-10 21:18:04,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1411.50 | bwd_inner_microstep: 1411.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3537 [2024-06-10 21:18:06,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1257.75 | bwd_inner_microstep: 1257.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967 [2024-06-10 21:18:07,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 796.55 | bwd_inner_microstep: 796.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 21:18:09,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3662 [2024-06-10 21:18:11,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.98 | bwd_microstep: 1514.91 | bwd_inner_microstep: 1514.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 21:18:13,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3707 [2024-06-10 21:18:15,382] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.96 | bwd_microstep: 1489.62 | bwd_inner_microstep: 1489.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3654 [2024-06-10 21:18:17,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1547.78 | bwd_inner_microstep: 1547.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 21:18:19,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1347.94 | bwd_inner_microstep: 1347.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501 [2024-06-10 21:18:21,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.91 | bwd_microstep: 1222.76 | bwd_inner_microstep: 1222.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-10 21:18:22,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.57 | bwd_microstep: 1295.25 | bwd_inner_microstep: 1295.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518 [2024-06-10 21:18:24,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.40 | bwd_microstep: 1224.79 | bwd_inner_microstep: 1224.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-10 21:18:25,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.28 | bwd_microstep: 697.10 | bwd_inner_microstep: 697.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3819 [2024-06-10 21:18:27,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.07 | bwd_microstep: 1451.19 | bwd_inner_microstep: 1451.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 21:18:29,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1554.79 | bwd_inner_microstep: 1554.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 21:18:31,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.59 | bwd_microstep: 1289.28 | bwd_inner_microstep: 1289.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 21:18:33,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.60 | bwd_microstep: 1459.84 | bwd_inner_microstep: 1459.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 5558 [2024-06-10 21:18:36,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 742.22 | bwd_microstep: 1971.62 | bwd_inner_microstep: 1971.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3728 [2024-06-10 21:18:38,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.59 | bwd_microstep: 1563.63 | bwd_inner_microstep: 1563.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 21:18:40,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.58 | bwd_microstep: 1550.68 | bwd_inner_microstep: 1550.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3815 [2024-06-10 21:18:42,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.07 | bwd_microstep: 1545.21 | bwd_inner_microstep: 1545.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3562 [2024-06-10 21:18:44,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 1545.92 | bwd_inner_microstep: 1545.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3415 [2024-06-10 21:18:46,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1409.44 | bwd_inner_microstep: 1409.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3589 [2024-06-10 21:18:48,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1534.55 | bwd_inner_microstep: 1534.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 21:18:51,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.23 | bwd_microstep: 1644.80 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2533 [2024-06-10 21:18:52,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.04 | optimizer_step: 6.61 [2024-06-10 21:18:52,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.33 | bwd_microstep: 1227.63 | bwd_inner_microstep: 1219.95 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.40 [2024-06-10 21:18:52,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16695.25 | bwd: 44681.51 | bwd_inner: 
44672.99 | bwd_allreduce: 7.86 | step: 38.98 {'loss': 1.1384, 'learning_rate': 9.104071628582542e-06, 'epoch': 0.69} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931 [2024-06-10 21:18:54,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.61 | bwd_microstep: 880.29 | bwd_inner_microstep: 880.23 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 21:18:55,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1279.77 | bwd_inner_microstep: 1279.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819 [2024-06-10 21:18:57,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1386.83 | bwd_inner_microstep: 1386.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2257 [2024-06-10 21:18:58,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.44 | bwd_microstep: 902.69 | bwd_inner_microstep: 902.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 21:19:00,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1247.84 | bwd_inner_microstep: 1247.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766 [2024-06-10 21:19:02,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.19 | bwd_microstep: 1542.44 | bwd_inner_microstep: 1542.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4131 [2024-06-10 21:19:05,065] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 611.10 | bwd_microstep: 1637.95 | bwd_inner_microstep: 1637.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-10 21:19:07,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1409.07 | bwd_inner_microstep: 1409.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4029 [2024-06-10 21:19:09,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.64 | bwd_microstep: 1615.59 | bwd_inner_microstep: 1615.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 21:19:11,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.83 | bwd_microstep: 1432.95 | bwd_inner_microstep: 1432.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654 [2024-06-10 21:19:13,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.23 | bwd_microstep: 1525.60 | bwd_inner_microstep: 1525.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436 [2024-06-10 21:19:15,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.03 | bwd_microstep: 1313.11 | bwd_inner_microstep: 1313.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976 [2024-06-10 21:19:16,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.72 | bwd_microstep: 803.22 | bwd_inner_microstep: 803.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3489 [2024-06-10 
21:19:18,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.75 | bwd_microstep: 1446.56 | bwd_inner_microstep: 1446.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2095 [2024-06-10 21:19:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.51 | bwd_microstep: 1015.26 | bwd_inner_microstep: 1015.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-10 21:19:21,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1508.67 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 21:19:23,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1389.39 | bwd_inner_microstep: 1389.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3435 [2024-06-10 21:19:25,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.14 | bwd_microstep: 1188.02 | bwd_inner_microstep: 1187.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 21:19:27,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.86 | bwd_microstep: 1559.28 | bwd_inner_microstep: 1559.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2919 [2024-06-10 21:19:28,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.88 | bwd_microstep: 1096.05 | bwd_inner_microstep: 1096.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, 
dynamic token length: 2284 [2024-06-10 21:19:30,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.24 | bwd_microstep: 1005.26 | bwd_inner_microstep: 1005.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3834 [2024-06-10 21:19:32,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1269.63 | bwd_inner_microstep: 1269.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822 [2024-06-10 21:19:34,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.74 | bwd_microstep: 1519.65 | bwd_inner_microstep: 1519.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 21:19:35,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1280.24 | bwd_inner_microstep: 1280.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2280 [2024-06-10 21:19:37,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.74 | bwd_microstep: 1072.57 | bwd_inner_microstep: 1072.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 21:19:39,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.67 | bwd_microstep: 1452.81 | bwd_inner_microstep: 1452.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776 [2024-06-10 21:19:41,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.08 | bwd_microstep: 1449.57 | bwd_inner_microstep: 1449.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-10 21:19:43,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.17 | bwd_microstep: 1303.69 | bwd_inner_microstep: 1303.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778 [2024-06-10 21:19:45,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.77 | bwd_microstep: 1448.57 | bwd_inner_microstep: 1448.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559 [2024-06-10 21:19:47,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.31 | bwd_microstep: 1332.52 | bwd_inner_microstep: 1332.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 21:19:49,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.92 | bwd_microstep: 1452.81 | bwd_inner_microstep: 1452.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045 [2024-06-10 21:19:53,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 21:19:53,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 3557.24 | bwd_inner_microstep: 1039.27 | bwd_allreduce_microstep: 2517.93 | step_microstep: 38.09 [2024-06-10 21:19:53,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15635.66 | bwd: 44325.18 | bwd_inner: 41806.31 | bwd_allreduce: 2518.18 | step: 39.58 {'loss': 1.1997, 'learning_rate': 9.072616042176543e-06, 'epoch': 0.69} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441 [2024-06-10 21:19:55,059] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 534.22 | bwd_microstep: 1433.74 | bwd_inner_microstep: 1433.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 21:19:56,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.15 | bwd_microstep: 1377.60 | bwd_inner_microstep: 1377.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-10 21:19:58,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 789.31 | bwd_inner_microstep: 789.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802 [2024-06-10 21:20:00,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.20 | bwd_microstep: 1446.61 | bwd_inner_microstep: 1446.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-10 21:20:02,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1550.60 | bwd_inner_microstep: 1550.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-10 21:20:04,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.61 | bwd_microstep: 1410.22 | bwd_inner_microstep: 1410.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416 [2024-06-10 21:20:05,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.53 | bwd_microstep: 1151.89 | bwd_inner_microstep: 1151.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 21:20:07,617] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.36 | bwd_microstep: 1358.76 | bwd_inner_microstep: 1358.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 21:20:09,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1280.22 | bwd_inner_microstep: 1280.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 21:20:11,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1293.64 | bwd_inner_microstep: 1293.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 21:20:13,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1348.70 | bwd_inner_microstep: 1348.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-10 21:20:14,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.31 | bwd_microstep: 699.17 | bwd_inner_microstep: 699.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1962 [2024-06-10 21:20:15,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.59 | bwd_microstep: 733.88 | bwd_inner_microstep: 733.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 21:20:16,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.58 | bwd_microstep: 1286.11 | bwd_inner_microstep: 1286.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3525 [2024-06-10 21:20:19,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.29 | bwd_microstep: 1590.34 | bwd_inner_microstep: 1590.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3654 [2024-06-10 21:20:21,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.95 | bwd_microstep: 1445.60 | bwd_inner_microstep: 1445.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 21:20:23,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1478.49 | bwd_inner_microstep: 1478.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 21:20:24,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.22 | bwd_microstep: 1392.78 | bwd_inner_microstep: 1392.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3632 [2024-06-10 21:20:27,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.24 | bwd_microstep: 1657.46 | bwd_inner_microstep: 1657.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-10 21:20:29,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1495.36 | bwd_inner_microstep: 1495.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681 [2024-06-10 21:20:31,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.08 | bwd_microstep: 1425.18 | bwd_inner_microstep: 1425.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images 
per sample: 6.5, dynamic token length: 2969 [2024-06-10 21:20:32,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.62 | bwd_microstep: 1204.53 | bwd_inner_microstep: 1204.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 21:20:35,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1494.27 | bwd_inner_microstep: 1494.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3555 [2024-06-10 21:20:36,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.65 | bwd_microstep: 1262.56 | bwd_inner_microstep: 1262.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 21:20:38,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.24 | bwd_microstep: 1502.41 | bwd_inner_microstep: 1502.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 21:20:40,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.35 | bwd_microstep: 1393.20 | bwd_inner_microstep: 1393.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 21:20:42,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1281.31 | bwd_inner_microstep: 1281.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3555 [2024-06-10 21:20:44,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.79 | bwd_microstep: 1298.48 | bwd_inner_microstep: 1298.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3572 [2024-06-10 21:20:46,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.35 | bwd_microstep: 1600.99 | bwd_inner_microstep: 1600.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2261 [2024-06-10 21:20:47,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.02 | bwd_microstep: 972.57 | bwd_inner_microstep: 972.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 21:20:49,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1377.62 | bwd_inner_microstep: 1377.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 21:20:53,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-10 21:20:53,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 2945.06 | bwd_inner_microstep: 1761.59 | bwd_allreduce_microstep: 1183.42 | step_microstep: 37.84 [2024-06-10 21:20:53,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16003.83 | bwd: 43978.68 | bwd_inner: 42794.35 | bwd_allreduce: 1183.65 | step: 39.51 {'loss': 1.1777, 'learning_rate': 9.04119894196003e-06, 'epoch': 0.69} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 21:20:55,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.18 | bwd_microstep: 1474.54 | bwd_inner_microstep: 1474.45 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4010 [2024-06-10 21:20:57,787] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 630.18 | bwd_microstep: 1705.84 | bwd_inner_microstep: 1705.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851 [2024-06-10 21:20:59,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.78 | bwd_microstep: 1558.23 | bwd_inner_microstep: 1558.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 21:21:02,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.55 | bwd_microstep: 1653.49 | bwd_inner_microstep: 1653.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3774 [2024-06-10 21:21:03,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.22 | bwd_microstep: 1244.95 | bwd_inner_microstep: 1244.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 21:21:05,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1384.97 | bwd_inner_microstep: 1384.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 21:21:07,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1248.56 | bwd_inner_microstep: 1248.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-10 21:21:09,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.18 | bwd_microstep: 1350.35 | bwd_inner_microstep: 1350.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1968 [2024-06-10 
21:21:10,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.64 | bwd_microstep: 732.46 | bwd_inner_microstep: 732.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-10 21:21:12,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1528.41 | bwd_inner_microstep: 1528.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3500 [2024-06-10 21:21:14,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.59 | bwd_microstep: 1492.08 | bwd_inner_microstep: 1492.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3704 [2024-06-10 21:21:16,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.33 | bwd_microstep: 1677.78 | bwd_inner_microstep: 1677.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492 [2024-06-10 21:21:19,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.16 | bwd_microstep: 1582.46 | bwd_inner_microstep: 1582.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666 [2024-06-10 21:21:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.04 | bwd_microstep: 1718.77 | bwd_inner_microstep: 1718.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 21:21:23,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1397.78 | bwd_inner_microstep: 1397.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3624 [2024-06-10 21:21:25,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1508.81 | bwd_inner_microstep: 1508.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1984 [2024-06-10 21:21:26,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.31 | bwd_microstep: 705.35 | bwd_inner_microstep: 705.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480 [2024-06-10 21:21:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.25 | bwd_microstep: 1189.42 | bwd_inner_microstep: 1189.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 21:21:29,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.30 | bwd_microstep: 1284.68 | bwd_inner_microstep: 1284.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3666 [2024-06-10 21:21:31,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.18 | bwd_microstep: 1455.60 | bwd_inner_microstep: 1455.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3627 [2024-06-10 21:21:34,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.83 | bwd_microstep: 1540.84 | bwd_inner_microstep: 1540.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489 [2024-06-10 21:21:35,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1316.31 | bwd_inner_microstep: 1316.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3578 [2024-06-10 21:21:38,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.50 | bwd_microstep: 1591.73 | bwd_inner_microstep: 1591.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615 [2024-06-10 21:21:39,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1403.80 | bwd_inner_microstep: 1403.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 21:21:42,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1506.73 | bwd_inner_microstep: 1506.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3688 [2024-06-10 21:21:44,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.39 | bwd_microstep: 1430.75 | bwd_inner_microstep: 1430.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-10 21:21:45,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.96 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3469 [2024-06-10 21:21:47,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.24 | bwd_microstep: 1435.51 | bwd_inner_microstep: 1435.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 21:21:49,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.91 | bwd_microstep: 1349.24 | bwd_inner_microstep: 1349.22 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432 [2024-06-10 21:21:51,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.16 | bwd_microstep: 1442.95 | bwd_inner_microstep: 1442.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3600 [2024-06-10 21:21:54,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.27 | bwd_microstep: 1704.22 | bwd_inner_microstep: 1704.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3564 [2024-06-10 21:21:56,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-10 21:21:56,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.41 | bwd_microstep: 2069.60 | bwd_inner_microstep: 1908.65 | bwd_allreduce_microstep: 160.90 | step_microstep: 37.80 [2024-06-10 21:21:56,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17073.27 | bwd: 46087.94 | bwd_inner: 45926.07 | bwd_allreduce: 161.17 | step: 39.36 {'loss': 1.1104, 'learning_rate': 9.009820438583881e-06, 'epoch': 0.69} 69%|██████▉ | 1194/1726 [20:39:28<9:07:54, 61.79s/it] 69%|██████▉ | 1195/1726 [20:40:27<8:59:25, 60.95s/it] 69%|██████▉ | 1195/1726 [20:40:27<8:59:25, 60.95s/it] 69%|██████▉ | 1196/1726 [20:41:29<9:00:24, 61.18s/it] 69%|██████▉ | 1196/1726 [20:41:29<9:00:24, 61.18s/it] 69%|██████▉ | 1197/1726 [20:42:29<8:57:02, 60.91s/it] 69%|██████▉ | 1197/1726 [20:42:29<8:57:02, 60.91s/it] 69%|██████▉ | 1198/1726 [20:43:30<8:54:28, 60.74s/it] 69%|██████▉ | 1198/1726 [20:43:30<8:54:28, 60.74s/it] 69%|██████▉ | 1199/1726 [20:44:33<9:00:47, 61.57s/it] 6dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925 [2024-06-10 21:21:57,924] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 276.72 | bwd_microstep: 721.70 | bwd_inner_microstep: 721.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 21:21:59,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1244.98 | bwd_inner_microstep: 1244.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 21:22:01,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1298.88 | bwd_inner_microstep: 1298.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-10 21:22:03,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.29 | bwd_microstep: 1548.91 | bwd_inner_microstep: 1548.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 21:22:05,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.18 | bwd_microstep: 1377.83 | bwd_inner_microstep: 1377.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759 [2024-06-10 21:22:07,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 1436.33 | bwd_inner_microstep: 1436.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4114 [2024-06-10 21:22:09,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.93 | bwd_microstep: 1638.24 | bwd_inner_microstep: 1638.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2012 [2024-06-10 21:22:10,737] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.92 | bwd_microstep: 710.47 | bwd_inner_microstep: 710.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 21:22:12,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.99 | bwd_microstep: 1376.28 | bwd_inner_microstep: 1376.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 21:22:14,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.61 | bwd_microstep: 1410.79 | bwd_inner_microstep: 1410.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 21:22:16,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3445 [2024-06-10 21:22:18,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 1302.62 | bwd_inner_microstep: 1302.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-10 21:22:19,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.64 | bwd_microstep: 1340.43 | bwd_inner_microstep: 1340.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1925 [2024-06-10 21:22:21,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.45 | bwd_microstep: 759.17 | bwd_inner_microstep: 759.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 
3503 [2024-06-10 21:22:23,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.57 | bwd_microstep: 1445.35 | bwd_inner_microstep: 1445.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657 [2024-06-10 21:22:25,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.00 | bwd_microstep: 1718.21 | bwd_inner_microstep: 1718.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 21:22:27,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1379.60 | bwd_inner_microstep: 1379.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818 [2024-06-10 21:22:29,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3657 [2024-06-10 21:22:31,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.18 | bwd_microstep: 1523.53 | bwd_inner_microstep: 1523.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3552 [2024-06-10 21:22:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1427.79 | bwd_inner_microstep: 1427.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 21:22:35,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.85 | bwd_microstep: 1358.52 | bwd_inner_microstep: 1358.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3756 [2024-06-10 21:22:37,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.12 | bwd_microstep: 1541.58 | bwd_inner_microstep: 1541.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055 [2024-06-10 21:22:38,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.49 | bwd_microstep: 914.87 | bwd_inner_microstep: 914.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574 [2024-06-10 21:22:40,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.45 | bwd_microstep: 1301.47 | bwd_inner_microstep: 1301.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3673 [2024-06-10 21:22:42,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.10 | bwd_microstep: 1276.77 | bwd_inner_microstep: 1276.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 21:22:43,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1377.29 | bwd_inner_microstep: 1377.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 21:22:45,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1411.00 | bwd_inner_microstep: 1410.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559 [2024-06-10 21:22:47,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1330.24 | bwd_inner_microstep: 1330.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821 [2024-06-10 21:22:49,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.44 | bwd_microstep: 1386.32 | bwd_inner_microstep: 1386.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597 [2024-06-10 21:22:51,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.59 | bwd_microstep: 1602.14 | bwd_inner_microstep: 1602.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3450 [2024-06-10 21:22:53,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.46 | bwd_microstep: 1512.96 | bwd_inner_microstep: 1512.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-10 21:22:58,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.49 | optimizer_step: 6.61 [2024-06-10 21:22:58,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.30 | bwd_microstep: 3551.50 | bwd_inner_microstep: 1690.58 | bwd_allreduce_microstep: 1860.84 | step_microstep: 43.09 [2024-06-10 21:22:58,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16065.59 | bwd: 44830.19 | bwd_inner: 42968.38 | bwd_allreduce: 1861.10 | step: 44.57 {'loss': 1.1653, 'learning_rate': 8.978480642563015e-06, 'epoch': 0.7} 9%|██████▉ | 1199/1726 [20:44:33<9:00:47, 61.57s/it] 70%|██████▉ | 1200/1726 [20:45:34<8:58:52, 61.47s/it] 70%|██████▉ | 1200/1726 [20:45:34<8:58:52, 61.47s/it][INFO|trainer.py:2936] 2024-06-10 21:23:01,003 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200 [INFO|configuration_utils.py:473] 2024-06-10 21:23:01,007 >> Configuration saved in 
work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/config.json [INFO|configuration_utils.py:594] 2024-06-10 21:23:01,009 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-10 21:23:09,316 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-10 21:23:09,339 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-10 21:23:09,342 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-10 21:23:09,343 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/added_tokens.json [2024-06-10 21:23:09,644] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1200 is about to be saved! [2024-06-10 21:23:09,656] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/mp_rank_00_model_states.pt [2024-06-10 21:23:09,657] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/mp_rank_00_model_states.pt... [2024-06-10 21:23:18,017] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/mp_rank_00_model_states.pt. 
[2024-06-10 21:23:18,033] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-10 21:23:30,076] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-06-10 21:23:30,104] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1200/global_step1200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-10 21:23:30,105] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1200 is ready now! [INFO|trainer.py:3028] 2024-06-10 21:23:30,325 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-600] due to args.save_total_limit dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379 [2024-06-10 21:23:33,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.73 | bwd_microstep: 1323.38 | bwd_inner_microstep: 1323.28 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3409 [2024-06-10 21:23:35,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.51 | bwd_microstep: 1366.03 | bwd_inner_microstep: 1366.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 21:23:36,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.06 | bwd_microstep: 1385.97 | bwd_inner_microstep: 1385.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-10 
21:23:38,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.70 | bwd_microstep: 1479.49 | bwd_inner_microstep: 1479.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755 [2024-06-10 21:23:41,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.10 | bwd_microstep: 1534.80 | bwd_inner_microstep: 1534.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3396 [2024-06-10 21:23:42,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.70 | bwd_microstep: 1206.87 | bwd_inner_microstep: 1206.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 21:23:44,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1254.96 | bwd_inner_microstep: 1254.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 21:23:46,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1373.70 | bwd_inner_microstep: 1373.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 21:23:48,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1383.05 | bwd_inner_microstep: 1383.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167 [2024-06-10 21:23:49,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.76 | bwd_microstep: 852.69 | bwd_inner_microstep: 852.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, 
dynamic token length: 1985 [2024-06-10 21:23:50,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.39 | bwd_microstep: 826.42 | bwd_inner_microstep: 826.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010 [2024-06-10 21:23:51,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.34 | bwd_microstep: 802.93 | bwd_inner_microstep: 802.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3421 [2024-06-10 21:23:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.89 | bwd_microstep: 1370.87 | bwd_inner_microstep: 1370.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 21:23:54,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 795.57 | bwd_inner_microstep: 795.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 21:23:56,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1392.43 | bwd_inner_microstep: 1392.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3481 [2024-06-10 21:23:58,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1184.05 | bwd_inner_microstep: 1184.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3648 [2024-06-10 21:24:00,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.27 | bwd_microstep: 1315.55 | bwd_inner_microstep: 1315.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 2090 [2024-06-10 21:24:01,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.27 | bwd_microstep: 918.35 | bwd_inner_microstep: 918.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-10 21:24:03,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1256.15 | bwd_inner_microstep: 1256.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-10 21:24:05,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1389.20 | bwd_inner_microstep: 1389.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 21:24:07,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.65 | bwd_microstep: 1551.38 | bwd_inner_microstep: 1551.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 21:24:09,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1455.10 | bwd_inner_microstep: 1455.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 21:24:10,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.07 | bwd_microstep: 1285.06 | bwd_inner_microstep: 1285.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 21:24:12,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1258.93 | bwd_inner_microstep: 1258.90 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524 [2024-06-10 21:24:14,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1416.33 | bwd_inner_microstep: 1416.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3525 [2024-06-10 21:24:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.56 | bwd_microstep: 1352.90 | bwd_inner_microstep: 1352.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3721 [2024-06-10 21:24:19,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.84 | bwd_microstep: 1781.40 | bwd_inner_microstep: 1781.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836 [2024-06-10 21:24:21,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.30 | bwd_microstep: 1761.35 | bwd_inner_microstep: 1761.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392 [2024-06-10 21:24:23,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.83 | bwd_microstep: 1244.13 | bwd_inner_microstep: 1244.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3765 [2024-06-10 21:24:25,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.62 | bwd_microstep: 1402.91 | bwd_inner_microstep: 1402.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 21:24:26,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1373.37 | bwd_inner_microstep: 
1373.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-10 21:24:32,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-10 21:24:32,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.95 | bwd_microstep: 5249.36 | bwd_inner_microstep: 1648.69 | bwd_allreduce_microstep: 3600.61 | step_microstep: 39.11 [2024-06-10 21:24:32,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.39 | bwd: 45544.71 | bwd_inner: 41943.10 | bwd_allreduce: 3600.90 | step: 40.85 {'loss': 1.2298, 'learning_rate': 8.947179664276028e-06, 'epoch': 0.7} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 21:24:34,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.73 | bwd_microstep: 1469.36 | bwd_inner_microstep: 1469.23 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4003 [2024-06-10 21:24:36,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1505.92 | bwd_inner_microstep: 1505.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 21:24:38,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.96 | bwd_microstep: 1482.78 | bwd_inner_microstep: 1482.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2302 [2024-06-10 21:24:40,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.55 | bwd_microstep: 814.58 | bwd_inner_microstep: 814.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 
[2024-06-10 21:24:41,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 793.84 | bwd_inner_microstep: 793.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-10 21:24:43,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.89 | bwd_microstep: 1444.13 | bwd_inner_microstep: 1444.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-10 21:24:44,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 789.75 | bwd_inner_microstep: 789.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2457 [2024-06-10 21:24:45,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.75 | bwd_microstep: 920.81 | bwd_inner_microstep: 920.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 21:24:47,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.97 | bwd_microstep: 1302.74 | bwd_inner_microstep: 1302.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499 [2024-06-10 21:24:49,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.22 | bwd_microstep: 1432.92 | bwd_inner_microstep: 1432.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3499 [2024-06-10 21:24:51,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.37 | bwd_microstep: 1533.50 | bwd_inner_microstep: 1533.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3668 [2024-06-10 21:24:53,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.93 | bwd_microstep: 1356.37 | bwd_inner_microstep: 1356.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515 [2024-06-10 21:24:55,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1416.61 | bwd_inner_microstep: 1416.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957 [2024-06-10 21:24:56,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.13 | bwd_microstep: 891.04 | bwd_inner_microstep: 891.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-10 21:24:58,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1245.04 | bwd_inner_microstep: 1245.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4022 [2024-06-10 21:25:00,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.15 | bwd_microstep: 1620.31 | bwd_inner_microstep: 1620.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2509 [2024-06-10 21:25:01,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.66 | bwd_microstep: 962.19 | bwd_inner_microstep: 962.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648 [2024-06-10 21:25:04,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.98 | bwd_microstep: 1616.39 | bwd_inner_microstep: 1616.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 21:25:05,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1278.08 | bwd_inner_microstep: 1278.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3894 [2024-06-10 21:25:07,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.79 | bwd_microstep: 1516.60 | bwd_inner_microstep: 1516.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-10 21:25:09,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.03 | bwd_microstep: 801.96 | bwd_inner_microstep: 801.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2111 [2024-06-10 21:25:10,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.03 | bwd_microstep: 823.84 | bwd_inner_microstep: 823.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3544 [2024-06-10 21:25:12,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.00 | bwd_microstep: 1329.07 | bwd_inner_microstep: 1329.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-10 21:25:13,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1300.54 | bwd_inner_microstep: 1300.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-10 21:25:15,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.62 | bwd_microstep: 1453.81 | bwd_inner_microstep: 1453.78 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639 [2024-06-10 21:25:18,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.15 | bwd_microstep: 1581.08 | bwd_inner_microstep: 1581.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1944 [2024-06-10 21:25:19,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.77 | bwd_microstep: 729.49 | bwd_inner_microstep: 729.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3779 [2024-06-10 21:25:21,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.95 | bwd_microstep: 1747.20 | bwd_inner_microstep: 1747.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 21:25:23,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.71 | bwd_microstep: 1507.77 | bwd_inner_microstep: 1507.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052 [2024-06-10 21:25:24,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.68 | bwd_microstep: 813.84 | bwd_inner_microstep: 813.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2243 [2024-06-10 21:25:25,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.46 | bwd_microstep: 966.95 | bwd_inner_microstep: 966.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3438 [2024-06-10 21:25:35,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | 
optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-10 21:25:35,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.16 | bwd_microstep: 8999.92 | bwd_inner_microstep: 1720.63 | bwd_allreduce_microstep: 7279.24 | step_microstep: 38.09 [2024-06-10 21:25:35,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14967.12 | bwd: 47448.45 | bwd_inner: 40168.19 | bwd_allreduce: 7279.52 | step: 39.61 {'loss': 1.1638, 'learning_rate': 8.9159176139648e-06, 'epoch': 0.7} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 21:25:37,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.88 | bwd_microstep: 1371.07 | bwd_inner_microstep: 1370.99 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2446 [2024-06-10 21:25:38,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.04 | bwd_microstep: 913.55 | bwd_inner_microstep: 913.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3869 [2024-06-10 21:25:40,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.73 | bwd_microstep: 1557.71 | bwd_inner_microstep: 1557.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446 [2024-06-10 21:25:42,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1443.47 | bwd_inner_microstep: 1443.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-10 21:25:44,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1436.16 | bwd_inner_microstep: 1436.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3761 [2024-06-10 21:25:46,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1434.83 | bwd_inner_microstep: 1434.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 21:25:48,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1241.84 | bwd_inner_microstep: 1241.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 767 [2024-06-10 21:25:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.68 | bwd_microstep: 310.44 | bwd_inner_microstep: 310.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 21:25:50,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1388.38 | bwd_inner_microstep: 1388.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693 [2024-06-10 21:25:53,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.04 | bwd_microstep: 1525.20 | bwd_inner_microstep: 1525.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444 [2024-06-10 21:25:54,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.47 | bwd_microstep: 1217.37 | bwd_inner_microstep: 1217.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2096 [2024-06-10 21:25:55,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.58 | bwd_microstep: 849.05 | bwd_inner_microstep: 849.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 46, images per sample: 11.5, dynamic token length: 3657 [2024-06-10 21:25:58,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.17 | bwd_microstep: 1717.57 | bwd_inner_microstep: 1717.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2101 [2024-06-10 21:25:59,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.79 | bwd_microstep: 730.19 | bwd_inner_microstep: 730.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-10 21:26:01,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1407.51 | bwd_inner_microstep: 1407.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 21:26:02,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.67 | bwd_microstep: 791.73 | bwd_inner_microstep: 791.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-10 21:26:04,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.12 | bwd_microstep: 1481.37 | bwd_inner_microstep: 1481.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1962 [2024-06-10 21:26:05,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.83 | bwd_microstep: 887.73 | bwd_inner_microstep: 887.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466 [2024-06-10 21:26:07,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.29 | bwd_microstep: 1183.43 | bwd_inner_microstep: 1183.41 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465 [2024-06-10 21:26:09,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.26 | bwd_microstep: 1440.67 | bwd_inner_microstep: 1440.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3693 [2024-06-10 21:26:10,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.08 | bwd_microstep: 1231.91 | bwd_inner_microstep: 1231.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-10 21:26:12,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.35 | bwd_microstep: 1498.06 | bwd_inner_microstep: 1498.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 21:26:14,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.43 | bwd_microstep: 1282.55 | bwd_inner_microstep: 1282.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823 [2024-06-10 21:26:16,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.36 | bwd_microstep: 1479.81 | bwd_inner_microstep: 1479.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3776 [2024-06-10 21:26:19,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.45 | bwd_microstep: 1675.44 | bwd_inner_microstep: 1675.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-10 21:26:21,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.65 | bwd_microstep: 1647.61 | 
bwd_inner_microstep: 1647.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-10 21:26:23,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.42 | bwd_microstep: 1601.65 | bwd_inner_microstep: 1601.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 21:26:25,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.27 | bwd_microstep: 1550.32 | bwd_inner_microstep: 1550.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 21:26:27,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1402.91 | bwd_inner_microstep: 1402.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2237 [2024-06-10 21:26:28,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.65 | bwd_microstep: 895.97 | bwd_inner_microstep: 895.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3731 [2024-06-10 21:26:31,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.99 | bwd_microstep: 1600.35 | bwd_inner_microstep: 1600.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 21:26:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 6.42 | optimizer_step: 6.60 [2024-06-10 21:26:36,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 4984.79 | bwd_inner_microstep: 1755.00 | bwd_allreduce_microstep: 3229.72 | step_microstep: 42.23 
[2024-06-10 21:26:36,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15611.32 | bwd: 45180.67 | bwd_inner: 41949.96 | bwd_allreduce: 3230.00 | step: 43.78 {'loss': 1.2172, 'learning_rate': 8.884694601734123e-06, 'epoch': 0.7} dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2637 [2024-06-10 21:26:38,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 433.61 | bwd_microstep: 1134.27 | bwd_inner_microstep: 1134.13 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 21:26:40,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.58 | bwd_microstep: 1242.77 | bwd_inner_microstep: 1242.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3868 [2024-06-10 21:26:42,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.65 | bwd_microstep: 1661.84 | bwd_inner_microstep: 1661.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-10 21:26:44,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.24 | bwd_microstep: 1341.94 | bwd_inner_microstep: 1341.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3690 [2024-06-10 21:26:46,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.13 | bwd_microstep: 1360.28 | bwd_inner_microstep: 1360.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3496 [2024-06-10 21:26:48,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1443.61 | bwd_inner_microstep: 1443.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic 
ViT batch size: 13, images per sample: 3.25, dynamic token length: 2714 [2024-06-10 21:26:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.46 | bwd_microstep: 939.53 | bwd_inner_microstep: 939.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 21:26:50,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.25 | bwd_microstep: 790.72 | bwd_inner_microstep: 790.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2186 [2024-06-10 21:26:51,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.71 | bwd_microstep: 953.14 | bwd_inner_microstep: 953.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 21:26:53,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1247.66 | bwd_inner_microstep: 1247.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 21:26:55,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.71 | bwd_microstep: 1404.39 | bwd_inner_microstep: 1404.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 21:26:57,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 1287.12 | bwd_inner_microstep: 1287.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702 [2024-06-10 21:26:59,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 1532.23 | bwd_inner_microstep: 1532.20 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1911 [2024-06-10 21:27:00,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.96 | bwd_microstep: 763.92 | bwd_inner_microstep: 763.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 21:27:02,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.08 | bwd_microstep: 1389.07 | bwd_inner_microstep: 1389.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660 [2024-06-10 21:27:04,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.27 | bwd_microstep: 1611.95 | bwd_inner_microstep: 1611.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3560 [2024-06-10 21:27:06,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.67 | bwd_microstep: 1658.22 | bwd_inner_microstep: 1658.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3472 [2024-06-10 21:27:08,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.35 | bwd_microstep: 1213.67 | bwd_inner_microstep: 1213.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3841 [2024-06-10 21:27:10,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1491.36 | bwd_inner_microstep: 1491.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456 [2024-06-10 21:27:12,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.53 | bwd_microstep: 1284.24 | 
bwd_inner_microstep: 1284.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488 [2024-06-10 21:27:14,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.70 | bwd_microstep: 1441.34 | bwd_inner_microstep: 1441.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001 [2024-06-10 21:27:15,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.24 | bwd_microstep: 803.98 | bwd_inner_microstep: 803.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-10 21:27:16,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.79 | bwd_microstep: 881.47 | bwd_inner_microstep: 881.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3554 [2024-06-10 21:27:18,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.66 | bwd_microstep: 1330.84 | bwd_inner_microstep: 1330.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3812 [2024-06-10 21:27:20,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1385.99 | bwd_inner_microstep: 1385.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717 [2024-06-10 21:27:22,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.60 | bwd_microstep: 1433.72 | bwd_inner_microstep: 1433.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2268 [2024-06-10 21:27:23,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
362.46 | bwd_microstep: 971.05 | bwd_inner_microstep: 971.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-10 21:27:25,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1488.47 | bwd_inner_microstep: 1488.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-10 21:27:27,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1495.10 | bwd_inner_microstep: 1495.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3769 [2024-06-10 21:27:29,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.06 | bwd_microstep: 1470.89 | bwd_inner_microstep: 1470.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3579 [2024-06-10 21:27:31,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.54 | bwd_microstep: 1526.51 | bwd_inner_microstep: 1526.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806 [2024-06-10 21:27:36,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.22 | optimizer_step: 6.94 [2024-06-10 21:27:36,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.79 | bwd_microstep: 3948.36 | bwd_inner_microstep: 1670.98 | bwd_allreduce_microstep: 2277.33 | step_microstep: 38.35 [2024-06-10 21:27:36,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15557.43 | bwd: 43929.68 | bwd_inner: 41651.33 | bwd_allreduce: 2277.61 | step: 39.92 {'loss': 1.1509, 'learning_rate': 8.853510737551274e-06, 'epoch': 0.7} dynamic ViT batch size: 
38, images per sample: 9.5, dynamic token length: 3402 [2024-06-10 21:27:38,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.95 | bwd_microstep: 1497.38 | bwd_inner_microstep: 1497.22 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 21:27:40,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1380.53 | bwd_inner_microstep: 1380.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-10 21:27:42,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1348.78 | bwd_inner_microstep: 1348.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816 [2024-06-10 21:27:44,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.47 | bwd_microstep: 1480.73 | bwd_inner_microstep: 1480.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1868 [2024-06-10 21:27:45,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.14 | bwd_microstep: 708.26 | bwd_inner_microstep: 708.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 21:27:47,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.56 | bwd_microstep: 1376.14 | bwd_inner_microstep: 1376.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 21:27:49,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.87 | bwd_microstep: 1376.90 | bwd_inner_microstep: 1376.87 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405 [2024-06-10 21:27:50,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.40 | bwd_microstep: 1180.34 | bwd_inner_microstep: 1180.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-10 21:27:52,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.07 | bwd_microstep: 1291.89 | bwd_inner_microstep: 1291.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3709 [2024-06-10 21:27:54,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.30 | bwd_microstep: 1628.46 | bwd_inner_microstep: 1628.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1950 [2024-06-10 21:27:56,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.47 | bwd_microstep: 886.55 | bwd_inner_microstep: 886.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3699 [2024-06-10 21:27:58,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.49 | bwd_microstep: 1722.76 | bwd_inner_microstep: 1722.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3990 [2024-06-10 21:28:00,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.26 | bwd_microstep: 1637.02 | bwd_inner_microstep: 1637.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 21:28:02,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1376.72 | bwd_inner_microstep: 
1376.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3665 [2024-06-10 21:28:04,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.77 | bwd_microstep: 1687.99 | bwd_inner_microstep: 1687.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-10 21:28:06,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.51 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 21:28:08,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 796.57 | bwd_inner_microstep: 796.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664 [2024-06-10 21:28:09,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.26 | bwd_microstep: 1429.67 | bwd_inner_microstep: 1429.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 21:28:11,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1392.56 | bwd_inner_microstep: 1392.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-10 21:28:14,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.44 | bwd_microstep: 1607.30 | bwd_inner_microstep: 1607.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 21:28:16,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | 
bwd_microstep: 1398.60 | bwd_inner_microstep: 1398.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 21:28:18,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.82 | bwd_microstep: 1559.64 | bwd_inner_microstep: 1559.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 21:28:19,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.66 | bwd_microstep: 698.18 | bwd_inner_microstep: 698.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460 [2024-06-10 21:28:21,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1402.30 | bwd_inner_microstep: 1402.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-10 21:28:22,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1347.54 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3781 [2024-06-10 21:28:24,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.17 | bwd_microstep: 1317.05 | bwd_inner_microstep: 1317.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-10 21:28:26,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1294.93 | bwd_inner_microstep: 1294.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3563 [2024-06-10 21:28:28,437] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 505.91 | bwd_microstep: 1329.76 | bwd_inner_microstep: 1329.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-10 21:28:30,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1490.38 | bwd_inner_microstep: 1490.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2016 [2024-06-10 21:28:31,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.32 | bwd_microstep: 775.10 | bwd_inner_microstep: 775.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3689 [2024-06-10 21:28:34,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.43 | bwd_microstep: 1827.24 | bwd_inner_microstep: 1827.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3774 [2024-06-10 21:28:39,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.07 | optimizer_gradients: 4.32 | optimizer_step: 6.62 [2024-06-10 21:28:39,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.64 | bwd_microstep: 4342.43 | bwd_inner_microstep: 1804.36 | bwd_allreduce_microstep: 2538.01 | step_microstep: 39.76 [2024-06-10 21:28:39,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16173.71 | bwd: 45995.32 | bwd_inner: 43456.26 | bwd_allreduce: 2538.33 | step: 41.28 {'loss': 1.242, 'learning_rate': 8.822366131245664e-06, 'epoch': 0.7} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3410 [2024-06-10 21:28:40,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.96 | bwd_microstep: 1175.35 | bwd_inner_microstep: 1175.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.08 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1909 [2024-06-10 21:28:41,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.76 | bwd_microstep: 714.97 | bwd_inner_microstep: 714.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394 [2024-06-10 21:28:43,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1244.30 | bwd_inner_microstep: 1244.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3577 [2024-06-10 21:28:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.86 | bwd_microstep: 1294.77 | bwd_inner_microstep: 1294.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3795 [2024-06-10 21:28:47,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.43 | bwd_microstep: 1595.78 | bwd_inner_microstep: 1595.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 21:28:49,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.89 | bwd_microstep: 1281.64 | bwd_inner_microstep: 1281.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 21:28:51,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1384.66 | bwd_inner_microstep: 1384.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715 [2024-06-10 21:28:53,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.22 | bwd_microstep: 1528.36 | bwd_inner_microstep: 1528.33 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 21:28:55,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1386.55 | bwd_inner_microstep: 1386.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 21:28:56,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1249.78 | bwd_inner_microstep: 1249.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-10 21:28:57,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 793.75 | bwd_inner_microstep: 793.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-10 21:28:59,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.53 | bwd_microstep: 1152.79 | bwd_inner_microstep: 1152.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3407 [2024-06-10 21:29:01,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.79 | bwd_microstep: 1436.42 | bwd_inner_microstep: 1436.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3666 [2024-06-10 21:29:03,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.75 | bwd_microstep: 1585.67 | bwd_inner_microstep: 1585.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-10 21:29:05,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.97 | bwd_microstep: 
975.70 | bwd_inner_microstep: 975.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 21:29:07,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.00 | bwd_microstep: 1484.43 | bwd_inner_microstep: 1484.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3123 [2024-06-10 21:29:08,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.47 | bwd_microstep: 1093.90 | bwd_inner_microstep: 1093.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2961 [2024-06-10 21:29:10,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.42 | bwd_microstep: 1135.81 | bwd_inner_microstep: 1135.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3423 [2024-06-10 21:29:11,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.76 | bwd_microstep: 1185.43 | bwd_inner_microstep: 1185.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3837 [2024-06-10 21:29:13,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.09 | bwd_microstep: 1488.93 | bwd_inner_microstep: 1488.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3616 [2024-06-10 21:29:15,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1389.36 | bwd_inner_microstep: 1389.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 21:29:17,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 527.98 | bwd_microstep: 1397.60 | bwd_inner_microstep: 1397.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531 [2024-06-10 21:29:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.05 | bwd_microstep: 1327.57 | bwd_inner_microstep: 1327.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596 [2024-06-10 21:29:21,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1504.86 | bwd_inner_microstep: 1504.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2285 [2024-06-10 21:29:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.31 | bwd_microstep: 784.60 | bwd_inner_microstep: 784.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1982 [2024-06-10 21:29:23,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.31 | bwd_microstep: 709.58 | bwd_inner_microstep: 709.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2919 [2024-06-10 21:29:25,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.19 | bwd_microstep: 1187.98 | bwd_inner_microstep: 1187.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3578 [2024-06-10 21:29:27,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.38 | bwd_microstep: 1364.95 | bwd_inner_microstep: 1364.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3566 [2024-06-10 21:29:29,260] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.95 | bwd_microstep: 1428.53 | bwd_inner_microstep: 1428.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-10 21:29:31,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.84 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 21:29:33,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1252.11 | bwd_inner_microstep: 1252.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3570 [2024-06-10 21:29:40,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.35 | optimizer_step: 6.56 [2024-06-10 21:29:40,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 6360.78 | bwd_inner_microstep: 1871.47 | bwd_allreduce_microstep: 4489.24 | step_microstep: 38.80 [2024-06-10 21:29:40,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15323.15 | bwd: 45439.98 | bwd_inner: 40949.81 | bwd_allreduce: 4489.48 | step: 40.27 70%|██████▉ | 1201/1726 [20:47:09<10:25:02, 71.43s/it] 70%|██████▉ | 1202/1726 [20:48:12<10:01:05, 68.83s/it] 70%|██████▉ | 1203/1726 [20:49:13<9:39:50, 66.52s/it] 70%|██████▉ | 1204/1726 [20:50:13<9:21:14, 64.51s/it] 70%|██████▉ | 1205/1726 [20:51:15<9:14:58, 63.91s/it] {'loss': 1.1899, 'learning_rate': 8.79126089250843e-06, 
'epoch': 0.7} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-10 21:29:42,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1467.35 | bwd_inner_microstep: 1467.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-10 21:29:43,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.01 | bwd_microstep: 1308.55 | bwd_inner_microstep: 1308.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3839 [2024-06-10 21:29:45,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.37 | bwd_microstep: 1353.99 | bwd_inner_microstep: 1353.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 21:29:47,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.01 | bwd_microstep: 1446.06 | bwd_inner_microstep: 1446.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771 [2024-06-10 21:29:49,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.77 | bwd_microstep: 1340.48 | bwd_inner_microstep: 1340.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-10 21:29:51,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.07 | bwd_microstep: 1308.73 | bwd_inner_microstep: 1308.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 21:29:53,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1385.28 | bwd_inner_microstep: 1385.25 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-10 21:29:55,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1496.35 | bwd_inner_microstep: 1496.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 21:29:57,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.47 | bwd_microstep: 1389.93 | bwd_inner_microstep: 1389.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494 [2024-06-10 21:29:59,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.29 | bwd_microstep: 1189.29 | bwd_inner_microstep: 1189.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625 [2024-06-10 21:30:00,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.65 | bwd_microstep: 1313.37 | bwd_inner_microstep: 1313.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 21:30:01,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.31 | bwd_microstep: 793.37 | bwd_inner_microstep: 793.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 21:30:03,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.69 | bwd_microstep: 1289.03 | bwd_inner_microstep: 1289.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3674 [2024-06-10 21:30:05,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 
1549.42 | bwd_inner_microstep: 1549.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 21:30:07,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.14 | bwd_microstep: 1377.25 | bwd_inner_microstep: 1377.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-10 21:30:10,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.09 | bwd_microstep: 1625.91 | bwd_inner_microstep: 1625.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-10 21:30:11,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.22 | bwd_microstep: 800.78 | bwd_inner_microstep: 800.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2597 [2024-06-10 21:30:12,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 408.48 | bwd_microstep: 1094.59 | bwd_inner_microstep: 1094.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617 [2024-06-10 21:30:14,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1511.71 | bwd_inner_microstep: 1511.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3672 [2024-06-10 21:30:16,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.50 | bwd_microstep: 1454.17 | bwd_inner_microstep: 1454.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-10 21:30:17,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 332.02 | bwd_microstep: 877.04 | bwd_inner_microstep: 877.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.17 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-10 21:30:19,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.15 | bwd_microstep: 820.42 | bwd_inner_microstep: 820.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 21:30:20,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.77 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076 [2024-06-10 21:30:22,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.59 | bwd_microstep: 917.55 | bwd_inner_microstep: 917.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2023 [2024-06-10 21:30:23,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.61 | bwd_microstep: 904.78 | bwd_inner_microstep: 904.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3806 [2024-06-10 21:30:25,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.81 | bwd_microstep: 1574.34 | bwd_inner_microstep: 1574.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3793 [2024-06-10 21:30:27,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.90 | bwd_microstep: 1749.35 | bwd_inner_microstep: 1749.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2913 [2024-06-10 21:30:29,503] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.16 | bwd_microstep: 1094.28 | bwd_inner_microstep: 1094.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834 [2024-06-10 21:30:31,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.23 | bwd_microstep: 1752.89 | bwd_inner_microstep: 1752.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 21:30:34,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.74 | bwd_microstep: 1547.05 | bwd_inner_microstep: 1547.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591 [2024-06-10 21:30:35,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1307.08 | bwd_inner_microstep: 1307.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-10 21:30:39,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.61 [2024-06-10 21:30:39,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.72 | bwd_microstep: 3361.32 | bwd_inner_microstep: 1861.37 | bwd_allreduce_microstep: 1499.90 | step_microstep: 39.15 [2024-06-10 21:30:39,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15708.50 | bwd: 43684.95 | bwd_inner: 42184.15 | bwd_allreduce: 1500.12 | step: 41.74 {'loss': 1.1673, 'learning_rate': 8.76019513089206e-06, 'epoch': 0.7} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5710 [2024-06-10 21:30:42,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 805.54 | bwd_microstep: 2152.04 | bwd_inner_microstep: 2152.01 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 21:30:44,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.17 | bwd_microstep: 1275.46 | bwd_inner_microstep: 1275.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3881 [2024-06-10 21:30:46,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.10 | bwd_microstep: 1582.76 | bwd_inner_microstep: 1582.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3420 [2024-06-10 21:30:48,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.72 | bwd_microstep: 1474.81 | bwd_inner_microstep: 1474.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1929 [2024-06-10 21:30:49,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.99 | bwd_microstep: 727.00 | bwd_inner_microstep: 726.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 21:30:51,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1383.62 | bwd_inner_microstep: 1383.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 21:30:53,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.55 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 21:30:55,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.62 | bwd_microstep: 
1247.68 | bwd_inner_microstep: 1247.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 21:30:57,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.06 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 21:30:59,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.07 | bwd_microstep: 1542.61 | bwd_inner_microstep: 1542.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3515 [2024-06-10 21:31:01,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.43 | bwd_microstep: 1333.86 | bwd_inner_microstep: 1333.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 21:31:03,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.44 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 21:31:04,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.83 | bwd_microstep: 1376.11 | bwd_inner_microstep: 1376.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646 [2024-06-10 21:31:06,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1410.04 | bwd_inner_microstep: 1410.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3900 [2024-06-10 21:31:09,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 652.93 | bwd_microstep: 1785.08 | bwd_inner_microstep: 1785.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507 [2024-06-10 21:31:11,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.01 | bwd_microstep: 1415.90 | bwd_inner_microstep: 1415.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911 [2024-06-10 21:31:12,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.73 | bwd_microstep: 685.60 | bwd_inner_microstep: 685.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 21:31:14,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.71 | bwd_microstep: 1291.91 | bwd_inner_microstep: 1291.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931 [2024-06-10 21:31:16,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1493.59 | bwd_inner_microstep: 1493.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 [2024-06-10 21:31:18,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.04 | bwd_microstep: 1659.98 | bwd_inner_microstep: 1659.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 21:31:20,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1287.67 | bwd_inner_microstep: 1287.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 21:31:21,890] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1254.70 | bwd_inner_microstep: 1254.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2303 [2024-06-10 21:31:23,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.47 | bwd_microstep: 977.44 | bwd_inner_microstep: 977.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511 [2024-06-10 21:31:24,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.98 | bwd_microstep: 1192.06 | bwd_inner_microstep: 1192.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006 [2024-06-10 21:31:26,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.05 | bwd_microstep: 833.82 | bwd_inner_microstep: 833.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831 [2024-06-10 21:31:28,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.05 | bwd_microstep: 1453.43 | bwd_inner_microstep: 1453.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3555 [2024-06-10 21:31:30,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.39 | bwd_microstep: 1543.16 | bwd_inner_microstep: 1543.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3787 [2024-06-10 21:31:32,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.07 | bwd_microstep: 1599.05 | bwd_inner_microstep: 1599.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
2230 [2024-06-10 21:31:33,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.37 | bwd_microstep: 1024.44 | bwd_inner_microstep: 1024.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3762 [2024-06-10 21:31:36,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 1606.42 | bwd_inner_microstep: 1606.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3770 [2024-06-10 21:31:38,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.53 | bwd_microstep: 1842.66 | bwd_inner_microstep: 1842.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2406 [2024-06-10 21:31:40,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.15 | optimizer_gradients: 4.17 | optimizer_step: 6.59 [2024-06-10 21:31:40,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.80 | bwd_microstep: 1596.10 | bwd_inner_microstep: 1150.90 | bwd_allreduce_microstep: 445.03 | step_microstep: 39.35 [2024-06-10 21:31:40,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16276.45 | bwd: 44096.30 | bwd_inner: 43650.23 | bwd_allreduce: 445.25 | step: 40.89 {'loss': 1.2092, 'learning_rate': 8.729168955810015e-06, 'epoch': 0.7} dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3427 [2024-06-10 21:31:42,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1509.26 | bwd_inner_microstep: 1509.18 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 21:31:43,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.98 | bwd_microstep: 679.70 | 
bwd_inner_microstep: 679.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3870 [2024-06-10 21:31:45,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.28 | bwd_microstep: 1495.15 | bwd_inner_microstep: 1495.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3824 [2024-06-10 21:31:47,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.89 | bwd_microstep: 1513.11 | bwd_inner_microstep: 1513.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 21:31:49,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.91 | bwd_microstep: 1278.50 | bwd_inner_microstep: 1278.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 21:31:51,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1379.51 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 21:31:53,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1403.74 | bwd_inner_microstep: 1403.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 21:31:55,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1379.05 | bwd_inner_microstep: 1379.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-10 21:31:57,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 516.32 | bwd_microstep: 1387.40 | bwd_inner_microstep: 1387.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-10 21:31:58,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1255.35 | bwd_inner_microstep: 1255.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702 [2024-06-10 21:32:00,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.14 | bwd_microstep: 1423.01 | bwd_inner_microstep: 1422.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947 [2024-06-10 21:32:01,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.96 | bwd_microstep: 727.33 | bwd_inner_microstep: 727.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2195 [2024-06-10 21:32:03,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.03 | bwd_microstep: 859.45 | bwd_inner_microstep: 859.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-10 21:32:04,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.26 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1936 [2024-06-10 21:32:06,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.46 | bwd_microstep: 882.36 | bwd_inner_microstep: 882.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 21:32:08,368] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.97 | bwd_microstep: 1645.09 | bwd_inner_microstep: 1645.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 21:32:10,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.68 | bwd_microstep: 1377.04 | bwd_inner_microstep: 1377.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 21:32:12,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.14 | bwd_microstep: 1385.42 | bwd_inner_microstep: 1385.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2609 [2024-06-10 21:32:13,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.24 | bwd_microstep: 1011.80 | bwd_inner_microstep: 1011.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3623 [2024-06-10 21:32:15,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.82 | bwd_microstep: 1356.98 | bwd_inner_microstep: 1356.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-10 21:32:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.77 | bwd_microstep: 1158.60 | bwd_inner_microstep: 1158.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 21:32:18,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.69 | bwd_microstep: 1354.56 | bwd_inner_microstep: 1354.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3600 [2024-06-10 21:32:21,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1503.66 | bwd_inner_microstep: 1503.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919 [2024-06-10 21:32:21,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.84 | bwd_microstep: 687.29 | bwd_inner_microstep: 687.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820 [2024-06-10 21:32:23,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1387.31 | bwd_inner_microstep: 1387.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2067 [2024-06-10 21:32:25,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.89 | bwd_microstep: 914.92 | bwd_inner_microstep: 914.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3973 [2024-06-10 21:32:27,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.57 | bwd_microstep: 1607.23 | bwd_inner_microstep: 1607.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-10 21:32:28,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.19 | bwd_microstep: 799.12 | bwd_inner_microstep: 799.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3617 [2024-06-10 21:32:30,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1600.37 | bwd_inner_microstep: 1600.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3587 [2024-06-10 21:32:32,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.24 | bwd_microstep: 1307.97 | bwd_inner_microstep: 1307.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3584 [2024-06-10 21:32:34,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.42 | bwd_microstep: 1400.32 | bwd_inner_microstep: 1400.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3808 [2024-06-10 21:32:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.34 | optimizer_step: 6.59 [2024-06-10 21:32:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 8517.94 | bwd_inner_microstep: 2072.54 | bwd_allreduce_microstep: 6445.33 | step_microstep: 38.89 [2024-06-10 21:32:43,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15216.90 | bwd: 47475.53 | bwd_inner: 41029.22 | bwd_allreduce: 6445.60 | step: 40.40 {'loss': 1.1537, 'learning_rate': 8.698182476536316e-06, 'epoch': 0.7} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 21:32:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.24 | bwd_microstep: 1370.63 | bwd_inner_microstep: 1370.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2322 [2024-06-10 21:32:46,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.48 | bwd_microstep: 882.18 | bwd_inner_microstep: 882.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3825 [2024-06-10 21:32:48,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 526.79 | bwd_microstep: 1401.39 | bwd_inner_microstep: 1401.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3792 [2024-06-10 21:32:50,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1471.04 | bwd_inner_microstep: 1471.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-10 21:32:52,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.49 | bwd_microstep: 1645.66 | bwd_inner_microstep: 1645.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760 [2024-06-10 21:32:55,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.34 | bwd_microstep: 1537.11 | bwd_inner_microstep: 1537.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 21:32:56,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1342.38 | bwd_inner_microstep: 1342.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2910 [2024-06-10 21:32:58,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.59 | bwd_microstep: 999.27 | bwd_inner_microstep: 999.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2182 [2024-06-10 21:32:59,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.16 | bwd_microstep: 951.75 | bwd_inner_microstep: 951.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 21:33:01,564] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.90 | bwd_microstep: 1402.52 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511 [2024-06-10 21:33:03,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1446.61 | bwd_inner_microstep: 1446.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3434 [2024-06-10 21:33:05,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1394.75 | bwd_inner_microstep: 1394.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3384 [2024-06-10 21:33:07,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.85 | bwd_microstep: 1240.12 | bwd_inner_microstep: 1240.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 21:33:09,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1384.41 | bwd_inner_microstep: 1384.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697 [2024-06-10 21:33:11,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.74 | bwd_microstep: 1524.43 | bwd_inner_microstep: 1524.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1974 [2024-06-10 21:33:12,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.64 | bwd_microstep: 766.26 | bwd_inner_microstep: 766.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3501 [2024-06-10 21:33:14,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.23 | bwd_microstep: 1287.02 | bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707 [2024-06-10 21:33:16,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1434.99 | bwd_inner_microstep: 1434.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999 [2024-06-10 21:33:17,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.26 | bwd_microstep: 737.37 | bwd_inner_microstep: 737.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.39 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3539 [2024-06-10 21:33:18,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.19 | bwd_microstep: 1199.06 | bwd_inner_microstep: 1199.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1998 [2024-06-10 21:33:19,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.54 | bwd_microstep: 706.85 | bwd_inner_microstep: 706.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-10 21:33:22,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.72 | bwd_microstep: 1658.35 | bwd_inner_microstep: 1658.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 21:33:24,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.62 | bwd_microstep: 1655.05 | bwd_inner_microstep: 1655.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3813 [2024-06-10 21:33:26,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.75 | bwd_microstep: 1654.18 | bwd_inner_microstep: 1654.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269 [2024-06-10 21:33:27,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.41 | bwd_microstep: 975.71 | bwd_inner_microstep: 975.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 21:33:29,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.11 | bwd_microstep: 1407.71 | bwd_inner_microstep: 1407.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805 [2024-06-10 21:33:31,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.54 | bwd_microstep: 1449.33 | bwd_inner_microstep: 1449.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3772 [2024-06-10 21:33:33,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.33 | bwd_microstep: 1437.16 | bwd_inner_microstep: 1437.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3782 [2024-06-10 21:33:35,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1380.82 | bwd_inner_microstep: 1380.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 21:33:37,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1449.28 | bwd_inner_microstep: 1449.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3761 [2024-06-10 21:33:39,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.80 | bwd_microstep: 1637.24 | bwd_inner_microstep: 1637.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3592 [2024-06-10 21:33:45,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.34 | optimizer_step: 6.59 [2024-06-10 21:33:45,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.12 | bwd_microstep: 5240.13 | bwd_inner_microstep: 1908.13 | bwd_allreduce_microstep: 3331.93 | step_microstep: 38.79 [2024-06-10 21:33:45,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15829.38 | bwd: 46070.78 | bwd_inner: 42737.93 | bwd_allreduce: 3332.17 | step: 41.58 {'loss': 1.2004, 'learning_rate': 8.667235802205183e-06, 'epoch': 0.7} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3495 [2024-06-10 21:33:47,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.57 | bwd_microstep: 1506.58 | bwd_inner_microstep: 1506.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 21:33:49,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.90 | bwd_microstep: 1272.98 | bwd_inner_microstep: 1272.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2391 [2024-06-10 21:33:50,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.29 | bwd_microstep: 838.81 | bwd_inner_microstep: 838.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2756 [2024-06-10 21:33:52,235] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 380.71 | bwd_microstep: 1002.27 | bwd_inner_microstep: 1002.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4133 [2024-06-10 21:33:54,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.99 | bwd_microstep: 1589.07 | bwd_inner_microstep: 1589.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-10 21:33:56,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1251.56 | bwd_inner_microstep: 1251.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 21:33:57,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1245.29 | bwd_inner_microstep: 1245.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 21:33:59,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1392.44 | bwd_inner_microstep: 1392.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 21:34:01,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1247.60 | bwd_inner_microstep: 1247.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928 [2024-06-10 21:34:02,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.85 | bwd_microstep: 725.12 | bwd_inner_microstep: 725.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060 [2024-06-10 21:34:03,687] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.51 | bwd_microstep: 818.36 | bwd_inner_microstep: 818.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428 [2024-06-10 21:34:05,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.51 | bwd_microstep: 1154.97 | bwd_inner_microstep: 1154.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-10 21:34:07,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.34 | bwd_microstep: 1347.04 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3699 [2024-06-10 21:34:09,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1482.45 | bwd_inner_microstep: 1482.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2107 [2024-06-10 21:34:10,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.87 | bwd_microstep: 982.37 | bwd_inner_microstep: 982.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 21:34:12,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1346.09 | bwd_inner_microstep: 1346.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3945 [2024-06-10 21:34:14,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.57 | bwd_microstep: 1600.25 | bwd_inner_microstep: 1600.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3670 [2024-06-10 21:34:16,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1627.75 | bwd_inner_microstep: 1627.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 21:34:18,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.58 | bwd_microstep: 1292.23 | bwd_inner_microstep: 1292.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3539 [2024-06-10 21:34:20,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.64 | bwd_microstep: 1451.83 | bwd_inner_microstep: 1451.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 21:34:22,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.18 | bwd_microstep: 1547.66 | bwd_inner_microstep: 1547.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 21:34:25,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.97 | bwd_microstep: 1658.83 | bwd_inner_microstep: 1658.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813 [2024-06-10 21:34:26,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.35 | bwd_microstep: 1356.52 | bwd_inner_microstep: 1356.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543 [2024-06-10 21:34:28,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.47 | bwd_microstep: 1326.75 | bwd_inner_microstep: 1326.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3782 [2024-06-10 21:34:30,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.14 | bwd_microstep: 1353.04 | bwd_inner_microstep: 1353.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 21:34:32,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.88 | bwd_microstep: 1475.13 | bwd_inner_microstep: 1475.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 21:34:34,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.20 | bwd_microstep: 1373.55 | bwd_inner_microstep: 1373.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2286 [2024-06-10 21:34:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.84 | bwd_microstep: 1006.25 | bwd_inner_microstep: 1006.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3424 [2024-06-10 21:34:37,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1310.96 | bwd_inner_microstep: 1310.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 21:34:39,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1253.65 | bwd_inner_microstep: 1253.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-10 21:34:41,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.60 | bwd_microstep: 1651.77 | bwd_inner_microstep: 1651.74 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3431 [2024-06-10 21:34:48,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 21:34:48,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.59 | bwd_microstep: 5719.60 | bwd_inner_microstep: 1721.02 | bwd_allreduce_microstep: 3998.52 | step_microstep: 38.62 [2024-06-10 21:34:48,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15773.81 | bwd: 46208.79 | bwd_inner: 42209.36 | bwd_allreduce: 3998.75 | step: 40.03 70%|██████▉ | 1206/1726 [20:52:16<9:06:33, 63.06s/it] 70%|██████▉ | 1207/1726 [20:53:16<8:56:50, 62.06s/it] 70%|██████▉ | 1208/1726 [20:54:17<8:52:17, 61.66s/it] 70%|███████ | 1209/1726 [20:55:20<8:54:47, 62.07s/it] 70%|███████ | 1210/1726 [20:56:22<8:54:11, 62.12s/it] 70%|███████ | 1211/1726 {'loss': 1.1983, 'learning_rate': 8.636329041810632e-06, 'epoch': 0.7} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-10 21:34:49,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.78 | bwd_microstep: 1271.65 | bwd_inner_microstep: 1271.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 21:34:51,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 21:34:53,485] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1349.92 | bwd_inner_microstep: 1349.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470 [2024-06-10 21:34:55,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.92 | bwd_microstep: 1210.53 | bwd_inner_microstep: 1210.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 21:34:56,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1276.50 | bwd_inner_microstep: 1276.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-10 21:34:58,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.73 | bwd_microstep: 1277.40 | bwd_inner_microstep: 1277.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2240 [2024-06-10 21:34:59,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.99 | bwd_microstep: 895.20 | bwd_inner_microstep: 895.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-10 21:35:01,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.72 | bwd_microstep: 1483.43 | bwd_inner_microstep: 1483.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887 [2024-06-10 21:35:02,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.82 | bwd_microstep: 682.46 | bwd_inner_microstep: 682.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-10 21:35:03,925] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.82 | bwd_microstep: 698.50 | bwd_inner_microstep: 698.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 21:35:05,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.31 | bwd_microstep: 1248.95 | bwd_inner_microstep: 1248.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 21:35:07,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1380.46 | bwd_inner_microstep: 1380.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-10 21:35:09,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.37 | bwd_microstep: 1512.14 | bwd_inner_microstep: 1512.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2199 [2024-06-10 21:35:10,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.95 | bwd_microstep: 958.39 | bwd_inner_microstep: 958.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3541 [2024-06-10 21:35:13,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.62 | bwd_microstep: 1522.32 | bwd_inner_microstep: 1522.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 21:35:14,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.01 | bwd_microstep: 1387.97 | bwd_inner_microstep: 1387.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472 
[2024-06-10 21:35:16,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.97 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1883 [2024-06-10 21:35:17,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.12 | bwd_microstep: 711.35 | bwd_inner_microstep: 711.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070 [2024-06-10 21:35:19,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.67 | bwd_microstep: 818.44 | bwd_inner_microstep: 818.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-10 21:35:20,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1405.36 | bwd_inner_microstep: 1405.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 21:35:22,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1389.79 | bwd_inner_microstep: 1389.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3610 [2024-06-10 21:35:24,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.24 | bwd_microstep: 1457.43 | bwd_inner_microstep: 1457.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675 [2024-06-10 21:35:26,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.59 | bwd_microstep: 1430.77 | bwd_inner_microstep: 1430.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3548 [2024-06-10 21:35:28,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.41 | bwd_microstep: 1391.79 | bwd_inner_microstep: 1391.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472 [2024-06-10 21:35:30,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.69 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527 [2024-06-10 21:35:32,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.34 | bwd_microstep: 1415.83 | bwd_inner_microstep: 1415.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2245 [2024-06-10 21:35:34,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.43 | bwd_microstep: 1002.29 | bwd_inner_microstep: 1002.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3818 [2024-06-10 21:35:36,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1621.75 | bwd_inner_microstep: 1621.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471 [2024-06-10 21:35:38,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.39 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3724 [2024-06-10 21:35:40,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.62 | bwd_microstep: 1477.06 | bwd_inner_microstep: 1477.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050 [2024-06-10 21:35:41,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.62 | bwd_microstep: 909.55 | bwd_inner_microstep: 909.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-10 21:35:49,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-10 21:35:49,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.49 | bwd_microstep: 7443.35 | bwd_inner_microstep: 1448.26 | bwd_allreduce_microstep: 5995.03 | step_microstep: 37.95 [2024-06-10 21:35:49,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14980.17 | bwd: 45969.78 | bwd_inner: 39973.84 | bwd_allreduce: 5995.26 | step: 39.41 {'loss': 1.233, 'learning_rate': 8.605462304206129e-06, 'epoch': 0.7} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405 [2024-06-10 21:35:51,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1296.20 | bwd_inner_microstep: 1296.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-10 21:35:53,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.87 | bwd_microstep: 1618.92 | bwd_inner_microstep: 1618.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 21:35:55,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.39 | bwd_microstep: 1242.49 | bwd_inner_microstep: 1242.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-10 21:35:56,095] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 260.76 | bwd_microstep: 677.75 | bwd_inner_microstep: 677.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 21:35:58,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.45 | bwd_microstep: 1380.93 | bwd_inner_microstep: 1380.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945 [2024-06-10 21:35:58,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.49 | bwd_microstep: 702.45 | bwd_inner_microstep: 702.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2219 [2024-06-10 21:36:00,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.18 | bwd_microstep: 957.82 | bwd_inner_microstep: 957.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 21:36:01,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.36 | bwd_microstep: 790.61 | bwd_inner_microstep: 790.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 21:36:03,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3483 [2024-06-10 21:36:05,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.09 | bwd_microstep: 1440.58 | bwd_inner_microstep: 1440.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431 [2024-06-10 21:36:07,252] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.33 | bwd_microstep: 1412.24 | bwd_inner_microstep: 1412.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 21:36:09,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1338.58 | bwd_inner_microstep: 1338.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 21:36:10,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.99 | bwd_microstep: 1276.03 | bwd_inner_microstep: 1276.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665 [2024-06-10 21:36:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1414.74 | bwd_inner_microstep: 1414.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422 [2024-06-10 21:36:14,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.23 | bwd_microstep: 1540.62 | bwd_inner_microstep: 1540.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3684 [2024-06-10 21:36:17,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.11 | bwd_microstep: 1554.12 | bwd_inner_microstep: 1554.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640 [2024-06-10 21:36:18,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.23 | bwd_microstep: 1317.43 | bwd_inner_microstep: 1317.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 
2016 [2024-06-10 21:36:19,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.22 | bwd_microstep: 711.33 | bwd_inner_microstep: 711.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 21:36:21,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.63 | bwd_microstep: 1288.40 | bwd_inner_microstep: 1288.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 21:36:23,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.79 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-10 21:36:25,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.97 | bwd_microstep: 1182.21 | bwd_inner_microstep: 1182.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3822 [2024-06-10 21:36:27,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.43 | bwd_microstep: 1687.83 | bwd_inner_microstep: 1687.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2912 [2024-06-10 21:36:29,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.19 | bwd_microstep: 1092.89 | bwd_inner_microstep: 1092.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440 [2024-06-10 21:36:30,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.90 | bwd_microstep: 1409.11 | bwd_inner_microstep: 1409.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3818 [2024-06-10 21:36:33,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.62 | bwd_microstep: 1516.50 | bwd_inner_microstep: 1516.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-10 21:36:35,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.48 | bwd_microstep: 1499.45 | bwd_inner_microstep: 1499.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704 [2024-06-10 21:36:37,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.80 | bwd_microstep: 1428.33 | bwd_inner_microstep: 1428.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 21:36:39,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1400.14 | bwd_inner_microstep: 1400.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3443 [2024-06-10 21:36:40,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.98 | bwd_microstep: 1316.58 | bwd_inner_microstep: 1316.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-10 21:36:42,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1532.63 | bwd_inner_microstep: 1532.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602 [2024-06-10 21:36:44,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1405.18 | bwd_inner_microstep: 1405.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3812 [2024-06-10 21:36:52,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 21:36:52,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.81 | bwd_microstep: 7036.94 | bwd_inner_microstep: 2120.08 | bwd_allreduce_microstep: 4916.81 | step_microstep: 38.12 [2024-06-10 21:36:52,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15742.73 | bwd: 47200.68 | bwd_inner: 42282.97 | bwd_allreduce: 4917.04 | step: 39.62 {'loss': 1.2115, 'learning_rate': 8.57463569810415e-06, 'epoch': 0.7} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-10 21:36:54,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.31 | bwd_microstep: 1330.75 | bwd_inner_microstep: 1330.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-10 21:36:56,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1295.73 | bwd_inner_microstep: 1295.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940 [2024-06-10 21:36:58,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.39 | bwd_microstep: 1592.63 | bwd_inner_microstep: 1592.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 21:37:00,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800 [2024-06-10 21:37:02,691] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 574.15 | bwd_microstep: 1548.71 | bwd_inner_microstep: 1548.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3786 [2024-06-10 21:37:04,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.10 | bwd_microstep: 1645.93 | bwd_inner_microstep: 1645.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3733 [2024-06-10 21:37:06,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.86 | bwd_microstep: 1461.25 | bwd_inner_microstep: 1461.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1884 [2024-06-10 21:37:08,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.53 | bwd_microstep: 745.44 | bwd_inner_microstep: 745.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984 [2024-06-10 21:37:09,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.71 | bwd_microstep: 735.38 | bwd_inner_microstep: 735.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3963 [2024-06-10 21:37:11,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1498.59 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 21:37:13,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1391.17 | bwd_inner_microstep: 1391.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3503 [2024-06-10 
21:37:14,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1435.37 | bwd_inner_microstep: 1435.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-10 21:37:16,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.41 | bwd_microstep: 786.04 | bwd_inner_microstep: 786.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1944 [2024-06-10 21:37:17,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.53 | bwd_microstep: 760.54 | bwd_inner_microstep: 760.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-10 21:37:19,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1482.67 | bwd_inner_microstep: 1482.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423 [2024-06-10 21:37:20,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1310.44 | bwd_inner_microstep: 1310.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-10 21:37:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.02 | bwd_microstep: 1614.97 | bwd_inner_microstep: 1614.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465 [2024-06-10 21:37:25,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.66 | bwd_microstep: 1311.14 | bwd_inner_microstep: 1311.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3817 [2024-06-10 21:37:27,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.43 | bwd_microstep: 1457.96 | bwd_inner_microstep: 1457.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-10 21:37:29,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.33 | bwd_microstep: 1655.05 | bwd_inner_microstep: 1655.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3487 [2024-06-10 21:37:31,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.92 | bwd_microstep: 1543.75 | bwd_inner_microstep: 1543.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-10 21:37:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509 [2024-06-10 21:37:35,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.19 | bwd_microstep: 1320.78 | bwd_inner_microstep: 1320.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 21:37:36,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1256.17 | bwd_inner_microstep: 1256.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522 [2024-06-10 21:37:38,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1422.50 | bwd_inner_microstep: 1422.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3388 [2024-06-10 21:37:40,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.70 | bwd_microstep: 1338.96 | bwd_inner_microstep: 1338.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3828 [2024-06-10 21:37:42,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.59 | bwd_microstep: 1706.90 | bwd_inner_microstep: 1706.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599 [2024-06-10 21:37:44,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1411.90 | bwd_inner_microstep: 1411.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2084 [2024-06-10 21:37:46,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.75 | bwd_microstep: 848.83 | bwd_inner_microstep: 848.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-10 21:37:48,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.49 | bwd_microstep: 1623.70 | bwd_inner_microstep: 1623.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3487 [2024-06-10 21:37:50,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.57 | bwd_microstep: 1506.93 | bwd_inner_microstep: 1506.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3581 [2024-06-10 21:37:56,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.07 | optimizer_step: 6.58 [2024-06-10 21:37:56,052] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.05 | bwd_microstep: 5009.65 | bwd_inner_microstep: 1875.85 | bwd_allreduce_microstep: 3133.75 | step_microstep: 38.38 [2024-06-10 21:37:56,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16223.93 | bwd: 46813.14 | bwd_inner: 43678.50 | bwd_allreduce: 3133.97 | step: 39.93 {'loss': 1.1902, 'learning_rate': 8.543849332075862e-06, 'epoch': 0.7} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4285 [2024-06-10 21:37:58,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.98 | bwd_microstep: 1517.48 | bwd_inner_microstep: 1517.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3886 [2024-06-10 21:38:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.84 | bwd_microstep: 1583.80 | bwd_inner_microstep: 1583.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3873 [2024-06-10 21:38:02,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1317.64 | bwd_inner_microstep: 1317.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3787 [2024-06-10 21:38:04,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1378.65 | bwd_inner_microstep: 1378.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 21:38:06,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.61 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 21:38:07,731] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.04 | bwd_microstep: 1245.50 | bwd_inner_microstep: 1245.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3510 [2024-06-10 21:38:09,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1350.52 | bwd_inner_microstep: 1350.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 21:38:11,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.73 | bwd_microstep: 1151.40 | bwd_inner_microstep: 1151.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2225 [2024-06-10 21:38:12,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.01 | bwd_microstep: 798.75 | bwd_inner_microstep: 798.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 21:38:14,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.47 | bwd_microstep: 1247.72 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458 [2024-06-10 21:38:15,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.70 | bwd_microstep: 1309.15 | bwd_inner_microstep: 1309.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3523 [2024-06-10 21:38:17,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.45 | bwd_microstep: 1458.14 | bwd_inner_microstep: 1458.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3528 [2024-06-10 21:38:19,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.16 | bwd_microstep: 1492.07 | bwd_inner_microstep: 1492.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 21:38:21,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1348.70 | bwd_inner_microstep: 1348.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530 [2024-06-10 21:38:23,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1592.29 | bwd_inner_microstep: 1592.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680 [2024-06-10 21:38:26,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.80 | bwd_microstep: 1526.86 | bwd_inner_microstep: 1526.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 21:38:27,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1288.65 | bwd_inner_microstep: 1288.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3536 [2024-06-10 21:38:29,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.30 | bwd_microstep: 1522.57 | bwd_inner_microstep: 1522.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3627 [2024-06-10 21:38:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.71 | bwd_microstep: 1588.65 | bwd_inner_microstep: 1588.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3491 [2024-06-10 21:38:34,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.69 | bwd_microstep: 1381.64 | bwd_inner_microstep: 1381.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 21:38:36,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.14 | bwd_microstep: 1502.38 | bwd_inner_microstep: 1502.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 21:38:37,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.77 | bwd_microstep: 1288.00 | bwd_inner_microstep: 1287.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3577 [2024-06-10 21:38:39,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.87 | bwd_microstep: 1301.09 | bwd_inner_microstep: 1301.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 21:38:41,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.66 | bwd_microstep: 1412.43 | bwd_inner_microstep: 1412.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020 [2024-06-10 21:38:42,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.19 | bwd_microstep: 808.31 | bwd_inner_microstep: 808.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2281 [2024-06-10 21:38:44,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.06 | bwd_microstep: 1005.20 | bwd_inner_microstep: 1005.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728 [2024-06-10 21:38:46,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.48 | bwd_microstep: 1534.62 | bwd_inner_microstep: 1534.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673 [2024-06-10 21:38:48,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.89 | bwd_microstep: 1615.73 | bwd_inner_microstep: 1615.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819 [2024-06-10 21:38:50,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.61 | bwd_microstep: 1724.71 | bwd_inner_microstep: 1724.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-10 21:38:53,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.77 | bwd_microstep: 1596.25 | bwd_inner_microstep: 1596.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2227 [2024-06-10 21:38:54,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.34 | bwd_microstep: 958.53 | bwd_inner_microstep: 958.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465 [2024-06-10 21:38:56,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-10 21:38:56,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.29 | bwd_microstep: 1612.79 | bwd_inner_microstep: 1605.01 | bwd_allreduce_microstep: 7.74 | step_microstep: 37.85 [2024-06-10 21:38:56,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16401.71 
| bwd: 43848.03 | bwd_inner: 43839.40 | bwd_allreduce: 7.97 | step: 39.32 {'loss': 1.1769, 'learning_rate': 8.513103314550657e-06, 'epoch': 0.7} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 21:38:57,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 781.04 | bwd_inner_microstep: 781.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3937 [2024-06-10 21:38:59,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1592.19 | bwd_inner_microstep: 1592.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 21:39:02,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1553.09 | bwd_inner_microstep: 1553.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-10 21:39:04,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1478.44 | bwd_inner_microstep: 1478.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-10 21:39:05,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.32 | bwd_microstep: 1151.79 | bwd_inner_microstep: 1151.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862 [2024-06-10 21:39:07,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1460.32 | bwd_inner_microstep: 1460.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 21:39:09,599] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1352.49 | bwd_inner_microstep: 1352.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3489 [2024-06-10 21:39:11,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1430.83 | bwd_inner_microstep: 1430.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 21:39:13,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1290.52 | bwd_inner_microstep: 1290.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 21:39:15,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1389.04 | bwd_inner_microstep: 1389.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3904 [2024-06-10 21:39:17,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.01 | bwd_microstep: 1394.44 | bwd_inner_microstep: 1394.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510 [2024-06-10 21:39:18,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.30 | bwd_microstep: 1225.48 | bwd_inner_microstep: 1225.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-10 21:39:20,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.81 | bwd_microstep: 1153.61 | bwd_inner_microstep: 1153.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token 
length: 1913 [2024-06-10 21:39:21,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.02 | bwd_microstep: 717.60 | bwd_inner_microstep: 717.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2097 [2024-06-10 21:39:22,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.03 | bwd_microstep: 770.74 | bwd_inner_microstep: 770.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 21:39:24,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1378.59 | bwd_inner_microstep: 1378.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3524 [2024-06-10 21:39:26,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.43 | bwd_microstep: 1520.75 | bwd_inner_microstep: 1520.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438 [2024-06-10 21:39:28,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.30 | bwd_microstep: 1285.41 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643 [2024-06-10 21:39:30,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.57 | bwd_microstep: 1709.48 | bwd_inner_microstep: 1709.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917 [2024-06-10 21:39:31,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.08 | bwd_microstep: 779.38 | bwd_inner_microstep: 779.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3680 [2024-06-10 21:39:33,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.01 | bwd_microstep: 1432.07 | bwd_inner_microstep: 1432.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3072 [2024-06-10 21:39:35,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.76 | bwd_microstep: 1177.26 | bwd_inner_microstep: 1177.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692 [2024-06-10 21:39:37,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 1426.39 | bwd_inner_microstep: 1426.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586 [2024-06-10 21:39:39,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.89 | bwd_microstep: 1606.99 | bwd_inner_microstep: 1606.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-10 21:39:41,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.95 | bwd_microstep: 1182.10 | bwd_inner_microstep: 1182.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 617 [2024-06-10 21:39:41,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.21 | bwd_microstep: 261.23 | bwd_inner_microstep: 261.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 21:39:43,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.40 | bwd_microstep: 1500.19 | bwd_inner_microstep: 1500.17 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483 [2024-06-10 21:39:45,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1219.73 | bwd_inner_microstep: 1219.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768 [2024-06-10 21:39:47,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.72 | bwd_microstep: 1449.21 | bwd_inner_microstep: 1449.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-10 21:39:49,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.91 | bwd_microstep: 1609.41 | bwd_inner_microstep: 1609.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570 [2024-06-10 21:39:51,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.60 | bwd_microstep: 1497.43 | bwd_inner_microstep: 1497.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2044 [2024-06-10 21:39:57,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.20 | optimizer_step: 6.63 [2024-06-10 21:39:57,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.85 | bwd_microstep: 5314.25 | bwd_inner_microstep: 1038.19 | bwd_allreduce_microstep: 4276.01 | step_microstep: 37.96 [2024-06-10 21:39:57,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15266.18 | bwd: 45091.50 | bwd_inner: 40814.57 | bwd_allreduce: 4276.24 | step: 39.44 70%|███████ | 1211/1726 [20:57:24<8:53:39, 62.17s/it] 70%|███████ | 1212/1726 [20:58:26<8:50:18, 61.90s/it] 
70%|███████ | 1213/1726 [20:59:29<8:52:47, 62.31s/it] 70%|███████ | 1214/1726 [21:00:32<8:54:27, 62.63s/it] 70%|███████ | 1215/1726 [21:01:33<8:48:11, 62.02s/it] 70%|███████ | 1216/1726 [21:02:34<8:43:45, 61.6{'loss': 1.2051, 'learning_rate': 8.482397753815872e-06, 'epoch': 0.7} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 21:39:59,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.42 | bwd_microstep: 1330.34 | bwd_inner_microstep: 1330.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 21:40:01,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.77 | bwd_microstep: 1473.11 | bwd_inner_microstep: 1473.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 21:40:03,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.81 | bwd_microstep: 1403.30 | bwd_inner_microstep: 1403.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3864 [2024-06-10 21:40:05,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.37 | bwd_microstep: 1660.94 | bwd_inner_microstep: 1660.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-10 21:40:07,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.38 | bwd_microstep: 1276.67 | bwd_inner_microstep: 1276.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-10 
21:40:09,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.71 | bwd_microstep: 1651.13 | bwd_inner_microstep: 1651.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-10 21:40:11,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1250.03 | bwd_inner_microstep: 1250.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-10 21:40:13,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1413.19 | bwd_inner_microstep: 1413.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 21:40:14,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1248.77 | bwd_inner_microstep: 1248.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3481 [2024-06-10 21:40:16,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1429.73 | bwd_inner_microstep: 1429.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408 [2024-06-10 21:40:18,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.43 | bwd_microstep: 1274.90 | bwd_inner_microstep: 1274.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-10 21:40:20,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1346.02 | bwd_inner_microstep: 1346.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3674 [2024-06-10 21:40:22,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.92 | bwd_microstep: 1520.84 | bwd_inner_microstep: 1520.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2446 [2024-06-10 21:40:24,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 420.91 | bwd_microstep: 1132.57 | bwd_inner_microstep: 1132.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-10 21:40:26,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.42 | bwd_microstep: 1508.56 | bwd_inner_microstep: 1508.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-10 21:40:28,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1522.02 | bwd_inner_microstep: 1521.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2402 [2024-06-10 21:40:29,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.70 | bwd_microstep: 965.17 | bwd_inner_microstep: 965.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2115 [2024-06-10 21:40:30,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.84 | bwd_microstep: 735.56 | bwd_inner_microstep: 735.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988 [2024-06-10 21:40:31,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.23 | bwd_microstep: 827.24 | bwd_inner_microstep: 827.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 24, images per sample: 6.0, dynamic token length: 3488 [2024-06-10 21:40:33,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.22 | bwd_microstep: 1314.07 | bwd_inner_microstep: 1314.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3678 [2024-06-10 21:40:35,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.89 | bwd_microstep: 1520.47 | bwd_inner_microstep: 1520.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 21:40:37,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1398.50 | bwd_inner_microstep: 1398.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 21:40:39,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.81 | bwd_microstep: 1452.51 | bwd_inner_microstep: 1452.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 21:40:41,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.17 | bwd_microstep: 1557.76 | bwd_inner_microstep: 1557.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747 [2024-06-10 21:40:43,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.85 | bwd_microstep: 1539.59 | bwd_inner_microstep: 1539.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1262 [2024-06-10 21:40:44,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 175.49 | bwd_microstep: 455.04 | bwd_inner_microstep: 455.01 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 21:40:46,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1254.32 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1944 [2024-06-10 21:40:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.37 | bwd_microstep: 727.27 | bwd_inner_microstep: 727.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3813 [2024-06-10 21:40:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.93 | bwd_microstep: 1699.52 | bwd_inner_microstep: 1699.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515 [2024-06-10 21:40:51,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.95 | bwd_microstep: 1444.78 | bwd_inner_microstep: 1444.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 21:40:53,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.40 | bwd_microstep: 1543.56 | bwd_inner_microstep: 1543.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775 [2024-06-10 21:40:59,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-10 21:40:59,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.78 | bwd_microstep: 5584.40 | bwd_inner_microstep: 2014.54 | bwd_allreduce_microstep: 3569.81 | step_microstep: 38.24 [2024-06-10 21:40:59,985] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd: 15864.09 | bwd: 46461.88 | bwd_inner: 42891.17 | bwd_allreduce: 3570.04 | step: 39.73 {'loss': 1.1925, 'learning_rate': 8.451732758016322e-06, 'epoch': 0.71} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 21:41:01,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1371.50 | bwd_inner_microstep: 1371.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869 [2024-06-10 21:41:02,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.27 | bwd_microstep: 707.52 | bwd_inner_microstep: 707.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459 [2024-06-10 21:41:04,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.76 | bwd_microstep: 1435.34 | bwd_inner_microstep: 1435.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471 [2024-06-10 21:41:06,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.53 | bwd_microstep: 1340.37 | bwd_inner_microstep: 1340.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 21:41:08,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.16 | bwd_microstep: 1242.94 | bwd_inner_microstep: 1242.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 21:41:10,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.97 | bwd_microstep: 1386.51 | bwd_inner_microstep: 1386.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 
[2024-06-10 21:41:12,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1387.06 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956 [2024-06-10 21:41:13,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.86 | bwd_microstep: 701.60 | bwd_inner_microstep: 701.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 21:41:15,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1377.13 | bwd_inner_microstep: 1377.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3724 [2024-06-10 21:41:17,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 1594.94 | bwd_inner_microstep: 1594.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1980 [2024-06-10 21:41:18,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.32 | bwd_microstep: 896.44 | bwd_inner_microstep: 896.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-10 21:41:20,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1493.98 | bwd_inner_microstep: 1493.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503 [2024-06-10 21:41:22,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.71 | bwd_microstep: 1443.40 | bwd_inner_microstep: 1443.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per 
sample: 5.0, dynamic token length: 1975 [2024-06-10 21:41:23,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.55 | bwd_microstep: 858.46 | bwd_inner_microstep: 858.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661 [2024-06-10 21:41:25,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.03 | bwd_microstep: 1566.66 | bwd_inner_microstep: 1566.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 21:41:27,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1392.40 | bwd_inner_microstep: 1392.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-10 21:41:29,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.42 | bwd_microstep: 1286.23 | bwd_inner_microstep: 1286.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3675 [2024-06-10 21:41:31,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.29 | bwd_microstep: 1259.47 | bwd_inner_microstep: 1259.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2679 [2024-06-10 21:41:32,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.16 | bwd_microstep: 1118.52 | bwd_inner_microstep: 1118.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 21:41:34,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.29 | bwd_microstep: 1287.19 | bwd_inner_microstep: 1287.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 21:41:36,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.65 | bwd_microstep: 1352.56 | bwd_inner_microstep: 1352.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2114 [2024-06-10 21:41:37,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.51 | bwd_microstep: 827.18 | bwd_inner_microstep: 827.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 21:41:39,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.61 | bwd_microstep: 1507.93 | bwd_inner_microstep: 1507.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1912 [2024-06-10 21:41:40,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 716.14 | bwd_inner_microstep: 716.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 21:41:42,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1404.41 | bwd_inner_microstep: 1404.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535 [2024-06-10 21:41:44,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.45 | bwd_microstep: 1423.67 | bwd_inner_microstep: 1423.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596 [2024-06-10 21:41:46,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.61 | bwd_microstep: 1507.55 | bwd_inner_microstep: 1507.52 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2032 [2024-06-10 21:41:47,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.50 | bwd_microstep: 714.61 | bwd_inner_microstep: 714.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-10 21:41:49,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.02 | bwd_microstep: 1398.86 | bwd_inner_microstep: 1398.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-10 21:41:51,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.89 | bwd_microstep: 973.09 | bwd_inner_microstep: 973.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3806 [2024-06-10 21:41:53,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.15 | bwd_microstep: 1412.09 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3636 [2024-06-10 21:42:02,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-10 21:42:02,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 8732.98 | bwd_inner_microstep: 1740.27 | bwd_allreduce_microstep: 6992.66 | step_microstep: 37.76 [2024-06-10 21:42:02,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14971.86 | bwd: 47118.75 | bwd_inner: 40125.19 | bwd_allreduce: 6992.89 | step: 39.20 {'loss': 1.1659, 'learning_rate': 8.421108435153964e-06, 'epoch': 0.71} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 
21:42:04,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.73 | bwd_microstep: 1465.12 | bwd_inner_microstep: 1465.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3865 [2024-06-10 21:42:06,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.11 | bwd_microstep: 1361.98 | bwd_inner_microstep: 1361.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-10 21:42:08,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.27 | bwd_microstep: 1552.18 | bwd_inner_microstep: 1552.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 21:42:10,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740 [2024-06-10 21:42:12,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1430.85 | bwd_inner_microstep: 1430.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899 [2024-06-10 21:42:13,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.99 | bwd_microstep: 777.37 | bwd_inner_microstep: 777.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732 [2024-06-10 21:42:15,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.14 | bwd_microstep: 1532.52 | bwd_inner_microstep: 1532.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3491 [2024-06-10 21:42:17,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1386.91 | bwd_inner_microstep: 1386.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509 [2024-06-10 21:42:19,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.05 | bwd_microstep: 1413.17 | bwd_inner_microstep: 1413.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751 [2024-06-10 21:42:21,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.35 | bwd_microstep: 1640.04 | bwd_inner_microstep: 1640.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3546 [2024-06-10 21:42:23,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.18 | bwd_microstep: 1425.67 | bwd_inner_microstep: 1425.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 21:42:25,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1389.90 | bwd_inner_microstep: 1389.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-10 21:42:27,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.19 | bwd_microstep: 1525.17 | bwd_inner_microstep: 1525.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 21:42:29,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.36 | bwd_microstep: 1487.61 | bwd_inner_microstep: 1487.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 21:42:31,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.83 | bwd_microstep: 1450.64 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3665 [2024-06-10 21:42:34,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.88 | bwd_microstep: 1720.26 | bwd_inner_microstep: 1720.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 [2024-06-10 21:42:36,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.83 | bwd_microstep: 1435.94 | bwd_inner_microstep: 1435.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3013 [2024-06-10 21:42:37,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.34 | bwd_microstep: 1275.02 | bwd_inner_microstep: 1274.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982 [2024-06-10 21:42:38,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.27 | bwd_microstep: 827.02 | bwd_inner_microstep: 826.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 21:42:40,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1378.40 | bwd_inner_microstep: 1378.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620 [2024-06-10 21:42:41,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.29 | bwd_microstep: 261.94 | bwd_inner_microstep: 261.92 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2171 [2024-06-10 21:42:42,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.53 | bwd_microstep: 760.66 | bwd_inner_microstep: 760.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 21:42:44,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831 [2024-06-10 21:42:46,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1664.06 | bwd_inner_microstep: 1664.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3599 [2024-06-10 21:42:48,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1339.93 | bwd_inner_microstep: 1339.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3638 [2024-06-10 21:42:50,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.16 | bwd_microstep: 1419.63 | bwd_inner_microstep: 1419.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-10 21:42:51,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.13 | bwd_microstep: 1180.78 | bwd_inner_microstep: 1180.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834 [2024-06-10 21:42:54,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.58 | bwd_microstep: 1758.13 | bwd_inner_microstep: 
1758.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 21:42:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.12 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 21:42:58,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.32 | bwd_microstep: 1551.08 | bwd_inner_microstep: 1551.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 21:43:00,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.84 | bwd_microstep: 1547.24 | bwd_inner_microstep: 1547.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3806 [2024-06-10 21:43:03,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.01 | optimizer_step: 6.60 [2024-06-10 21:43:03,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1925.72 | bwd_inner_microstep: 1571.11 | bwd_allreduce_microstep: 354.55 | step_microstep: 37.40 [2024-06-10 21:43:03,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16309.34 | bwd: 44138.97 | bwd_inner: 43783.52 | bwd_allreduce: 354.78 | step: 38.84 {'loss': 1.1002, 'learning_rate': 8.390524893087505e-06, 'epoch': 0.71} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3470 [2024-06-10 21:43:05,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.29 | bwd_microstep: 1565.40 | bwd_inner_microstep: 1565.31 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 
[2024-06-10 21:43:07,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1352.67 | bwd_inner_microstep: 1352.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 21:43:09,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.53 | bwd_microstep: 1373.85 | bwd_inner_microstep: 1373.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 21:43:10,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1280.68 | bwd_inner_microstep: 1280.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 21:43:12,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.18 | bwd_microstep: 1390.81 | bwd_inner_microstep: 1390.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 21:43:14,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.98 | bwd_microstep: 1354.87 | bwd_inner_microstep: 1354.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3614 [2024-06-10 21:43:16,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1373.12 | bwd_inner_microstep: 1373.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2919 [2024-06-10 21:43:18,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.45 | bwd_microstep: 1091.02 | bwd_inner_microstep: 1091.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1939 [2024-06-10 21:43:19,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.08 | bwd_microstep: 793.15 | bwd_inner_microstep: 793.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3408 [2024-06-10 21:43:20,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.04 | bwd_microstep: 1181.44 | bwd_inner_microstep: 1181.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 21:43:22,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1380.32 | bwd_inner_microstep: 1380.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3499 [2024-06-10 21:43:24,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1316.05 | bwd_inner_microstep: 1316.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 21:43:26,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1344.20 | bwd_inner_microstep: 1344.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2608 [2024-06-10 21:43:27,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.34 | bwd_microstep: 1044.34 | bwd_inner_microstep: 1044.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3415 [2024-06-10 21:43:29,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.22 | bwd_microstep: 1539.05 | bwd_inner_microstep: 1539.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3392 [2024-06-10 21:43:31,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.28 | bwd_microstep: 1337.30 | bwd_inner_microstep: 1337.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915 [2024-06-10 21:43:34,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.13 | bwd_microstep: 1592.83 | bwd_inner_microstep: 1592.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2089 [2024-06-10 21:43:35,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.77 | bwd_microstep: 820.49 | bwd_inner_microstep: 820.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069 [2024-06-10 21:43:36,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.92 | bwd_microstep: 817.41 | bwd_inner_microstep: 817.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657 [2024-06-10 21:43:38,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.13 | bwd_microstep: 1418.86 | bwd_inner_microstep: 1418.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074 [2024-06-10 21:43:39,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.64 | bwd_microstep: 823.98 | bwd_inner_microstep: 823.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637 [2024-06-10 21:43:41,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.21 | bwd_microstep: 1602.42 | bwd_inner_microstep: 1602.39 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167 [2024-06-10 21:43:42,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.60 | bwd_microstep: 852.83 | bwd_inner_microstep: 852.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928 [2024-06-10 21:43:43,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.70 | bwd_microstep: 697.09 | bwd_inner_microstep: 697.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3773 [2024-06-10 21:43:45,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.41 | bwd_microstep: 1250.33 | bwd_inner_microstep: 1250.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548 [2024-06-10 21:43:47,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1295.08 | bwd_inner_microstep: 1295.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785 [2024-06-10 21:43:49,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.25 | bwd_microstep: 1553.66 | bwd_inner_microstep: 1553.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3829 [2024-06-10 21:43:51,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.83 | bwd_microstep: 1389.32 | bwd_inner_microstep: 1389.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 21:43:53,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.82 | bwd_microstep: 1348.97 
| bwd_inner_microstep: 1348.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2037 [2024-06-10 21:43:54,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.82 | bwd_microstep: 810.73 | bwd_inner_microstep: 810.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 21:43:56,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1349.67 | bwd_inner_microstep: 1349.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3601 [2024-06-10 21:44:04,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.27 | optimizer_step: 6.63 [2024-06-10 21:44:04,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.96 | bwd_microstep: 7727.74 | bwd_inner_microstep: 1769.39 | bwd_allreduce_microstep: 5958.30 | step_microstep: 38.54 [2024-06-10 21:44:04,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14976.38 | bwd: 46069.69 | bwd_inner: 40110.41 | bwd_allreduce: 5958.58 | step: 40.08 {'loss': 1.1271, 'learning_rate': 8.359982239532016e-06, 'epoch': 0.71} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 21:44:06,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.03 | bwd_microstep: 1242.46 | bwd_inner_microstep: 1242.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3966 [2024-06-10 21:44:08,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1692.91 | bwd_inner_microstep: 1692.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, 
dynamic token length: 3843 [2024-06-10 21:44:10,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.34 | bwd_microstep: 1586.12 | bwd_inner_microstep: 1586.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2267 [2024-06-10 21:44:12,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.09 | bwd_microstep: 968.75 | bwd_inner_microstep: 968.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 21:44:13,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.63 | bwd_microstep: 1280.83 | bwd_inner_microstep: 1280.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3485 [2024-06-10 21:44:15,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.49 | bwd_microstep: 1348.13 | bwd_inner_microstep: 1348.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-10 21:44:17,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1247.13 | bwd_inner_microstep: 1247.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-10 21:44:19,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1251.42 | bwd_inner_microstep: 1251.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3691 [2024-06-10 21:44:21,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.12 | bwd_microstep: 1587.39 | bwd_inner_microstep: 1587.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 21:44:23,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.25 | bwd_microstep: 1352.12 | bwd_inner_microstep: 1352.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3695 [2024-06-10 21:44:25,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1659.36 | bwd_inner_microstep: 1659.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 21:44:27,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.08 | bwd_microstep: 1379.13 | bwd_inner_microstep: 1379.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3445 [2024-06-10 21:44:29,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1301.03 | bwd_inner_microstep: 1301.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603 [2024-06-10 21:44:31,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.98 | bwd_microstep: 1467.38 | bwd_inner_microstep: 1467.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 21:44:33,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.89 | bwd_microstep: 1371.14 | bwd_inner_microstep: 1371.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 21:44:34,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.08 | bwd_microstep: 1286.72 | bwd_inner_microstep: 1286.69 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2481 [2024-06-10 21:44:36,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.20 | bwd_microstep: 1002.63 | bwd_inner_microstep: 1002.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628 [2024-06-10 21:44:38,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1441.95 | bwd_inner_microstep: 1441.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3651 [2024-06-10 21:44:40,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.87 | bwd_microstep: 1465.69 | bwd_inner_microstep: 1465.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610 [2024-06-10 21:44:42,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.70 | bwd_microstep: 1611.36 | bwd_inner_microstep: 1611.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589 [2024-06-10 21:44:44,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.69 | bwd_microstep: 1306.19 | bwd_inner_microstep: 1306.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 21:44:46,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1555.25 | bwd_inner_microstep: 1555.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-10 21:44:48,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.18 | bwd_microstep: 1404.98 | 
bwd_inner_microstep: 1404.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 21:44:50,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 1353.61 | bwd_inner_microstep: 1353.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 21:44:52,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1609.11 | bwd_inner_microstep: 1609.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-10 21:44:54,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.44 | bwd_microstep: 1253.89 | bwd_inner_microstep: 1253.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717 [2024-06-10 21:44:56,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.95 | bwd_microstep: 1535.64 | bwd_inner_microstep: 1535.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3435 [2024-06-10 21:44:58,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.43 | bwd_microstep: 1406.14 | bwd_inner_microstep: 1406.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2045 [2024-06-10 21:44:59,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.74 | bwd_microstep: 906.40 | bwd_inner_microstep: 906.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 21:45:01,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 556.86 | bwd_microstep: 1500.44 | bwd_inner_microstep: 1500.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 21:45:03,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.91 | bwd_microstep: 1508.20 | bwd_inner_microstep: 1508.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 21:45:06,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.13 | optimizer_step: 6.60 [2024-06-10 21:45:06,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 2171.30 | bwd_inner_microstep: 1508.55 | bwd_allreduce_microstep: 662.70 | step_microstep: 37.63 [2024-06-10 21:45:06,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16507.99 | bwd: 45054.85 | bwd_inner: 44391.25 | bwd_allreduce: 662.93 | step: 39.11 70%|███████ | 1216/1726 [21:02:34<8:43:45, 61.62s/it] 71%|███████ | 1217/1726 [21:03:36<8:45:22, 61.93s/it] 71%|███████ | 1218/1726 [21:04:39<8:45:34, 62.08s/it] 71%|███████ | 1219/1726 [21:05:39<8:41:15, 61.69s/it] 71%|███████ | 1220/1726 [21:06:41<8:39:26, 61.59s/it] 71%|███████ | 1221/1726 [21:07:43<8:39:10, 61.69s/it] {'loss': 1.2058, 'learning_rate': 8.329480582058574e-06, 'epoch': 0.71} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510 [2024-06-10 21:45:08,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.05 | bwd_microstep: 1218.80 | bwd_inner_microstep: 1218.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per
sample: 6.5, dynamic token length: 3477 [2024-06-10 21:45:10,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.83 | bwd_microstep: 1344.26 | bwd_inner_microstep: 1344.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-10 21:45:11,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1397.21 | bwd_inner_microstep: 1397.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782 [2024-06-10 21:45:14,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.10 | bwd_microstep: 1645.41 | bwd_inner_microstep: 1645.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 21:45:15,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1245.82 | bwd_inner_microstep: 1245.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 21:45:17,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1279.58 | bwd_inner_microstep: 1279.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 21:45:19,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1246.69 | bwd_inner_microstep: 1246.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 21:45:21,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1246.36 | bwd_inner_microstep: 1246.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1018 [2024-06-10 21:45:21,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 163.64 | bwd_microstep: 427.81 | bwd_inner_microstep: 427.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 21:45:22,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 790.60 | bwd_inner_microstep: 790.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 21:45:24,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.22 | bwd_microstep: 1284.75 | bwd_inner_microstep: 1284.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420 [2024-06-10 21:45:26,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.45 | bwd_microstep: 1279.83 | bwd_inner_microstep: 1279.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 21:45:28,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.11 | bwd_microstep: 1480.24 | bwd_inner_microstep: 1480.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-10 21:45:30,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1492.03 | bwd_inner_microstep: 1492.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3628 [2024-06-10 21:45:32,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.33 | bwd_microstep: 1644.80 | bwd_inner_microstep: 1644.77 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 21:45:34,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.17 | bwd_microstep: 1377.66 | bwd_inner_microstep: 1377.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 21:45:36,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.92 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 21:45:38,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68 | bwd_microstep: 1413.58 | bwd_inner_microstep: 1413.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-10 21:45:40,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1394.90 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3539 [2024-06-10 21:45:42,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1325.72 | bwd_inner_microstep: 1325.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3442 [2024-06-10 21:45:43,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.88 | bwd_microstep: 1181.65 | bwd_inner_microstep: 1181.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2206 [2024-06-10 21:45:44,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.20 | bwd_microstep: 
767.09 | bwd_inner_microstep: 767.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1947 [2024-06-10 21:45:45,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.43 | bwd_microstep: 761.69 | bwd_inner_microstep: 761.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 21:45:47,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1413.62 | bwd_inner_microstep: 1413.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3423 [2024-06-10 21:45:49,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1300.29 | bwd_inner_microstep: 1300.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2185 [2024-06-10 21:45:50,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.89 | bwd_microstep: 889.88 | bwd_inner_microstep: 889.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3564 [2024-06-10 21:45:52,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.54 | bwd_microstep: 1360.72 | bwd_inner_microstep: 1360.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3387 [2024-06-10 21:45:54,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.19 | bwd_microstep: 1438.13 | bwd_inner_microstep: 1438.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2277 [2024-06-10 21:45:55,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 317.60 | bwd_microstep: 828.19 | bwd_inner_microstep: 828.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3540 [2024-06-10 21:45:57,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.44 | bwd_microstep: 1448.84 | bwd_inner_microstep: 1448.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3780 [2024-06-10 21:45:59,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.08 | bwd_microstep: 1444.70 | bwd_inner_microstep: 1444.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2055 [2024-06-10 21:46:08,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.29 | optimizer_step: 6.62 [2024-06-10 21:46:08,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.45 | bwd_microstep: 8414.19 | bwd_inner_microstep: 974.90 | bwd_allreduce_microstep: 7439.23 | step_microstep: 38.34 [2024-06-10 21:46:08,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14869.54 | bwd: 47071.54 | bwd_inner: 39631.40 | bwd_allreduce: 7439.46 | step: 39.78 {'loss': 1.2015, 'learning_rate': 8.299020028093844e-06, 'epoch': 0.71} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461 [2024-06-10 21:46:10,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1399.84 | bwd_inner_microstep: 1399.71 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3981 [2024-06-10 21:46:12,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.41 | bwd_microstep: 1703.05 | bwd_inner_microstep: 1703.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 4223 [2024-06-10 21:46:15,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.02 | bwd_microstep: 1654.14 | bwd_inner_microstep: 1654.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3811 [2024-06-10 21:46:17,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.91 | bwd_microstep: 1498.29 | bwd_inner_microstep: 1498.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 21:46:19,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1382.58 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740 [2024-06-10 21:46:21,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.15 | bwd_microstep: 1331.06 | bwd_inner_microstep: 1331.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-10 21:46:22,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486 [2024-06-10 21:46:23,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.89 | bwd_microstep: 1186.48 | bwd_inner_microstep: 1186.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 21:46:25,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1345.08 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-10 21:46:27,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1250.32 | bwd_inner_microstep: 1250.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 21:46:29,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1245.63 | bwd_inner_microstep: 1245.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 21:46:31,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.05 | bwd_microstep: 1336.96 | bwd_inner_microstep: 1336.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-10 21:46:32,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.94 | bwd_microstep: 1441.56 | bwd_inner_microstep: 1441.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659 [2024-06-10 21:46:34,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.97 | bwd_microstep: 1438.88 | bwd_inner_microstep: 1438.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3420 [2024-06-10 21:46:37,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.66 | bwd_microstep: 1536.56 | bwd_inner_microstep: 1536.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-10 21:46:39,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.76 | bwd_microstep: 1501.62 | 
bwd_inner_microstep: 1501.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 21:46:41,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.01 | bwd_microstep: 1487.07 | bwd_inner_microstep: 1487.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2111 [2024-06-10 21:46:42,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.50 | bwd_microstep: 1016.14 | bwd_inner_microstep: 1016.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 21:46:44,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1287.40 | bwd_inner_microstep: 1287.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297 [2024-06-10 21:46:45,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.90 | bwd_microstep: 976.06 | bwd_inner_microstep: 976.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2170 [2024-06-10 21:46:46,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.24 | bwd_microstep: 885.95 | bwd_inner_microstep: 885.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289 [2024-06-10 21:46:48,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.47 | bwd_microstep: 972.49 | bwd_inner_microstep: 972.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1942 [2024-06-10 21:46:49,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
290.86 | bwd_microstep: 760.65 | bwd_inner_microstep: 760.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2273 [2024-06-10 21:46:50,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.10 | bwd_microstep: 1067.49 | bwd_inner_microstep: 1067.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 21:46:52,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.62 | bwd_microstep: 1452.92 | bwd_inner_microstep: 1452.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2282 [2024-06-10 21:46:54,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.21 | bwd_microstep: 1006.31 | bwd_inner_microstep: 1006.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3800 [2024-06-10 21:46:56,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.42 | bwd_microstep: 1651.58 | bwd_inner_microstep: 1651.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-10 21:46:57,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.51 | bwd_microstep: 697.87 | bwd_inner_microstep: 697.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 21:46:59,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1652.27 | bwd_inner_microstep: 1652.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004 [2024-06-10 21:47:00,762] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 279.11 | bwd_microstep: 709.73 | bwd_inner_microstep: 709.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-10 21:47:02,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.27 | bwd_microstep: 1494.45 | bwd_inner_microstep: 1494.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2238 [2024-06-10 21:47:10,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.12 | optimizer_step: 6.59 [2024-06-10 21:47:10,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.27 | bwd_microstep: 7615.54 | bwd_inner_microstep: 982.37 | bwd_allreduce_microstep: 6633.10 | step_microstep: 38.03 [2024-06-10 21:47:10,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14990.09 | bwd: 46776.48 | bwd_inner: 40142.34 | bwd_allreduce: 6633.40 | step: 39.66 {'loss': 1.2402, 'learning_rate': 8.268600684919765e-06, 'epoch': 0.71} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 21:47:12,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1462.20 | bwd_inner_microstep: 1462.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3971 [2024-06-10 21:47:14,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.44 | bwd_microstep: 1496.86 | bwd_inner_microstep: 1496.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 21:47:16,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.28 | bwd_microstep: 1444.68 | bwd_inner_microstep: 1444.66 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 21:47:18,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.05 | bwd_microstep: 1287.55 | bwd_inner_microstep: 1287.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 21:47:20,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.42 | bwd_microstep: 1283.16 | bwd_inner_microstep: 1283.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 21:47:22,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1394.02 | bwd_inner_microstep: 1393.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 21:47:24,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.51 | bwd_microstep: 1379.35 | bwd_inner_microstep: 1379.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 21:47:26,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.78 | bwd_microstep: 1281.09 | bwd_inner_microstep: 1281.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629 [2024-06-10 21:47:28,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1535.08 | bwd_inner_microstep: 1535.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3038 [2024-06-10 21:47:29,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.29 | bwd_microstep: 1203.61 | bwd_inner_microstep: 
1203.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.86 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553 [2024-06-10 21:47:31,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.43 | bwd_microstep: 1325.66 | bwd_inner_microstep: 1325.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-10 21:47:33,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 21:47:35,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1241.74 | bwd_inner_microstep: 1241.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 21:47:37,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.54 | bwd_microstep: 1479.16 | bwd_inner_microstep: 1479.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-10 21:47:39,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1378.54 | bwd_inner_microstep: 1378.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-10 21:47:41,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.80 | bwd_microstep: 1288.86 | bwd_inner_microstep: 1288.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3647 [2024-06-10 21:47:42,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.81 | 
bwd_microstep: 1315.21 | bwd_inner_microstep: 1315.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 899 [2024-06-10 21:47:43,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.07 | bwd_microstep: 371.48 | bwd_inner_microstep: 371.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 21:47:45,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.16 | bwd_microstep: 1393.94 | bwd_inner_microstep: 1393.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 21:47:47,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.98 | bwd_microstep: 1653.80 | bwd_inner_microstep: 1653.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 21:47:49,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.33 | bwd_microstep: 1399.34 | bwd_inner_microstep: 1399.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716 [2024-06-10 21:47:51,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.90 | bwd_microstep: 1634.62 | bwd_inner_microstep: 1634.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1974 [2024-06-10 21:47:52,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.33 | bwd_microstep: 704.84 | bwd_inner_microstep: 704.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055 [2024-06-10 21:47:53,843] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 306.82 | bwd_microstep: 813.38 | bwd_inner_microstep: 813.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3817 [2024-06-10 21:47:55,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.31 | bwd_microstep: 1474.91 | bwd_inner_microstep: 1474.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481 [2024-06-10 21:47:57,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.52 | bwd_microstep: 1217.51 | bwd_inner_microstep: 1217.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 21:47:59,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1380.91 | bwd_inner_microstep: 1380.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563 [2024-06-10 21:48:01,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.52 | bwd_microstep: 1298.37 | bwd_inner_microstep: 1298.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471 [2024-06-10 21:48:03,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.70 | bwd_microstep: 1341.92 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3600 [2024-06-10 21:48:05,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.10 | bwd_microstep: 1665.89 | bwd_inner_microstep: 1665.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3818 [2024-06-10 21:48:07,634] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.65 | bwd_microstep: 1599.40 | bwd_inner_microstep: 1599.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054 [2024-06-10 21:48:09,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.28 | optimizer_step: 6.59 [2024-06-10 21:48:09,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.00 | bwd_microstep: 1964.44 | bwd_inner_microstep: 1041.43 | bwd_allreduce_microstep: 922.96 | step_microstep: 39.16 [2024-06-10 21:48:09,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15783.97 | bwd: 43058.72 | bwd_inner: 42134.85 | bwd_allreduce: 923.19 | step: 42.45 {'loss': 1.2039, 'learning_rate': 8.238222659673071e-06, 'epoch': 0.71} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3464 [2024-06-10 21:48:12,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.73 | bwd_microstep: 1570.26 | bwd_inner_microstep: 1570.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3928 [2024-06-10 21:48:14,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.23 | bwd_microstep: 1493.37 | bwd_inner_microstep: 1493.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2411 [2024-06-10 21:48:15,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.65 | bwd_microstep: 1001.68 | bwd_inner_microstep: 1001.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4267 [2024-06-10 21:48:18,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.40 | bwd_microstep: 1765.40 | bwd_inner_microstep: 1765.37 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-10 21:48:20,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.06 | bwd_microstep: 1446.90 | bwd_inner_microstep: 1446.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 21:48:22,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.33 | bwd_microstep: 1540.78 | bwd_inner_microstep: 1540.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741 [2024-06-10 21:48:24,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.52 | bwd_microstep: 1533.84 | bwd_inner_microstep: 1533.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-10 21:48:26,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.78 | bwd_microstep: 1283.90 | bwd_inner_microstep: 1283.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 21:48:27,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.65 | bwd_microstep: 1384.83 | bwd_inner_microstep: 1384.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 21:48:29,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.54 | bwd_microstep: 1384.64 | bwd_inner_microstep: 1384.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-10 21:48:31,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 
1427.28 | bwd_inner_microstep: 1427.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-10 21:48:33,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1397.55 | bwd_inner_microstep: 1397.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452 [2024-06-10 21:48:35,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.27 | bwd_microstep: 1318.42 | bwd_inner_microstep: 1318.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542 [2024-06-10 21:48:37,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.73 | bwd_microstep: 1454.97 | bwd_inner_microstep: 1454.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-10 21:48:39,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.19 | bwd_microstep: 1311.59 | bwd_inner_microstep: 1311.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3660 [2024-06-10 21:48:41,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.83 | bwd_microstep: 1716.98 | bwd_inner_microstep: 1716.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 21:48:43,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1285.31 | bwd_inner_microstep: 1285.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 21:48:45,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 548.83 | bwd_microstep: 1482.59 | bwd_inner_microstep: 1482.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089 [2024-06-10 21:48:46,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.46 | bwd_microstep: 854.30 | bwd_inner_microstep: 854.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3962 [2024-06-10 21:48:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.88 | bwd_microstep: 1803.85 | bwd_inner_microstep: 1803.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647 [2024-06-10 21:48:51,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.59 | bwd_microstep: 1615.23 | bwd_inner_microstep: 1615.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466 [2024-06-10 21:48:53,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.11 | bwd_microstep: 1181.40 | bwd_inner_microstep: 1181.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678 [2024-06-10 21:48:55,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.96 | bwd_microstep: 1429.17 | bwd_inner_microstep: 1429.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-10 21:48:56,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1288.56 | bwd_inner_microstep: 1288.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3585 [2024-06-10 21:48:58,840] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.88 | bwd_microstep: 1428.06 | bwd_inner_microstep: 1428.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3546 [2024-06-10 21:49:00,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.92 | bwd_microstep: 1567.45 | bwd_inner_microstep: 1567.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469 [2024-06-10 21:49:02,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.11 | bwd_microstep: 1309.42 | bwd_inner_microstep: 1309.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2073 [2024-06-10 21:49:04,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.20 | bwd_microstep: 916.33 | bwd_inner_microstep: 916.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 21:49:06,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.70 | bwd_microstep: 1399.50 | bwd_inner_microstep: 1399.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3449 [2024-06-10 21:49:08,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.36 | bwd_microstep: 1514.04 | bwd_inner_microstep: 1514.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 21:49:10,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.63 | bwd_microstep: 1505.22 | bwd_inner_microstep: 1505.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3472 [2024-06-10 21:49:12,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.04 | optimizer_step: 6.62 [2024-06-10 21:49:12,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.93 | bwd_microstep: 1450.26 | bwd_inner_microstep: 1442.61 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.35 [2024-06-10 21:49:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16814.27 | bwd: 45063.10 | bwd_inner: 45054.60 | bwd_allreduce: 7.83 | step: 38.83 {'loss': 1.1784, 'learning_rate': 8.207886059345034e-06, 'epoch': 0.71} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422 [2024-06-10 21:49:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1377.29 | bwd_inner_microstep: 1377.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2448 [2024-06-10 21:49:15,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.48 | bwd_microstep: 948.13 | bwd_inner_microstep: 948.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 21:49:17,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.54 | bwd_microstep: 1343.50 | bwd_inner_microstep: 1343.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843 [2024-06-10 21:49:19,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.35 | bwd_microstep: 1663.75 | bwd_inner_microstep: 1663.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778 [2024-06-10 21:49:21,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1444.40 | 
bwd_inner_microstep: 1444.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 21:49:23,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.23 | bwd_microstep: 1383.64 | bwd_inner_microstep: 1383.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3771 [2024-06-10 21:49:25,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1488.19 | bwd_inner_microstep: 1488.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 21:49:27,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.05 | bwd_microstep: 1285.87 | bwd_inner_microstep: 1285.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473 [2024-06-10 21:49:28,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.54 | bwd_microstep: 1183.99 | bwd_inner_microstep: 1183.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 21:49:30,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.00 | bwd_microstep: 1390.38 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1916 [2024-06-10 21:49:31,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.71 | bwd_microstep: 718.02 | bwd_inner_microstep: 718.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010 [2024-06-10 21:49:32,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 304.19 | bwd_microstep: 806.24 | bwd_inner_microstep: 806.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3511 [2024-06-10 21:49:34,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.50 | bwd_microstep: 1445.14 | bwd_inner_microstep: 1445.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1928 [2024-06-10 21:49:36,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.52 | bwd_microstep: 851.48 | bwd_inner_microstep: 851.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3641 [2024-06-10 21:49:38,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.99 | bwd_microstep: 1551.44 | bwd_inner_microstep: 1551.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 21:49:40,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.03 | bwd_microstep: 1340.02 | bwd_inner_microstep: 1340.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-10 21:49:42,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.21 | bwd_microstep: 1524.70 | bwd_inner_microstep: 1524.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897 [2024-06-10 21:49:43,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.61 | bwd_microstep: 777.92 | bwd_inner_microstep: 777.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3850 [2024-06-10 21:49:45,612] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.33 | bwd_microstep: 1664.61 | bwd_inner_microstep: 1664.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3869 [2024-06-10 21:49:47,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1471.90 | bwd_inner_microstep: 1471.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-10 21:49:49,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1491.81 | bwd_inner_microstep: 1491.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 21:49:51,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.69 | bwd_microstep: 1344.27 | bwd_inner_microstep: 1344.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-10 21:49:53,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1576.32 | bwd_inner_microstep: 1576.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 21:49:55,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.73 | bwd_microstep: 1348.55 | bwd_inner_microstep: 1348.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-10 21:49:57,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1247.99 | bwd_inner_microstep: 1247.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 
2017 [2024-06-10 21:49:58,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.24 | bwd_microstep: 713.96 | bwd_inner_microstep: 713.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2012 [2024-06-10 21:49:59,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.00 | bwd_microstep: 803.07 | bwd_inner_microstep: 803.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 21:50:01,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.29 | bwd_microstep: 1287.93 | bwd_inner_microstep: 1287.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3599 [2024-06-10 21:50:03,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.93 | bwd_microstep: 1372.06 | bwd_inner_microstep: 1372.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563 [2024-06-10 21:50:04,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1297.46 | bwd_inner_microstep: 1297.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584 [2024-06-10 21:50:07,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.73 | bwd_microstep: 1528.43 | bwd_inner_microstep: 1528.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3771 [2024-06-10 21:50:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.34 | optimizer_step: 6.60 [2024-06-10 21:50:12,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
583.28 | bwd_microstep: 4958.90 | bwd_inner_microstep: 1781.41 | bwd_allreduce_microstep: 3177.43 | step_microstep: 38.56 [2024-06-10 21:50:12,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15438.52 | bwd: 44631.40 | bwd_inner: 41453.04 | bwd_allreduce: 3177.67 | step: 40.07 71%|███████ | 1221/1726 [21:07:43<8:39:10, 61.69s/it] 71%|███████ | 1222/1726 [21:08:45<8:39:36, 61.86s/it] 71%|███████ | 1223/1726 [21:09:47<8:39:11, 61.93s/it] 71%|███████ | 1224/1726 [21:10:46<8:31:13, 61.10s/it] 71%|███████ | 1225/1726 [21:11:48<8:33:00, 61.44s/it] 71%|███████ | 1226/1726 [21:12:49<8:29:22, 61.13s/it] {'loss': 1.1285, 'learning_rate': 8.177590990780988e-06, 'epoch': 0.71} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 21:50:14,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.94 | bwd_microstep: 1248.43 | bwd_inner_microstep: 1248.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2437 [2024-06-10 21:50:15,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.92 | bwd_microstep: 1011.65 | bwd_inner_microstep: 1011.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2332 [2024-06-10 21:50:17,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.74 | bwd_microstep: 984.16 | bwd_inner_microstep: 984.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 21:50:18,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.84 | bwd_microstep: 1242.82 | 
bwd_inner_microstep: 1242.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-10 21:50:20,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.86 | bwd_microstep: 1279.93 | bwd_inner_microstep: 1279.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 21:50:22,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.86 | bwd_microstep: 1384.65 | bwd_inner_microstep: 1384.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1987 [2024-06-10 21:50:23,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.88 | bwd_microstep: 737.48 | bwd_inner_microstep: 737.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-10 21:50:25,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1532.45 | bwd_inner_microstep: 1532.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2098 [2024-06-10 21:50:26,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.53 | bwd_microstep: 852.87 | bwd_inner_microstep: 852.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3674 [2024-06-10 21:50:29,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.14 | bwd_microstep: 1582.72 | bwd_inner_microstep: 1582.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2957 [2024-06-10 21:50:30,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
446.57 | bwd_microstep: 1192.23 | bwd_inner_microstep: 1192.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3448 [2024-06-10 21:50:32,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.22 | bwd_microstep: 1548.73 | bwd_inner_microstep: 1548.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1950 [2024-06-10 21:50:34,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.82 | bwd_microstep: 885.07 | bwd_inner_microstep: 885.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 21:50:35,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1348.90 | bwd_inner_microstep: 1348.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2731 [2024-06-10 21:50:37,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.09 | bwd_microstep: 1232.31 | bwd_inner_microstep: 1232.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478 [2024-06-10 21:50:39,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1184.82 | bwd_inner_microstep: 1184.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448 [2024-06-10 21:50:40,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.91 | bwd_microstep: 1156.21 | bwd_inner_microstep: 1156.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521 [2024-06-10 21:50:42,831] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | bwd_microstep: 1454.70 | bwd_inner_microstep: 1454.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-10 21:50:44,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1508.42 | bwd_inner_microstep: 1508.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3658 [2024-06-10 21:50:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.99 | bwd_microstep: 1450.17 | bwd_inner_microstep: 1450.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3449 [2024-06-10 21:50:48,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1287.87 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991 [2024-06-10 21:50:49,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.72 | bwd_microstep: 802.20 | bwd_inner_microstep: 802.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 21:50:51,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.84 | bwd_microstep: 1285.40 | bwd_inner_microstep: 1285.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 21:50:53,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.89 | bwd_microstep: 1500.71 | bwd_inner_microstep: 1500.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3459 [2024-06-10 21:50:55,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.38 | bwd_microstep: 1308.77 | bwd_inner_microstep: 1308.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3713 [2024-06-10 21:50:57,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1478.81 | bwd_inner_microstep: 1478.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 21:50:59,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1543.00 | bwd_inner_microstep: 1542.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774 [2024-06-10 21:51:01,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1634.94 | bwd_inner_microstep: 1634.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-10 21:51:03,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.95 | bwd_microstep: 1450.58 | bwd_inner_microstep: 1450.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3425 [2024-06-10 21:51:05,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1308.11 | bwd_inner_microstep: 1308.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-10 21:51:07,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.48 | bwd_microstep: 1589.18 | bwd_inner_microstep: 1589.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3585 [2024-06-10 21:51:15,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.22 | optimizer_step: 6.58 [2024-06-10 21:51:15,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 6936.90 | bwd_inner_microstep: 1699.79 | bwd_allreduce_microstep: 5237.06 | step_microstep: 37.96 [2024-06-10 21:51:15,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15547.55 | bwd: 46945.22 | bwd_inner: 41707.23 | bwd_allreduce: 5237.29 | step: 39.50 {'loss': 1.1495, 'learning_rate': 8.147337560680022e-06, 'epoch': 0.71} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992 [2024-06-10 21:51:16,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.93 | bwd_microstep: 890.09 | bwd_inner_microstep: 889.95 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-10 21:51:18,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1475.34 | bwd_inner_microstep: 1475.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3844 [2024-06-10 21:51:20,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1456.80 | bwd_inner_microstep: 1456.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3431 [2024-06-10 21:51:22,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.68 | bwd_microstep: 1157.11 | bwd_inner_microstep: 1157.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228 [2024-06-10 21:51:23,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 356.45 | bwd_microstep: 957.52 | bwd_inner_microstep: 957.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 21:51:25,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1388.86 | bwd_inner_microstep: 1388.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884 [2024-06-10 21:51:26,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.98 | bwd_microstep: 679.32 | bwd_inner_microstep: 679.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638 [2024-06-10 21:51:28,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.64 | bwd_microstep: 1313.41 | bwd_inner_microstep: 1313.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 21:51:30,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1388.40 | bwd_inner_microstep: 1388.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-10 21:51:31,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.44 | bwd_microstep: 1180.25 | bwd_inner_microstep: 1180.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2140 [2024-06-10 21:51:33,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.77 | bwd_microstep: 891.53 | bwd_inner_microstep: 891.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487 [2024-06-10 21:51:35,291] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.71 | bwd_microstep: 1576.72 | bwd_inner_microstep: 1576.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 21:51:37,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1391.90 | bwd_inner_microstep: 1391.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463 [2024-06-10 21:51:39,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1404.65 | bwd_inner_microstep: 1404.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 21:51:41,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.80 | bwd_microstep: 1382.51 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3559 [2024-06-10 21:51:43,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.63 | bwd_microstep: 1691.99 | bwd_inner_microstep: 1691.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3823 [2024-06-10 21:51:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.60 | bwd_microstep: 1749.21 | bwd_inner_microstep: 1749.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-10 21:51:46,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 799.92 | bwd_inner_microstep: 799.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2294 [2024-06-10 21:51:48,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.89 | bwd_microstep: 975.12 | bwd_inner_microstep: 975.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641 [2024-06-10 21:51:50,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.07 | bwd_microstep: 1408.66 | bwd_inner_microstep: 1408.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 21:51:52,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1494.20 | bwd_inner_microstep: 1494.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-10 21:51:54,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.23 | bwd_microstep: 1343.13 | bwd_inner_microstep: 1343.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3608 [2024-06-10 21:51:56,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.38 | bwd_microstep: 1641.01 | bwd_inner_microstep: 1640.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-10 21:51:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 1454.38 | bwd_inner_microstep: 1454.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 21:52:00,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1279.88 | bwd_inner_microstep: 1279.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3557 [2024-06-10 21:52:02,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1495.63 | bwd_inner_microstep: 1495.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540 [2024-06-10 21:52:03,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.80 | bwd_microstep: 1200.94 | bwd_inner_microstep: 1200.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3565 [2024-06-10 21:52:06,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1558.25 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025 [2024-06-10 21:52:07,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.70 | bwd_microstep: 802.02 | bwd_inner_microstep: 801.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 21:52:09,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.45 | bwd_microstep: 1477.13 | bwd_inner_microstep: 1477.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3765 [2024-06-10 21:52:11,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.68 | bwd_microstep: 1843.68 | bwd_inner_microstep: 1843.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2139 [2024-06-10 21:52:18,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-10 21:52:18,188] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.10 | bwd_microstep: 6108.03 | bwd_inner_microstep: 982.33 | bwd_allreduce_microstep: 5125.65 | step_microstep: 37.81 [2024-06-10 21:52:18,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15578.73 | bwd: 46857.63 | bwd_inner: 41730.98 | bwd_allreduce: 5125.93 | step: 39.28 {'loss': 1.18, 'learning_rate': 8.11712587559455e-06, 'epoch': 0.71} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3483 [2024-06-10 21:52:20,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.45 | bwd_microstep: 1569.70 | bwd_inner_microstep: 1569.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3947 [2024-06-10 21:52:22,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.11 | bwd_microstep: 1489.02 | bwd_inner_microstep: 1488.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4471 [2024-06-10 21:52:24,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.36 | bwd_microstep: 1824.91 | bwd_inner_microstep: 1824.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-10 21:52:26,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1492.70 | bwd_inner_microstep: 1492.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 21:52:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 1278.67 | bwd_inner_microstep: 1278.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 21:52:30,668] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1382.06 | bwd_inner_microstep: 1382.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-10 21:52:32,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.49 | bwd_microstep: 1274.55 | bwd_inner_microstep: 1274.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 21:52:34,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.59 | bwd_microstep: 1473.00 | bwd_inner_microstep: 1472.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 21:52:36,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.23 | bwd_microstep: 1403.02 | bwd_inner_microstep: 1402.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3479 [2024-06-10 21:52:38,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.45 | bwd_microstep: 1291.01 | bwd_inner_microstep: 1290.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 21:52:40,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1376.52 | bwd_inner_microstep: 1376.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497 [2024-06-10 21:52:42,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.48 | bwd_microstep: 1503.18 | bwd_inner_microstep: 1503.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3501 [2024-06-10 21:52:44,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1384.21 | bwd_inner_microstep: 1384.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-10 21:52:45,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1353.46 | bwd_inner_microstep: 1353.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3503 [2024-06-10 21:52:47,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.03 | bwd_microstep: 1462.71 | bwd_inner_microstep: 1462.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 21:52:49,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1377.22 | bwd_inner_microstep: 1377.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-10 21:52:51,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1510.38 | bwd_inner_microstep: 1510.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-10 21:52:53,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.43 | bwd_microstep: 1386.87 | bwd_inner_microstep: 1386.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831 [2024-06-10 21:52:55,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.31 | bwd_microstep: 1385.82 | bwd_inner_microstep: 1385.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3433 [2024-06-10 21:52:57,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1343.98 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683 [2024-06-10 21:52:59,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.68 | bwd_microstep: 1622.91 | bwd_inner_microstep: 1622.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-10 21:53:01,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1394.10 | bwd_inner_microstep: 1394.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-10 21:53:03,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1484.62 | bwd_inner_microstep: 1484.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 21:53:05,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.13 | bwd_microstep: 1551.70 | bwd_inner_microstep: 1551.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2003 [2024-06-10 21:53:07,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.69 | bwd_microstep: 739.10 | bwd_inner_microstep: 739.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829 [2024-06-10 21:53:08,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.73 | bwd_microstep: 1358.04 | bwd_inner_microstep: 1358.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 [2024-06-10 21:53:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1443.84 | bwd_inner_microstep: 1443.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-10 21:53:12,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.45 | bwd_microstep: 1440.42 | bwd_inner_microstep: 1440.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2026 [2024-06-10 21:53:14,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.04 | bwd_microstep: 839.17 | bwd_inner_microstep: 839.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2033 [2024-06-10 21:53:15,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.01 | bwd_microstep: 714.15 | bwd_inner_microstep: 714.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 21:53:16,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1408.10 | bwd_inner_microstep: 1408.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3572 [2024-06-10 21:53:21,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.32 | optimizer_step: 6.59 [2024-06-10 21:53:21,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.19 | bwd_microstep: 4266.48 | bwd_inner_microstep: 1757.07 | bwd_allreduce_microstep: 2509.34 | step_microstep: 38.57 [2024-06-10 21:53:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
16511.06 | bwd: 46825.65 | bwd_inner: 44315.39 | bwd_allreduce: 2509.58 | step: 39.98 {'loss': 1.2666, 'learning_rate': 8.08695604192997e-06, 'epoch': 0.71} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3513 [2024-06-10 21:53:23,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.39 | bwd_microstep: 1340.15 | bwd_inner_microstep: 1340.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1930 [2024-06-10 21:53:24,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.31 | bwd_microstep: 789.42 | bwd_inner_microstep: 789.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 21:53:26,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1377.67 | bwd_inner_microstep: 1377.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3877 [2024-06-10 21:53:28,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.49 | bwd_microstep: 1480.37 | bwd_inner_microstep: 1480.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-10 21:53:30,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 21:53:32,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.08 | bwd_microstep: 1292.31 | bwd_inner_microstep: 1292.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851 [2024-06-10 
21:53:34,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1463.22 | bwd_inner_microstep: 1463.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2383 [2024-06-10 21:53:35,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.29 | bwd_microstep: 932.68 | bwd_inner_microstep: 932.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 21:53:37,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1247.71 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3691 [2024-06-10 21:53:39,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.64 | bwd_microstep: 1327.16 | bwd_inner_microstep: 1327.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2101 [2024-06-10 21:53:40,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.95 | bwd_microstep: 760.32 | bwd_inner_microstep: 760.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 21:53:42,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 1348.97 | bwd_inner_microstep: 1348.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3520 [2024-06-10 21:53:44,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.85 | bwd_microstep: 1461.40 | bwd_inner_microstep: 1461.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3419 [2024-06-10 21:53:46,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.44 | bwd_microstep: 1447.27 | bwd_inner_microstep: 1447.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1958 [2024-06-10 21:53:47,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.65 | bwd_microstep: 852.51 | bwd_inner_microstep: 852.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 21:53:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1253.97 | bwd_inner_microstep: 1253.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970 [2024-06-10 21:53:50,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 798.29 | bwd_inner_microstep: 798.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-10 21:53:52,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1387.42 | bwd_inner_microstep: 1387.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3600 [2024-06-10 21:53:54,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.23 | bwd_microstep: 1470.61 | bwd_inner_microstep: 1470.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-10 21:53:56,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.20 | bwd_microstep: 1300.52 | bwd_inner_microstep: 1300.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-10 21:53:58,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.62 | bwd_microstep: 1660.49 | bwd_inner_microstep: 1660.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 21:54:00,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1253.44 | bwd_inner_microstep: 1253.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 21:54:02,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1349.45 | bwd_inner_microstep: 1349.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3703 [2024-06-10 21:54:04,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.95 | bwd_microstep: 1483.06 | bwd_inner_microstep: 1483.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-10 21:54:05,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 798.95 | bwd_inner_microstep: 798.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-10 21:54:06,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.85 | bwd_microstep: 885.07 | bwd_inner_microstep: 885.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3583 [2024-06-10 21:54:08,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1501.90 | bwd_inner_microstep: 1501.87 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3612 [2024-06-10 21:54:10,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.69 | bwd_microstep: 1705.28 | bwd_inner_microstep: 1705.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 21:54:12,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 1493.21 | bwd_inner_microstep: 1493.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1900 [2024-06-10 21:54:14,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.79 | bwd_microstep: 777.40 | bwd_inner_microstep: 777.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-10 21:54:16,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.36 | bwd_microstep: 1496.80 | bwd_inner_microstep: 1496.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3587 [2024-06-10 21:54:23,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-10 21:54:23,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 6900.89 | bwd_inner_microstep: 1543.84 | bwd_allreduce_microstep: 5357.00 | step_microstep: 38.01 [2024-06-10 21:54:23,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15209.21 | bwd: 46127.06 | bwd_inner: 40769.17 | bwd_allreduce: 5357.22 | step: 39.49 {'loss': 1.1472, 'learning_rate': 8.056828165944282e-06, 'epoch': 0.71} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2034 [2024-06-10 21:54:24,768] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.25 | bwd_microstep: 893.40 | bwd_inner_microstep: 893.33 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-10 21:54:26,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1375.68 | bwd_inner_microstep: 1375.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846 [2024-06-10 21:54:28,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.58 | bwd_microstep: 1455.41 | bwd_inner_microstep: 1455.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 21:54:30,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.85 | bwd_microstep: 1490.67 | bwd_inner_microstep: 1490.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 21:54:32,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.60 | bwd_microstep: 1380.26 | bwd_inner_microstep: 1380.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 21:54:34,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.62 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-10 21:54:36,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1281.58 | bwd_inner_microstep: 1281.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 
1868 [2024-06-10 21:54:37,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.02 | bwd_microstep: 678.34 | bwd_inner_microstep: 678.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3410 [2024-06-10 21:54:38,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.73 | bwd_microstep: 1197.44 | bwd_inner_microstep: 1197.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2645 [2024-06-10 21:54:40,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.97 | bwd_microstep: 1114.50 | bwd_inner_microstep: 1114.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1878 [2024-06-10 21:54:41,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.94 | bwd_microstep: 832.80 | bwd_inner_microstep: 832.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795 [2024-06-10 21:54:43,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.17 | bwd_microstep: 1640.13 | bwd_inner_microstep: 1640.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503 [2024-06-10 21:54:45,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.63 | bwd_microstep: 1407.03 | bwd_inner_microstep: 1407.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 21:54:46,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.82 | bwd_microstep: 792.52 | bwd_inner_microstep: 792.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per 
sample: 2.0, dynamic token length: 1389 [2024-06-10 21:54:47,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.86 | bwd_microstep: 525.60 | bwd_inner_microstep: 525.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659 [2024-06-10 21:54:49,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1456.36 | bwd_inner_microstep: 1456.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633 [2024-06-10 21:54:51,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.33 | bwd_microstep: 1311.32 | bwd_inner_microstep: 1311.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-10 21:54:53,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.92 | bwd_microstep: 1419.39 | bwd_inner_microstep: 1419.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 21:54:55,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.51 | bwd_microstep: 1286.30 | bwd_inner_microstep: 1286.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 21:54:56,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1255.20 | bwd_inner_microstep: 1255.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449 [2024-06-10 21:54:58,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.20 | bwd_microstep: 1314.98 | bwd_inner_microstep: 1314.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3711 [2024-06-10 21:55:00,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1332.27 | bwd_inner_microstep: 1332.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 21:55:02,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.03 | bwd_microstep: 1465.37 | bwd_inner_microstep: 1465.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3819 [2024-06-10 21:55:04,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.73 | bwd_microstep: 1414.16 | bwd_inner_microstep: 1414.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2245 [2024-06-10 21:55:05,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.88 | bwd_microstep: 871.96 | bwd_inner_microstep: 871.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 903 [2024-06-10 21:55:06,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 159.67 | bwd_microstep: 404.11 | bwd_inner_microstep: 404.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2060 [2024-06-10 21:55:07,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.65 | bwd_microstep: 911.24 | bwd_inner_microstep: 911.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2041 [2024-06-10 21:55:08,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.66 | bwd_microstep: 933.32 | bwd_inner_microstep: 933.30 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596 [2024-06-10 21:55:10,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1463.79 | bwd_inner_microstep: 1463.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101 [2024-06-10 21:55:11,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.74 | bwd_microstep: 821.91 | bwd_inner_microstep: 821.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3610 [2024-06-10 21:55:14,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.20 | bwd_microstep: 1772.16 | bwd_inner_microstep: 1772.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-10 21:55:25,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.50 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 21:55:25,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 10113.09 | bwd_inner_microstep: 1870.93 | bwd_allreduce_microstep: 8242.10 | step_microstep: 39.07 [2024-06-10 21:55:25,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14382.61 | bwd: 46857.01 | bwd_inner: 38613.94 | bwd_allreduce: 8242.37 | step: 40.52 {'loss': 1.1982, 'learning_rate': 8.026742353747698e-06, 'epoch': 0.71} 71%|███████ | 1226/1726 [21:12:49<8:29:22, 61.13s/it] 71%|███████ | 1227/1726 [21:13:52<8:32:35, 61.64s/it] 71%|███████ | 1227/1726 [21:13:52<8:32:35, 61.64s/it] 71%|███████ | 1228/1726 [21:14:54<8:34:22, 61.97s/it] 71%|███████ | 1228/1726 [21:14:54<8:34:22, 61.97s/it] 71%|███████ | 1229/1726 [21:15:58<8:37:34, 62.48s/it] 71%|███████ | 1229/1726 [21:15:58<8:37:34, 62.48s/it] 71%|███████▏ | 1230/1726 
[21:17:00<8:34:29, 62.24s/it] 71%|███████▏ | 1230/1726 [21:17:00<8:34:29, 62.24s/it] 71%|███████▏ | 1231/1726 [21:18:01<8:31:47, 62.04s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-10 21:55:26,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.74 | bwd_microstep: 1368.68 | bwd_inner_microstep: 1368.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-10 21:55:28,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.24 | bwd_microstep: 1285.91 | bwd_inner_microstep: 1285.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-10 21:55:30,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.01 | bwd_microstep: 1447.25 | bwd_inner_microstep: 1447.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774 [2024-06-10 21:55:32,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.73 | bwd_microstep: 1434.13 | bwd_inner_microstep: 1434.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-10 21:55:34,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.98 | bwd_microstep: 1379.43 | bwd_inner_microstep: 1379.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755 [2024-06-10 21:55:36,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.27 | bwd_microstep: 1535.45 | bwd_inner_microstep: 1535.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498 [2024-06-10 21:55:38,413] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.45 | bwd_microstep: 1187.83 | bwd_inner_microstep: 1187.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-10 21:55:40,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1393.36 | bwd_inner_microstep: 1393.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 21:55:42,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1385.20 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 21:55:44,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1449.16 | bwd_inner_microstep: 1449.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-10 21:55:46,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 21:55:47,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.09 | bwd_microstep: 1291.58 | bwd_inner_microstep: 1291.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3645 [2024-06-10 21:55:49,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.09 | bwd_microstep: 1471.83 | bwd_inner_microstep: 1471.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3513 [2024-06-10 21:55:51,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.28 | bwd_microstep: 1383.20 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 21:55:53,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1375.68 | bwd_inner_microstep: 1375.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3631 [2024-06-10 21:55:56,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.66 | bwd_microstep: 1808.10 | bwd_inner_microstep: 1808.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404 [2024-06-10 21:55:57,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.90 | bwd_microstep: 1308.99 | bwd_inner_microstep: 1308.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2135 [2024-06-10 21:55:59,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.00 | bwd_microstep: 928.31 | bwd_inner_microstep: 928.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-10 21:56:01,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.27 | bwd_microstep: 1648.52 | bwd_inner_microstep: 1648.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3667 [2024-06-10 21:56:03,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.95 | bwd_microstep: 1486.31 | bwd_inner_microstep: 1486.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images 
per sample: 1.5, dynamic token length: 940 [2024-06-10 21:56:04,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.37 | bwd_microstep: 377.84 | bwd_inner_microstep: 377.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-10 21:56:06,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.62 | bwd_microstep: 1601.51 | bwd_inner_microstep: 1601.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 21:56:08,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.51 | bwd_microstep: 1258.56 | bwd_inner_microstep: 1258.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 21:56:09,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1286.65 | bwd_inner_microstep: 1286.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 21:56:11,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.33 | bwd_microstep: 1401.85 | bwd_inner_microstep: 1401.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3649 [2024-06-10 21:56:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1610.98 | bwd_inner_microstep: 1610.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 21:56:15,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1409.72 | bwd_inner_microstep: 1409.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-10 21:56:17,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.01 | bwd_microstep: 1485.62 | bwd_inner_microstep: 1485.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751 [2024-06-10 21:56:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.48 | bwd_microstep: 1445.30 | bwd_inner_microstep: 1445.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2233 [2024-06-10 21:56:21,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.75 | bwd_microstep: 835.93 | bwd_inner_microstep: 835.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3815 [2024-06-10 21:56:23,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.97 | bwd_microstep: 1703.75 | bwd_inner_microstep: 1703.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2359 [2024-06-10 21:56:25,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.03 | optimizer_step: 6.58 [2024-06-10 21:56:25,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.74 | bwd_microstep: 2073.34 | bwd_inner_microstep: 1202.97 | bwd_allreduce_microstep: 870.32 | step_microstep: 37.49 [2024-06-10 21:56:25,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16189.44 | bwd: 44347.62 | bwd_inner: 43476.40 | bwd_allreduce: 870.54 | step: 38.99 {'loss': 1.1704, 'learning_rate': 7.996698711302315e-06, 'epoch': 0.71} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535 [2024-06-10 21:56:27,928] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1417.68 | bwd_inner_microstep: 1417.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-10 21:56:29,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.41 | bwd_microstep: 788.78 | bwd_inner_microstep: 788.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3396 [2024-06-10 21:56:30,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1342.84 | bwd_inner_microstep: 1342.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-10 21:56:32,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1246.41 | bwd_inner_microstep: 1246.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 21:56:34,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1384.08 | bwd_inner_microstep: 1384.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4124 [2024-06-10 21:56:36,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.79 | bwd_microstep: 1440.03 | bwd_inner_microstep: 1440.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 21:56:38,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.16 | bwd_microstep: 1479.94 | bwd_inner_microstep: 1479.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3698 [2024-06-10 
21:56:40,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1456.72 | bwd_inner_microstep: 1456.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907 [2024-06-10 21:56:41,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.18 | bwd_microstep: 775.66 | bwd_inner_microstep: 775.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-10 21:56:43,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.70 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689 [2024-06-10 21:56:45,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.91 | bwd_microstep: 1616.18 | bwd_inner_microstep: 1616.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3677 [2024-06-10 21:56:47,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.64 | bwd_microstep: 1580.74 | bwd_inner_microstep: 1580.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 21:56:49,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1348.23 | bwd_inner_microstep: 1348.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3390 [2024-06-10 21:56:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.89 | bwd_microstep: 1273.69 | bwd_inner_microstep: 1273.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2055 [2024-06-10 21:56:52,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.89 | bwd_microstep: 914.31 | bwd_inner_microstep: 914.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519 [2024-06-10 21:56:54,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.57 | bwd_microstep: 1418.91 | bwd_inner_microstep: 1418.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2128 [2024-06-10 21:56:55,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.78 | bwd_microstep: 831.56 | bwd_inner_microstep: 831.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 21:56:57,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.68 | bwd_microstep: 1285.30 | bwd_inner_microstep: 1285.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 21:56:59,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1557.51 | bwd_inner_microstep: 1557.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 21:57:01,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.35 | bwd_microstep: 1490.93 | bwd_inner_microstep: 1490.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702 [2024-06-10 21:57:03,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1327.55 | bwd_inner_microstep: 1327.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-10 21:57:05,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.20 | bwd_microstep: 1431.91 | bwd_inner_microstep: 1431.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 21:57:07,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.59 | bwd_microstep: 1457.29 | bwd_inner_microstep: 1457.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-10 21:57:08,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.25 | bwd_microstep: 804.93 | bwd_inner_microstep: 804.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3553 [2024-06-10 21:57:10,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.94 | bwd_microstep: 1260.79 | bwd_inner_microstep: 1260.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050 [2024-06-10 21:57:11,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.38 | bwd_microstep: 815.35 | bwd_inner_microstep: 815.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2030 [2024-06-10 21:57:12,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.06 | bwd_microstep: 866.01 | bwd_inner_microstep: 865.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726 [2024-06-10 21:57:15,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.29 | bwd_microstep: 1535.48 | bwd_inner_microstep: 1535.45 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2059 [2024-06-10 21:57:16,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.01 | bwd_microstep: 847.61 | bwd_inner_microstep: 847.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276 [2024-06-10 21:57:17,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.21 | bwd_microstep: 974.98 | bwd_inner_microstep: 974.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826 [2024-06-10 21:57:19,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.20 | bwd_microstep: 1480.27 | bwd_inner_microstep: 1480.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2034 [2024-06-10 21:57:26,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.10 | optimizer_step: 6.60 [2024-06-10 21:57:26,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.84 | bwd_microstep: 6731.60 | bwd_inner_microstep: 1037.58 | bwd_allreduce_microstep: 5693.98 | step_microstep: 37.86 [2024-06-10 21:57:26,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14882.22 | bwd: 45564.05 | bwd_inner: 39869.17 | bwd_allreduce: 5694.20 | step: 39.30 {'loss': 1.217, 'learning_rate': 7.966697344421658e-06, 'epoch': 0.71} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 21:57:28,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.40 | bwd_microstep: 1269.10 | bwd_inner_microstep: 1269.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 21:57:30,354] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1343.96 | bwd_inner_microstep: 1343.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-10 21:57:32,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.42 | bwd_microstep: 1475.54 | bwd_inner_microstep: 1475.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3877 [2024-06-10 21:57:34,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.98 | bwd_microstep: 1545.00 | bwd_inner_microstep: 1544.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-10 21:57:36,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.73 | bwd_microstep: 1476.87 | bwd_inner_microstep: 1476.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 21:57:37,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.09 | bwd_microstep: 788.74 | bwd_inner_microstep: 788.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3740 [2024-06-10 21:57:39,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.54 | bwd_microstep: 1632.54 | bwd_inner_microstep: 1632.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 21:57:41,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.10 | bwd_microstep: 1352.46 | bwd_inner_microstep: 1352.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3404 [2024-06-10 21:57:43,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1244.79 | bwd_inner_microstep: 1244.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 21:57:45,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1292.27 | bwd_inner_microstep: 1292.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 21:57:47,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.06 | bwd_microstep: 1277.42 | bwd_inner_microstep: 1277.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 21:57:48,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1388.02 | bwd_inner_microstep: 1387.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487 [2024-06-10 21:57:50,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.10 | bwd_microstep: 1314.07 | bwd_inner_microstep: 1314.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2173 [2024-06-10 21:57:52,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.82 | bwd_microstep: 945.65 | bwd_inner_microstep: 945.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863 [2024-06-10 21:57:53,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.54 | bwd_microstep: 673.92 | bwd_inner_microstep: 673.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3460 [2024-06-10 21:57:54,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.58 | bwd_microstep: 1306.14 | bwd_inner_microstep: 1306.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2173 [2024-06-10 21:57:56,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.91 | bwd_microstep: 852.01 | bwd_inner_microstep: 851.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 21:57:57,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.89 | bwd_microstep: 1253.51 | bwd_inner_microstep: 1253.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3653 [2024-06-10 21:57:59,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1481.30 | bwd_inner_microstep: 1481.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-10 21:58:01,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.29 | bwd_microstep: 1492.19 | bwd_inner_microstep: 1492.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2678 [2024-06-10 21:58:03,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.45 | bwd_microstep: 1125.34 | bwd_inner_microstep: 1125.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 [2024-06-10 21:58:05,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.76 | bwd_microstep: 1658.80 | bwd_inner_microstep: 1658.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-10 21:58:07,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1295.03 | bwd_inner_microstep: 1295.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3579 [2024-06-10 21:58:09,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.40 | bwd_microstep: 1205.82 | bwd_inner_microstep: 1205.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604 [2024-06-10 21:58:11,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.00 | bwd_microstep: 1310.03 | bwd_inner_microstep: 1310.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3851 [2024-06-10 21:58:13,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.54 | bwd_microstep: 1600.12 | bwd_inner_microstep: 1600.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952 [2024-06-10 21:58:14,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.08 | bwd_microstep: 729.30 | bwd_inner_microstep: 729.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3737 [2024-06-10 21:58:16,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.97 | bwd_microstep: 1680.75 | bwd_inner_microstep: 1680.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3558 [2024-06-10 21:58:18,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.29 | bwd_microstep: 1471.17 | bwd_inner_microstep: 1471.15 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-10 21:58:20,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.16 | bwd_microstep: 1486.46 | bwd_inner_microstep: 1486.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3770 [2024-06-10 21:58:22,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.63 | bwd_microstep: 1673.28 | bwd_inner_microstep: 1673.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630 [2024-06-10 21:58:27,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.19 | optimizer_step: 6.59 [2024-06-10 21:58:27,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.35 | bwd_microstep: 4247.78 | bwd_inner_microstep: 1816.97 | bwd_allreduce_microstep: 2430.76 | step_microstep: 37.79 [2024-06-10 21:58:27,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15845.78 | bwd: 44889.40 | bwd_inner: 42457.74 | bwd_allreduce: 2430.99 | step: 39.30 {'loss': 1.159, 'learning_rate': 7.936738358770409e-06, 'epoch': 0.71} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3414 [2024-06-10 21:58:29,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.89 | bwd_microstep: 1365.40 | bwd_inner_microstep: 1365.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3979 [2024-06-10 21:58:32,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.24 | bwd_microstep: 1702.80 | bwd_inner_microstep: 1702.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3936 
[2024-06-10 21:58:34,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1491.59 | bwd_inner_microstep: 1491.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-10 21:58:36,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.61 | bwd_microstep: 1656.18 | bwd_inner_microstep: 1656.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 21:58:38,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.28 | bwd_microstep: 1378.07 | bwd_inner_microstep: 1378.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 21:58:39,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.20 | bwd_microstep: 791.27 | bwd_inner_microstep: 791.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-10 21:58:41,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.76 | bwd_microstep: 1286.85 | bwd_inner_microstep: 1286.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644 [2024-06-10 21:58:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.31 | bwd_microstep: 1316.78 | bwd_inner_microstep: 1316.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 21:58:44,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1388.22 | bwd_inner_microstep: 1388.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2083 [2024-06-10 21:58:46,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.67 | bwd_microstep: 919.39 | bwd_inner_microstep: 919.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-10 21:58:48,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1525.26 | bwd_inner_microstep: 1525.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2137 [2024-06-10 21:58:49,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.08 | bwd_microstep: 1022.88 | bwd_inner_microstep: 1022.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431 [2024-06-10 21:58:51,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.36 | bwd_microstep: 1407.07 | bwd_inner_microstep: 1407.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3513 [2024-06-10 21:58:53,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.28 | bwd_microstep: 1256.39 | bwd_inner_microstep: 1256.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-10 21:58:54,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 792.28 | bwd_inner_microstep: 792.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-10 21:58:56,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.04 | bwd_microstep: 1497.65 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 21:58:58,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.31 | bwd_microstep: 1391.07 | bwd_inner_microstep: 1391.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 21:59:00,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.65 | bwd_microstep: 1656.35 | bwd_inner_microstep: 1656.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677 [2024-06-10 21:59:02,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.14 | bwd_microstep: 1427.98 | bwd_inner_microstep: 1427.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999 [2024-06-10 21:59:03,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.43 | bwd_microstep: 707.73 | bwd_inner_microstep: 707.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-10 21:59:04,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.85 | bwd_microstep: 798.65 | bwd_inner_microstep: 798.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-10 21:59:06,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1394.49 | bwd_inner_microstep: 1394.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3548 [2024-06-10 21:59:08,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.73 | bwd_microstep: 1457.91 | bwd_inner_microstep: 1457.88 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916 [2024-06-10 21:59:09,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.46 | bwd_microstep: 687.55 | bwd_inner_microstep: 687.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471 [2024-06-10 21:59:11,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.20 | bwd_microstep: 1313.64 | bwd_inner_microstep: 1313.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2010 [2024-06-10 21:59:12,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.96 | bwd_microstep: 802.75 | bwd_inner_microstep: 802.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3822 [2024-06-10 21:59:14,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.92 | bwd_microstep: 1536.18 | bwd_inner_microstep: 1536.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-10 21:59:16,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.30 | bwd_microstep: 1406.03 | bwd_inner_microstep: 1406.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 909 [2024-06-10 21:59:17,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.65 | bwd_microstep: 373.73 | bwd_inner_microstep: 373.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2001 [2024-06-10 21:59:18,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.52 | bwd_microstep: 828.98 | 
bwd_inner_microstep: 828.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 21:59:20,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.01 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3468 [2024-06-10 21:59:28,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 21:59:28,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 8161.69 | bwd_inner_microstep: 1624.34 | bwd_allreduce_microstep: 6537.30 | step_microstep: 38.75 [2024-06-10 21:59:28,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14733.38 | bwd: 46026.93 | bwd_inner: 39488.72 | bwd_allreduce: 6537.52 | step: 40.16 {'loss': 1.2384, 'learning_rate': 7.90682185986394e-06, 'epoch': 0.72} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3478 [2024-06-10 21:59:30,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1496.74 | bwd_inner_microstep: 1496.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3913 [2024-06-10 21:59:33,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.85 | bwd_microstep: 1682.13 | bwd_inner_microstep: 1682.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876 [2024-06-10 21:59:35,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.36 | bwd_microstep: 1579.14 | bwd_inner_microstep: 1579.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 1943 [2024-06-10 21:59:36,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.02 | bwd_microstep: 789.15 | bwd_inner_microstep: 789.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 21:59:38,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.93 | bwd_microstep: 1337.43 | bwd_inner_microstep: 1337.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-10 21:59:39,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.73 | bwd_microstep: 789.16 | bwd_inner_microstep: 789.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 21:59:41,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1382.85 | bwd_inner_microstep: 1382.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-10 21:59:43,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.53 | bwd_microstep: 1290.03 | bwd_inner_microstep: 1290.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2369 [2024-06-10 21:59:44,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.14 | bwd_microstep: 929.49 | bwd_inner_microstep: 929.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419 [2024-06-10 21:59:46,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.92 | bwd_microstep: 1182.11 | bwd_inner_microstep: 1182.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 21:59:48,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.23 | bwd_microstep: 1503.80 | bwd_inner_microstep: 1503.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 21:59:50,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1393.58 | bwd_inner_microstep: 1393.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 21:59:52,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1383.00 | bwd_inner_microstep: 1382.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2118 [2024-06-10 21:59:53,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.01 | bwd_microstep: 859.02 | bwd_inner_microstep: 858.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658 [2024-06-10 21:59:55,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.18 | bwd_microstep: 1609.14 | bwd_inner_microstep: 1609.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520 [2024-06-10 21:59:57,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.05 | bwd_microstep: 1583.16 | bwd_inner_microstep: 1583.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 21:59:59,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.90 | bwd_microstep: 1383.38 | bwd_inner_microstep: 1383.36 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3527 [2024-06-10 22:00:01,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.97 | bwd_microstep: 1351.33 | bwd_inner_microstep: 1351.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 22:00:03,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.13 | bwd_microstep: 1280.12 | bwd_inner_microstep: 1280.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671 [2024-06-10 22:00:05,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.08 | bwd_microstep: 1428.18 | bwd_inner_microstep: 1428.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 22:00:07,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1555.50 | bwd_inner_microstep: 1555.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820 [2024-06-10 22:00:09,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1487.41 | bwd_inner_microstep: 1487.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2006 [2024-06-10 22:00:10,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.75 | bwd_microstep: 740.07 | bwd_inner_microstep: 740.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3813 [2024-06-10 22:00:12,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1515.49 | bwd_inner_microstep: 
1515.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1896 [2024-06-10 22:00:13,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.74 | bwd_microstep: 746.45 | bwd_inner_microstep: 746.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2057 [2024-06-10 22:00:14,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 819.77 | bwd_inner_microstep: 819.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-10 22:00:15,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.49 | bwd_microstep: 798.04 | bwd_inner_microstep: 798.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3736 [2024-06-10 22:00:17,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1460.90 | bwd_inner_microstep: 1460.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 22:00:19,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.56 | bwd_microstep: 1498.66 | bwd_inner_microstep: 1498.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 22:00:21,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.56 | bwd_microstep: 1597.82 | bwd_inner_microstep: 1597.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 22:00:23,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | 
bwd_microstep: 1250.19 | bwd_inner_microstep: 1250.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3761 [2024-06-10 22:00:31,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-10 22:00:31,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.35 | bwd_microstep: 7572.38 | bwd_inner_microstep: 1742.25 | bwd_allreduce_microstep: 5830.06 | step_microstep: 38.34 [2024-06-10 22:00:31,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15444.22 | bwd: 47275.62 | bwd_inner: 41444.64 | bwd_allreduce: 5830.31 | step: 39.81 {'loss': 1.1897, 'learning_rate': 7.87694795306802e-06, 'epoch': 0.72} 71%|███████▏ | 1231/1726 [21:18:01<8:31:47, 62.04s/it] 71%|███████▏ | 1232/1726 [21:19:02<8:27:52, 61.68s/it] 71%|███████▏ | 1233/1726 [21:20:03<8:24:35, 61.41s/it] 71%|███████▏ | 1234/1726 [21:21:04<8:22:43, 61.31s/it] 72%|███████▏ | 1235/1726 [21:22:05<8:21:08, 61.24s/it] 72%|███████▏ | 1236/1726 [21:23:08<8:24:32, 61.78s/it] dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1886 [2024-06-10 22:00:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 793.57 | bwd_inner_microstep: 793.46 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3926 [2024-06-10 22:00:35,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1585.98 | bwd_inner_microstep: 1585.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3417 [2024-06-10 22:00:37,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1340.27 | bwd_inner_microstep: 1340.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-10 22:00:39,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.79 | bwd_microstep: 1475.33 | bwd_inner_microstep: 1475.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3766 [2024-06-10 22:00:41,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.21 | bwd_microstep: 1534.13 | bwd_inner_microstep: 1534.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-10 22:00:43,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1397.92 | bwd_inner_microstep: 1397.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3731 [2024-06-10 22:00:44,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.79 | bwd_microstep: 1267.14 | bwd_inner_microstep: 1267.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 22:00:46,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.19 | bwd_microstep: 1283.98 | bwd_inner_microstep: 1283.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-10 22:00:47,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.71 | bwd_microstep: 790.57 | bwd_inner_microstep: 790.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 22:00:49,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.97 | bwd_microstep: 1280.03 | bwd_inner_microstep: 1280.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-10 22:00:51,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.23 | bwd_microstep: 1342.77 | bwd_inner_microstep: 1342.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506 [2024-06-10 22:00:53,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1413.24 | bwd_inner_microstep: 1413.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1975 [2024-06-10 22:00:54,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.12 | bwd_microstep: 854.83 | bwd_inner_microstep: 854.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3481 [2024-06-10 22:00:56,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.03 | bwd_microstep: 1540.12 | bwd_inner_microstep: 1540.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3634 [2024-06-10 22:00:58,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.74 | bwd_microstep: 1640.89 | bwd_inner_microstep: 1640.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3551 [2024-06-10 22:01:01,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.01 | bwd_microstep: 1585.93 | bwd_inner_microstep: 1585.90 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 22:01:03,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1557.55 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3629 [2024-06-10 22:01:04,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1246.68 | bwd_inner_microstep: 1246.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623 [2024-06-10 22:01:07,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.11 | bwd_microstep: 1610.25 | bwd_inner_microstep: 1610.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 22:01:09,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.70 | bwd_microstep: 1403.87 | bwd_inner_microstep: 1403.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-10 22:01:11,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1414.52 | bwd_inner_microstep: 1414.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 22:01:13,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.64 | bwd_microstep: 1495.72 | bwd_inner_microstep: 1495.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113 [2024-06-10 22:01:14,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.15 | bwd_microstep: 829.23 | bwd_inner_microstep: 
829.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3468 [2024-06-10 22:01:16,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1400.52 | bwd_inner_microstep: 1400.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-10 22:01:18,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.12 | bwd_microstep: 1487.44 | bwd_inner_microstep: 1487.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404 [2024-06-10 22:01:20,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.03 | bwd_microstep: 1436.24 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652 [2024-06-10 22:01:22,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.35 | bwd_microstep: 1325.27 | bwd_inner_microstep: 1325.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 22:01:23,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.58 | bwd_microstep: 1254.23 | bwd_inner_microstep: 1254.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725 [2024-06-10 22:01:25,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1339.74 | bwd_inner_microstep: 1339.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-10 22:01:27,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | 
bwd_microstep: 1280.72 | bwd_inner_microstep: 1280.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593 [2024-06-10 22:01:29,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1311.38 | bwd_inner_microstep: 1311.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 22:01:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-10 22:01:33,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 3225.70 | bwd_inner_microstep: 1443.05 | bwd_allreduce_microstep: 1782.60 | step_microstep: 37.86 [2024-06-10 22:01:33,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16037.12 | bwd: 44745.78 | bwd_inner: 42962.18 | bwd_allreduce: 1782.88 | step: 39.31 {'loss': 1.2072, 'learning_rate': 7.847116743598392e-06, 'epoch': 0.72} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 22:01:35,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1474.06 | bwd_inner_microstep: 1473.88 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4036 [2024-06-10 22:01:37,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.44 | bwd_microstep: 1715.69 | bwd_inner_microstep: 1715.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2395 [2024-06-10 22:01:38,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.43 | bwd_microstep: 904.51 | bwd_inner_microstep: 904.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3833 [2024-06-10 22:01:40,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.29 | bwd_microstep: 1457.77 | bwd_inner_microstep: 1457.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3402 [2024-06-10 22:01:42,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.33 | bwd_microstep: 1211.14 | bwd_inner_microstep: 1211.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795 [2024-06-10 22:01:44,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.72 | bwd_microstep: 1446.14 | bwd_inner_microstep: 1446.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4085 [2024-06-10 22:01:46,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.92 | bwd_microstep: 1626.35 | bwd_inner_microstep: 1626.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 22:01:47,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.84 | bwd_microstep: 795.35 | bwd_inner_microstep: 795.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719 [2024-06-10 22:01:49,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.71 | bwd_microstep: 1429.52 | bwd_inner_microstep: 1429.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-10 22:01:51,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.56 | bwd_microstep: 1282.76 | bwd_inner_microstep: 1282.73 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 22:01:53,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.50 | bwd_microstep: 1289.26 | bwd_inner_microstep: 1289.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519 [2024-06-10 22:01:55,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1320.81 | bwd_inner_microstep: 1320.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3493 [2024-06-10 22:01:56,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1350.26 | bwd_inner_microstep: 1350.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652 [2024-06-10 22:01:58,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.05 | bwd_microstep: 1320.50 | bwd_inner_microstep: 1320.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677 [2024-06-10 22:02:01,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.77 | bwd_microstep: 1719.38 | bwd_inner_microstep: 1719.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672 [2024-06-10 22:02:03,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.10 | bwd_microstep: 1524.61 | bwd_inner_microstep: 1524.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654 [2024-06-10 22:02:05,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.28 | bwd_microstep: 1616.84 | bwd_inner_microstep: 
1616.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3828 [2024-06-10 22:02:07,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.11 | bwd_microstep: 1707.69 | bwd_inner_microstep: 1707.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2029 [2024-06-10 22:02:08,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.72 | bwd_microstep: 808.04 | bwd_inner_microstep: 808.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 22:02:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.89 | bwd_microstep: 1399.10 | bwd_inner_microstep: 1399.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-10 22:02:12,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1410.93 | bwd_inner_microstep: 1410.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-10 22:02:14,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.76 | bwd_microstep: 1487.16 | bwd_inner_microstep: 1487.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014 [2024-06-10 22:02:15,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 802.88 | bwd_inner_microstep: 802.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-10 22:02:18,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | 
bwd_microstep: 1511.36 | bwd_inner_microstep: 1511.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 22:02:20,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.25 | bwd_microstep: 1547.79 | bwd_inner_microstep: 1547.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032 [2024-06-10 22:02:21,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.17 | bwd_microstep: 809.90 | bwd_inner_microstep: 809.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070 [2024-06-10 22:02:22,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.76 | bwd_microstep: 819.40 | bwd_inner_microstep: 819.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 22:02:24,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1508.20 | bwd_inner_microstep: 1508.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 22:02:26,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.06 | bwd_microstep: 1648.53 | bwd_inner_microstep: 1648.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276 [2024-06-10 22:02:28,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.75 | bwd_microstep: 1003.94 | bwd_inner_microstep: 1003.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 22:02:30,238] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 557.52 | bwd_microstep: 1504.40 | bwd_inner_microstep: 1504.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3779 [2024-06-10 22:02:35,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-10 22:02:35,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.62 | bwd_microstep: 4503.78 | bwd_inner_microstep: 1987.64 | bwd_allreduce_microstep: 2516.08 | step_microstep: 38.83 [2024-06-10 22:02:35,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16099.62 | bwd: 45958.09 | bwd_inner: 43440.96 | bwd_allreduce: 2516.39 | step: 40.36 {'loss': 1.1691, 'learning_rate': 7.817328336520412e-06, 'epoch': 0.72} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 22:02:37,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.18 | bwd_microstep: 1334.95 | bwd_inner_microstep: 1334.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3870 [2024-06-10 22:02:39,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.75 | bwd_microstep: 1564.43 | bwd_inner_microstep: 1564.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-10 22:02:41,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.28 | bwd_microstep: 1183.18 | bwd_inner_microstep: 1183.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 22:02:43,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1380.65 | bwd_inner_microstep: 1380.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 22:02:44,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.41 | bwd_microstep: 1340.57 | bwd_inner_microstep: 1340.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 22:02:46,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1391.54 | bwd_inner_microstep: 1391.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-10 22:02:47,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.67 | bwd_microstep: 790.82 | bwd_inner_microstep: 790.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2756 [2024-06-10 22:02:49,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.28 | bwd_microstep: 1047.34 | bwd_inner_microstep: 1047.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 22:02:51,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1391.48 | bwd_inner_microstep: 1391.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3406 [2024-06-10 22:02:52,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.82 | bwd_microstep: 1180.75 | bwd_inner_microstep: 1180.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502 [2024-06-10 22:02:54,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.81 | bwd_microstep: 1315.22 | bwd_inner_microstep: 1315.19 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1935 [2024-06-10 22:02:55,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.79 | bwd_microstep: 885.48 | bwd_inner_microstep: 885.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2109 [2024-06-10 22:02:57,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.89 | bwd_microstep: 950.51 | bwd_inner_microstep: 950.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655 [2024-06-10 22:02:59,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.22 | bwd_microstep: 1577.51 | bwd_inner_microstep: 1577.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645 [2024-06-10 22:03:01,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.33 | bwd_microstep: 1708.71 | bwd_inner_microstep: 1708.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-10 22:03:03,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.19 | bwd_microstep: 1439.40 | bwd_inner_microstep: 1439.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2105 [2024-06-10 22:03:04,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.12 | bwd_microstep: 824.52 | bwd_inner_microstep: 824.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-10 22:03:06,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1495.13 
| bwd_inner_microstep: 1495.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 22:03:08,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.81 | bwd_microstep: 1392.04 | bwd_inner_microstep: 1392.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3541 [2024-06-10 22:03:10,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.11 | bwd_microstep: 1201.11 | bwd_inner_microstep: 1201.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2263 [2024-06-10 22:03:11,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.01 | bwd_microstep: 780.21 | bwd_inner_microstep: 780.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3815 [2024-06-10 22:03:13,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.88 | bwd_microstep: 1508.72 | bwd_inner_microstep: 1508.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3675 [2024-06-10 22:03:15,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.01 | bwd_microstep: 1527.92 | bwd_inner_microstep: 1527.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3785 [2024-06-10 22:03:18,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.52 | bwd_microstep: 1611.40 | bwd_inner_microstep: 1611.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3705 [2024-06-10 22:03:20,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 534.83 | bwd_microstep: 1426.50 | bwd_inner_microstep: 1426.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 22:03:22,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.75 | bwd_microstep: 1604.04 | bwd_inner_microstep: 1604.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465 [2024-06-10 22:03:24,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1441.74 | bwd_inner_microstep: 1441.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3452 [2024-06-10 22:03:26,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.91 | bwd_microstep: 1303.26 | bwd_inner_microstep: 1303.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2019 [2024-06-10 22:03:27,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.73 | bwd_microstep: 846.44 | bwd_inner_microstep: 846.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3810 [2024-06-10 22:03:29,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.93 | bwd_microstep: 1719.31 | bwd_inner_microstep: 1719.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3807 [2024-06-10 22:03:31,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.21 | bwd_microstep: 1583.24 | bwd_inner_microstep: 1583.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3814 [2024-06-10 22:03:36,709] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.19 | optimizer_step: 6.62 [2024-06-10 22:03:36,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.54 | bwd_microstep: 4348.85 | bwd_inner_microstep: 1819.44 | bwd_allreduce_microstep: 2529.35 | step_microstep: 38.59 [2024-06-10 22:03:36,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15844.98 | bwd: 45097.00 | bwd_inner: 42566.73 | bwd_allreduce: 2529.59 | step: 40.07 {'loss': 1.1587, 'learning_rate': 7.787582836748692e-06, 'epoch': 0.72} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3463 [2024-06-10 22:03:38,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.74 | bwd_microstep: 1467.32 | bwd_inner_microstep: 1467.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3924 [2024-06-10 22:03:41,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.05 | bwd_microstep: 1785.24 | bwd_inner_microstep: 1785.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3857 [2024-06-10 22:03:43,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1393.95 | bwd_inner_microstep: 1393.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 22:03:44,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.85 | bwd_microstep: 1247.56 | bwd_inner_microstep: 1247.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3399 [2024-06-10 22:03:46,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.74 | bwd_microstep: 1147.56 | bwd_inner_microstep: 1147.53 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 22:03:48,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1374.17 | bwd_inner_microstep: 1374.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-10 22:03:49,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.51 | bwd_microstep: 790.71 | bwd_inner_microstep: 790.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 22:03:51,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.31 | bwd_microstep: 1284.97 | bwd_inner_microstep: 1284.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3699 [2024-06-10 22:03:53,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.41 | bwd_microstep: 1623.76 | bwd_inner_microstep: 1623.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 22:03:55,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1248.31 | bwd_inner_microstep: 1248.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-10 22:03:56,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1248.54 | bwd_inner_microstep: 1248.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2123 [2024-06-10 22:03:58,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.14 | bwd_microstep: 
828.78 | bwd_inner_microstep: 828.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416 [2024-06-10 22:03:59,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.91 | bwd_microstep: 1276.40 | bwd_inner_microstep: 1276.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-10 22:04:01,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.79 | bwd_microstep: 1447.77 | bwd_inner_microstep: 1447.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643 [2024-06-10 22:04:04,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.93 | bwd_microstep: 1707.26 | bwd_inner_microstep: 1707.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 22:04:06,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.11 | bwd_microstep: 1511.10 | bwd_inner_microstep: 1511.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 22:04:08,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.23 | bwd_microstep: 1377.94 | bwd_inner_microstep: 1377.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516 [2024-06-10 22:04:10,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1417.86 | bwd_inner_microstep: 1417.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-10 22:04:12,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.72 | bwd_microstep: 1380.48 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2294 [2024-06-10 22:04:13,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.26 | bwd_microstep: 877.51 | bwd_inner_microstep: 877.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2130 [2024-06-10 22:04:14,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.49 | bwd_microstep: 830.66 | bwd_inner_microstep: 830.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-10 22:04:16,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1491.98 | bwd_inner_microstep: 1491.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608 [2024-06-10 22:04:18,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.41 | bwd_microstep: 1505.79 | bwd_inner_microstep: 1505.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2128 [2024-06-10 22:04:19,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.91 | bwd_microstep: 831.68 | bwd_inner_microstep: 831.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3556 [2024-06-10 22:04:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1232.09 | bwd_inner_microstep: 1232.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 22:04:23,311] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.71 | bwd_microstep: 1371.00 | bwd_inner_microstep: 1370.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822 [2024-06-10 22:04:25,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.98 | bwd_microstep: 1582.68 | bwd_inner_microstep: 1582.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2669 [2024-06-10 22:04:27,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.02 | bwd_microstep: 1216.04 | bwd_inner_microstep: 1216.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-10 22:04:29,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.31 | bwd_microstep: 1495.02 | bwd_inner_microstep: 1494.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583 [2024-06-10 22:04:31,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.24 | bwd_microstep: 1692.76 | bwd_inner_microstep: 1692.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 22:04:33,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.55 | bwd_microstep: 1277.31 | bwd_inner_microstep: 1277.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3570 [2024-06-10 22:04:38,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.06 | optimizer_step: 6.63 [2024-06-10 22:04:38,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.46 | bwd_microstep: 4907.58 
| bwd_inner_microstep: 1918.75 | bwd_allreduce_microstep: 2988.78 | step_microstep: 37.62 [2024-06-10 22:04:38,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15963.14 | bwd: 45871.81 | bwd_inner: 42882.12 | bwd_allreduce: 2989.01 | step: 39.30 {'loss': 1.1737, 'learning_rate': 7.757880349046742e-06, 'epoch': 0.72} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4151 [2024-06-10 22:04:41,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.46 | bwd_microstep: 1619.40 | bwd_inner_microstep: 1619.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3937 [2024-06-10 22:04:43,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1495.26 | bwd_inner_microstep: 1495.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3915 [2024-06-10 22:04:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.37 | bwd_microstep: 1440.67 | bwd_inner_microstep: 1440.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-10 22:04:47,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.36 | bwd_microstep: 1650.74 | bwd_inner_microstep: 1650.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 22:04:49,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1245.33 | bwd_inner_microstep: 1245.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1918 [2024-06-10 22:04:50,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.21 | bwd_microstep: 779.71 | 
bwd_inner_microstep: 779.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 22:04:52,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.02 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783 [2024-06-10 22:04:54,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.41 | bwd_microstep: 1451.70 | bwd_inner_microstep: 1451.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688 [2024-06-10 22:04:56,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.72 | bwd_microstep: 1626.93 | bwd_inner_microstep: 1626.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 22:04:58,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.35 | bwd_microstep: 1249.20 | bwd_inner_microstep: 1249.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2157 [2024-06-10 22:04:59,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.65 | bwd_microstep: 912.81 | bwd_inner_microstep: 912.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3702 [2024-06-10 22:05:01,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.13 | bwd_microstep: 1283.22 | bwd_inner_microstep: 1283.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1909 [2024-06-10 22:05:02,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 302.25 | bwd_microstep: 810.47 | bwd_inner_microstep: 810.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2187 [2024-06-10 22:05:03,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.01 | bwd_microstep: 1050.31 | bwd_inner_microstep: 1050.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3656 [2024-06-10 22:05:05,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.21 | bwd_microstep: 1576.19 | bwd_inner_microstep: 1576.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468 [2024-06-10 22:05:07,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.76 | bwd_microstep: 1344.26 | bwd_inner_microstep: 1344.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825 [2024-06-10 22:05:09,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.06 | bwd_microstep: 1583.41 | bwd_inner_microstep: 1583.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660 [2024-06-10 22:05:11,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.52 | bwd_microstep: 1429.56 | bwd_inner_microstep: 1429.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 22:05:13,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.10 | bwd_microstep: 1488.26 | bwd_inner_microstep: 1488.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-10 22:05:16,135] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.02 | bwd_microstep: 1654.99 | bwd_inner_microstep: 1654.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 22:05:18,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1396.22 | bwd_inner_microstep: 1396.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-10 22:05:19,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.89 | bwd_microstep: 1358.93 | bwd_inner_microstep: 1358.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3671 [2024-06-10 22:05:21,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.95 | bwd_microstep: 1259.69 | bwd_inner_microstep: 1259.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-10 22:05:23,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1290.15 | bwd_inner_microstep: 1290.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772 [2024-06-10 22:05:25,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.95 | bwd_microstep: 1544.04 | bwd_inner_microstep: 1544.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-10 22:05:27,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1288.59 | bwd_inner_microstep: 1288.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 
3824 [2024-06-10 22:05:29,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.37 | bwd_microstep: 1584.28 | bwd_inner_microstep: 1584.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546 [2024-06-10 22:05:31,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.43 | bwd_microstep: 1586.11 | bwd_inner_microstep: 1586.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559 [2024-06-10 22:05:33,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.33 | bwd_microstep: 1595.05 | bwd_inner_microstep: 1595.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394 [2024-06-10 22:05:35,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1339.70 | bwd_inner_microstep: 1339.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729 [2024-06-10 22:05:37,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1465.39 | bwd_inner_microstep: 1465.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585 [2024-06-10 22:05:41,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.03 | optimizer_step: 6.59 [2024-06-10 22:05:41,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.60 | bwd_microstep: 2596.49 | bwd_inner_microstep: 1799.23 | bwd_allreduce_microstep: 797.21 | step_microstep: 37.54 [2024-06-10 22:05:41,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16533.67 | bwd: 45278.04 | bwd_inner: 44479.94 | bwd_allreduce: 797.43 | step: 
38.99 {'loss': 1.22, 'learning_rate': 7.728220978026563e-06, 'epoch': 0.72} 72%|███████▏ | 1236/1726 [21:23:08<8:24:32, 61.78s/it] 72%|███████▏ | 1237/1726 [21:24:09<8:21:52, 61.58s/it] 72%|███████▏ | 1238/1726 [21:25:12<8:22:51, 61.83s/it] 72%|███████▏ | 1239/1726 [21:26:13<8:20:29, 61.66s/it] 72%|███████▏ | 1240/1726 [21:27:15<8:20:45, 61.82s/it] 72%|███████▏ | 1241/1726 [21:28:17<8:20:30, 61.92s/it] dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2376 [2024-06-10 22:05:42,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.90 | bwd_microstep: 1018.99 | bwd_inner_microstep: 1018.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4006 [2024-06-10 22:05:44,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.80 | bwd_microstep: 1505.82 | bwd_inner_microstep: 1505.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3864 [2024-06-10 22:05:46,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.67 | bwd_microstep: 1560.40 | bwd_inner_microstep: 1560.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3806 [2024-06-10 22:05:48,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.08 | bwd_microstep: 1350.47 | bwd_inner_microstep: 1350.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 22:05:50,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1385.39 |
bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2246 [2024-06-10 22:05:51,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.41 | bwd_microstep: 805.12 | bwd_inner_microstep: 805.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-10 22:05:53,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.23 | bwd_microstep: 1426.43 | bwd_inner_microstep: 1426.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 22:05:55,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1275.89 | bwd_inner_microstep: 1275.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-10 22:05:57,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.67 | bwd_microstep: 1530.48 | bwd_inner_microstep: 1530.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2096 [2024-06-10 22:05:58,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.44 | bwd_microstep: 819.55 | bwd_inner_microstep: 819.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3713 [2024-06-10 22:06:00,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.57 | bwd_microstep: 1587.65 | bwd_inner_microstep: 1587.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2628 [2024-06-10 22:06:02,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
382.48 | bwd_microstep: 1014.28 | bwd_inner_microstep: 1014.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3451 [2024-06-10 22:06:04,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.10 | bwd_microstep: 1541.90 | bwd_inner_microstep: 1541.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3691 [2024-06-10 22:06:06,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.36 | bwd_microstep: 1718.58 | bwd_inner_microstep: 1718.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 22:06:08,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.00 | bwd_microstep: 1379.12 | bwd_inner_microstep: 1379.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3937 [2024-06-10 22:06:10,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.64 | bwd_microstep: 1687.35 | bwd_inner_microstep: 1687.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3383 [2024-06-10 22:06:12,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.39 | bwd_microstep: 1364.98 | bwd_inner_microstep: 1364.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662 [2024-06-10 22:06:14,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.16 | bwd_microstep: 1611.28 | bwd_inner_microstep: 1611.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539 [2024-06-10 22:06:17,169] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.30 | bwd_microstep: 1583.87 | bwd_inner_microstep: 1583.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831 [2024-06-10 22:06:19,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.82 | bwd_microstep: 1646.81 | bwd_inner_microstep: 1646.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3824 [2024-06-10 22:06:21,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.63 | bwd_microstep: 1593.19 | bwd_inner_microstep: 1593.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580 [2024-06-10 22:06:23,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.75 | bwd_microstep: 1461.34 | bwd_inner_microstep: 1461.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-10 22:06:25,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.14 | bwd_microstep: 1253.96 | bwd_inner_microstep: 1253.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 22:06:27,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1508.55 | bwd_inner_microstep: 1508.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-10 22:06:29,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1428.81 | bwd_inner_microstep: 1428.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3827 [2024-06-10 22:06:31,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.17 | bwd_microstep: 1358.53 | bwd_inner_microstep: 1358.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655 [2024-06-10 22:06:33,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.01 | bwd_microstep: 1418.71 | bwd_inner_microstep: 1418.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3602 [2024-06-10 22:06:35,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1307.34 | bwd_inner_microstep: 1307.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3815 [2024-06-10 22:06:37,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.89 | bwd_microstep: 1386.06 | bwd_inner_microstep: 1386.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-10 22:06:38,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.30 | bwd_microstep: 1404.11 | bwd_inner_microstep: 1404.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3731 [2024-06-10 22:06:41,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.31 | bwd_microstep: 1594.32 | bwd_inner_microstep: 1594.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1017 [2024-06-10 22:06:42,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.02 | optimizer_step: 6.61 [2024-06-10 22:06:42,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 152.61 | bwd_microstep: 1069.01 | bwd_inner_microstep: 452.31 | bwd_allreduce_microstep: 616.66 | step_microstep: 38.09 [2024-06-10 22:06:42,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16424.29 | bwd: 44598.28 | bwd_inner: 43980.73 | bwd_allreduce: 616.89 | step: 39.59 {'loss': 1.1748, 'learning_rate': 7.698604828148306e-06, 'epoch': 0.72} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863 [2024-06-10 22:06:43,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.25 | bwd_microstep: 674.66 | bwd_inner_microstep: 674.50 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3983 [2024-06-10 22:06:45,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.07 | bwd_microstep: 1456.28 | bwd_inner_microstep: 1456.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 22:06:47,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1278.63 | bwd_inner_microstep: 1278.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841 [2024-06-10 22:06:49,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.10 | bwd_microstep: 1557.08 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 22:06:51,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-10 22:06:52,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 298.52 | bwd_microstep: 788.65 | bwd_inner_microstep: 788.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-10 22:06:54,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.96 | bwd_microstep: 1620.20 | bwd_inner_microstep: 1620.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435 [2024-06-10 22:06:56,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.44 | bwd_microstep: 1157.21 | bwd_inner_microstep: 1157.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-10 22:06:58,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1252.95 | bwd_inner_microstep: 1252.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2093 [2024-06-10 22:06:59,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.53 | bwd_microstep: 824.42 | bwd_inner_microstep: 824.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 22:07:01,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1387.89 | bwd_inner_microstep: 1387.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512 [2024-06-10 22:07:03,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1418.07 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 22:07:05,021] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.94 | bwd_microstep: 1382.52 | bwd_inner_microstep: 1382.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3422 [2024-06-10 22:07:07,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.35 | bwd_microstep: 1510.48 | bwd_inner_microstep: 1510.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3396 [2024-06-10 22:07:08,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.69 | bwd_microstep: 1368.43 | bwd_inner_microstep: 1368.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-10 22:07:11,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.74 | bwd_microstep: 1487.61 | bwd_inner_microstep: 1487.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-10 22:07:12,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.25 | bwd_microstep: 798.20 | bwd_inner_microstep: 798.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385 [2024-06-10 22:07:14,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.90 | bwd_microstep: 1431.35 | bwd_inner_microstep: 1431.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-10 22:07:16,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.90 | bwd_microstep: 1525.60 | bwd_inner_microstep: 1525.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
2488 [2024-06-10 22:07:17,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 960.51 | bwd_inner_microstep: 960.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 22:07:19,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1382.02 | bwd_inner_microstep: 1381.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546 [2024-06-10 22:07:21,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.72 | bwd_microstep: 1327.36 | bwd_inner_microstep: 1327.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2004 [2024-06-10 22:07:22,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 897.66 | bwd_inner_microstep: 897.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 22:07:24,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-10 22:07:26,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.86 | bwd_microstep: 1560.59 | bwd_inner_microstep: 1560.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733 [2024-06-10 22:07:28,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1431.02 | bwd_inner_microstep: 1430.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 2170 [2024-06-10 22:07:29,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.51 | bwd_microstep: 855.20 | bwd_inner_microstep: 855.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3767 [2024-06-10 22:07:31,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.72 | bwd_microstep: 1348.12 | bwd_inner_microstep: 1348.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3435 [2024-06-10 22:07:33,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1412.68 | bwd_inner_microstep: 1412.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831 [2024-06-10 22:07:35,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.14 | bwd_microstep: 1753.43 | bwd_inner_microstep: 1753.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 22:07:37,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.56 | bwd_microstep: 1506.03 | bwd_inner_microstep: 1506.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 22:07:45,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.26 | optimizer_gradients: 4.10 | optimizer_step: 6.62 [2024-06-10 22:07:45,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.06 | bwd_microstep: 6656.04 | bwd_inner_microstep: 1554.10 | bwd_allreduce_microstep: 5101.89 | step_microstep: 38.44 [2024-06-10 22:07:45,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15562.01 | bwd: 46851.82 | bwd_inner: 41748.92 | 
bwd_allreduce: 5102.18 | step: 39.91 {'loss': 1.1524, 'learning_rate': 7.669032003719894e-06, 'epoch': 0.72} dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4068 [2024-06-10 22:07:47,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.91 | bwd_microstep: 1803.12 | bwd_inner_microstep: 1803.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-10 22:07:49,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.84 | bwd_microstep: 1369.99 | bwd_inner_microstep: 1369.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-10 22:07:51,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1548.65 | bwd_inner_microstep: 1548.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 22:07:53,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 22:07:55,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-10 22:07:57,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.04 | bwd_microstep: 1528.25 | bwd_inner_microstep: 1528.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-10 22:07:59,448] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1281.80 | bwd_inner_microstep: 1281.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2057 [2024-06-10 22:08:00,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.04 | bwd_microstep: 752.77 | bwd_inner_microstep: 752.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3708 [2024-06-10 22:08:02,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.95 | bwd_microstep: 1330.21 | bwd_inner_microstep: 1330.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3669 [2024-06-10 22:08:04,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.24 | bwd_microstep: 1544.53 | bwd_inner_microstep: 1544.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-10 22:08:06,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.94 | bwd_microstep: 1480.48 | bwd_inner_microstep: 1480.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3489 [2024-06-10 22:08:08,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.37 | bwd_microstep: 1440.86 | bwd_inner_microstep: 1440.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-10 22:08:10,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.21 | bwd_microstep: 1472.79 | bwd_inner_microstep: 1472.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3552 [2024-06-10 22:08:12,851] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.68 | bwd_microstep: 1694.57 | bwd_inner_microstep: 1694.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3657 [2024-06-10 22:08:14,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.09 | bwd_microstep: 1367.37 | bwd_inner_microstep: 1367.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461 [2024-06-10 22:08:16,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.68 | bwd_microstep: 1184.47 | bwd_inner_microstep: 1184.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 22:08:18,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1378.63 | bwd_inner_microstep: 1378.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827 [2024-06-10 22:08:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2277 [2024-06-10 22:08:21,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.13 | bwd_microstep: 908.53 | bwd_inner_microstep: 908.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2289 [2024-06-10 22:08:22,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.06 | bwd_microstep: 1003.46 | bwd_inner_microstep: 1003.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3640 [2024-06-10 22:08:24,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1514.59 | bwd_inner_microstep: 1514.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939 [2024-06-10 22:08:25,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.60 | bwd_microstep: 700.39 | bwd_inner_microstep: 700.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2003 [2024-06-10 22:08:27,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.82 | bwd_microstep: 899.12 | bwd_inner_microstep: 899.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457 [2024-06-10 22:08:28,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.26 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831 [2024-06-10 22:08:31,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.82 | bwd_microstep: 1752.16 | bwd_inner_microstep: 1752.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651 [2024-06-10 22:08:33,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.68 | bwd_microstep: 1612.16 | bwd_inner_microstep: 1612.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3772 [2024-06-10 22:08:35,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.63 | bwd_microstep: 1344.66 | bwd_inner_microstep: 1344.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, 
images per sample: 4.0, dynamic token length: 3446 [2024-06-10 22:08:37,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.24 | bwd_microstep: 1157.91 | bwd_inner_microstep: 1157.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2234 [2024-06-10 22:08:38,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.92 | bwd_microstep: 801.74 | bwd_inner_microstep: 801.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2158 [2024-06-10 22:08:39,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.94 | bwd_microstep: 950.71 | bwd_inner_microstep: 950.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-10 22:08:41,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.40 | bwd_microstep: 1293.49 | bwd_inner_microstep: 1293.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2040 [2024-06-10 22:08:44,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.09 | optimizer_step: 6.59 [2024-06-10 22:08:44,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.12 | bwd_microstep: 2811.64 | bwd_inner_microstep: 1091.94 | bwd_allreduce_microstep: 1719.64 | step_microstep: 37.80 [2024-06-10 22:08:44,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15578.81 | bwd: 43454.08 | bwd_inner: 41733.55 | bwd_allreduce: 1719.87 | step: 39.24 {'loss': 1.1829, 'learning_rate': 7.639502608896653e-06, 'epoch': 0.72} dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1866 [2024-06-10 22:08:45,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
268.74 | bwd_microstep: 699.00 | bwd_inner_microstep: 698.93 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3966 [2024-06-10 22:08:47,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.08 | bwd_microstep: 1594.68 | bwd_inner_microstep: 1594.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 22:08:49,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.49 | bwd_microstep: 1284.78 | bwd_inner_microstep: 1284.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3794 [2024-06-10 22:08:51,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.90 | bwd_microstep: 1380.76 | bwd_inner_microstep: 1380.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-10 22:08:53,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3422 [2024-06-10 22:08:55,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.16 | bwd_microstep: 1280.16 | bwd_inner_microstep: 1280.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 22:08:57,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1387.45 | bwd_inner_microstep: 1387.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 22:08:58,893] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 486.49 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 22:09:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | bwd_microstep: 1387.15 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500 [2024-06-10 22:09:02,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1415.20 | bwd_inner_microstep: 1415.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 22:09:04,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.57 | bwd_microstep: 1384.61 | bwd_inner_microstep: 1384.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-10 22:09:06,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1389.65 | bwd_inner_microstep: 1389.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3507 [2024-06-10 22:09:08,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.30 | bwd_microstep: 1434.54 | bwd_inner_microstep: 1434.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 22:09:10,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.33 | bwd_microstep: 1391.54 | bwd_inner_microstep: 1391.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3659 [2024-06-10 
22:09:12,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.61 | bwd_microstep: 1683.93 | bwd_inner_microstep: 1683.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2073 [2024-06-10 22:09:14,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.36 | bwd_microstep: 947.43 | bwd_inner_microstep: 947.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3700 [2024-06-10 22:09:15,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1234.95 | bwd_inner_microstep: 1234.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3619 [2024-06-10 22:09:17,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.06 | bwd_microstep: 1553.55 | bwd_inner_microstep: 1553.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-10 22:09:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.77 | bwd_microstep: 1649.25 | bwd_inner_microstep: 1649.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3878 [2024-06-10 22:09:22,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.36 | bwd_microstep: 1684.94 | bwd_inner_microstep: 1684.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-10 22:09:24,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.54 | bwd_microstep: 1160.90 | bwd_inner_microstep: 1160.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3523 [2024-06-10 22:09:26,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1484.60 | bwd_inner_microstep: 1484.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3528 [2024-06-10 22:09:28,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1551.69 | bwd_inner_microstep: 1551.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3550 [2024-06-10 22:09:30,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.11 | bwd_microstep: 1563.50 | bwd_inner_microstep: 1563.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3545 [2024-06-10 22:09:32,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.18 | bwd_microstep: 1589.94 | bwd_inner_microstep: 1589.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 22:09:34,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.54 | bwd_microstep: 1291.13 | bwd_inner_microstep: 1291.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-10 22:09:36,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.15 | bwd_microstep: 1350.14 | bwd_inner_microstep: 1350.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931 [2024-06-10 22:09:37,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.04 | bwd_microstep: 694.81 | bwd_inner_microstep: 694.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3584 [2024-06-10 22:09:39,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.37 | bwd_microstep: 1407.47 | bwd_inner_microstep: 1407.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3584 [2024-06-10 22:09:41,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.66 | bwd_microstep: 1333.14 | bwd_inner_microstep: 1333.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2266 [2024-06-10 22:09:42,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.33 | bwd_microstep: 1005.55 | bwd_inner_microstep: 1005.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-10 22:09:44,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.02 | optimizer_step: 6.63 [2024-06-10 22:09:44,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1598.48 | bwd_inner_microstep: 1525.75 | bwd_allreduce_microstep: 72.68 | step_microstep: 37.57 [2024-06-10 22:09:44,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16232.39 | bwd: 43588.28 | bwd_inner: 43514.64 | bwd_allreduce: 72.94 | step: 39.05 {'loss': 1.1328, 'learning_rate': 7.61001674768098e-06, 'epoch': 0.72} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-10 22:09:46,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.26 | bwd_microstep: 1474.65 | bwd_inner_microstep: 1474.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-10 22:09:49,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 439.14 | bwd_microstep: 1150.83 | bwd_inner_microstep: 1150.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 22:09:51,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.69 | bwd_microstep: 1242.12 | bwd_inner_microstep: 1242.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 22:09:53,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1354.23 | bwd_inner_microstep: 1354.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-10 22:09:55,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 1539.35 | bwd_inner_microstep: 1539.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3800 [2024-06-10 22:09:57,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.50 | bwd_microstep: 1352.41 | bwd_inner_microstep: 1352.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846 [2024-06-10 22:09:59,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.43 | bwd_microstep: 1458.78 | bwd_inner_microstep: 1458.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-10 22:10:01,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.30 | bwd_microstep: 1344.94 | bwd_inner_microstep: 1344.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 22:10:03,084] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3739 [2024-06-10 22:10:05,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1430.87 | bwd_inner_microstep: 1430.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608 [2024-06-10 22:10:07,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1511.62 | bwd_inner_microstep: 1511.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2207 [2024-06-10 22:10:08,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.14 | bwd_microstep: 930.18 | bwd_inner_microstep: 930.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1954 [2024-06-10 22:10:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.32 | bwd_microstep: 888.28 | bwd_inner_microstep: 888.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690 [2024-06-10 22:10:11,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.86 | bwd_microstep: 1327.18 | bwd_inner_microstep: 1327.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1941 [2024-06-10 22:10:12,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.05 | bwd_microstep: 883.82 | bwd_inner_microstep: 883.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 
[2024-06-10 22:10:14,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468 [2024-06-10 22:10:16,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.10 | bwd_microstep: 1342.15 | bwd_inner_microstep: 1342.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 22:10:18,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.36 | bwd_microstep: 1383.82 | bwd_inner_microstep: 1383.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-10 22:10:19,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.57 | bwd_microstep: 695.87 | bwd_inner_microstep: 695.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 22:10:21,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.01 | bwd_microstep: 1656.57 | bwd_inner_microstep: 1656.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2923 [2024-06-10 22:10:23,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.11 | bwd_microstep: 1095.80 | bwd_inner_microstep: 1095.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 22:10:25,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.45 | bwd_microstep: 1559.68 | bwd_inner_microstep: 1559.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3439 [2024-06-10 22:10:27,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1283.53 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3825 [2024-06-10 22:10:29,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.80 | bwd_microstep: 1752.06 | bwd_inner_microstep: 1752.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814 [2024-06-10 22:10:31,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.39 | bwd_microstep: 1356.11 | bwd_inner_microstep: 1356.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 22:10:33,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1379.99 | bwd_inner_microstep: 1379.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3865 [2024-06-10 22:10:35,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.43 | bwd_microstep: 1664.12 | bwd_inner_microstep: 1664.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797 [2024-06-10 22:10:37,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.71 | bwd_microstep: 1448.17 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 22:10:39,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 22:10:41,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.51 | bwd_microstep: 1657.38 | bwd_inner_microstep: 1657.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3901 [2024-06-10 22:10:43,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.65 | bwd_microstep: 1396.02 | bwd_inner_microstep: 1395.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3495 [2024-06-10 22:10:47,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 22:10:47,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 3456.66 | bwd_inner_microstep: 1629.10 | bwd_allreduce_microstep: 1827.51 | step_microstep: 37.78 [2024-06-10 22:10:47,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16183.02 | bwd: 45242.52 | bwd_inner: 43414.12 | bwd_allreduce: 1827.74 | step: 39.23 {'loss': 1.2109, 'learning_rate': 7.580574523921906e-06, 'epoch': 0.72} 72%|███████▏ | 1241/1726 [21:28:17<8:20:30, 61.92s/it] 72%|███████▏ | 1242/1726 [21:29:19<8:18:07, 61.75s/it] 72%|███████▏ | 1243/1726 [21:30:21<8:19:28, 62.05s/it] 72%|███████▏ | 1244/1726 [21:31:21<8:11:57, 61.24s/it] 72%|███████▏ | 1245/1726 [21:32:21<8:08:19, 60.91s/it] 72%|███████▏ | 1246/1726 [21:33:24<8:12:52, 61.61s/it] dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3402 [2024-06-10 22:10:49,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 447.78 | bwd_microstep: 1167.86 | bwd_inner_microstep: 1167.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3906 [2024-06-10 22:10:51,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1517.82 | bwd_inner_microstep: 1517.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871 [2024-06-10 22:10:53,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 1662.22 | bwd_inner_microstep: 1662.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3777 [2024-06-10 22:10:55,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1505.31 | bwd_inner_microstep: 1505.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1902 [2024-06-10 22:10:57,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.54 | bwd_microstep: 747.08 | bwd_inner_microstep: 747.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744 [2024-06-10 22:10:59,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.98 | bwd_microstep: 1530.90 | bwd_inner_microstep: 1530.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-10 22:11:00,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.61 | bwd_microstep: 789.50 | bwd_inner_microstep: 789.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239 [2024-06-10 22:11:01,546] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.29 | bwd_microstep: 961.62 | bwd_inner_microstep: 961.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3499 [2024-06-10 22:11:03,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.93 | bwd_microstep: 1428.00 | bwd_inner_microstep: 1427.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1922 [2024-06-10 22:11:04,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 791.16 | bwd_inner_microstep: 791.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 22:11:06,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.04 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 22:11:08,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.99 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673 [2024-06-10 22:11:10,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.67 | bwd_microstep: 1417.68 | bwd_inner_microstep: 1417.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508 [2024-06-10 22:11:12,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.96 | bwd_microstep: 1408.48 | bwd_inner_microstep: 1408.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 
3466 [2024-06-10 22:11:14,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.18 | bwd_microstep: 1243.91 | bwd_inner_microstep: 1243.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1909 [2024-06-10 22:11:15,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.35 | bwd_microstep: 715.04 | bwd_inner_microstep: 715.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2943 [2024-06-10 22:11:16,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.75 | bwd_microstep: 1192.58 | bwd_inner_microstep: 1192.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-10 22:11:18,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.86 | bwd_microstep: 1509.99 | bwd_inner_microstep: 1509.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 22:11:20,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.66 | bwd_microstep: 795.70 | bwd_inner_microstep: 795.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3785 [2024-06-10 22:11:21,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1286.32 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3626 [2024-06-10 22:11:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.61 | bwd_microstep: 1471.07 | bwd_inner_microstep: 1471.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3464 [2024-06-10 22:11:25,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1280.25 | bwd_inner_microstep: 1280.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2429 [2024-06-10 22:11:26,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.04 | bwd_microstep: 941.22 | bwd_inner_microstep: 941.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-10 22:11:28,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.46 | bwd_microstep: 1401.79 | bwd_inner_microstep: 1401.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 22:11:31,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.98 | bwd_microstep: 1653.75 | bwd_inner_microstep: 1653.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 818 [2024-06-10 22:11:31,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 131.77 | bwd_microstep: 342.37 | bwd_inner_microstep: 342.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3804 [2024-06-10 22:11:33,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.57 | bwd_microstep: 1717.47 | bwd_inner_microstep: 1717.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3793 [2024-06-10 22:11:36,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.35 | bwd_microstep: 1747.30 | bwd_inner_microstep: 1747.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 22:11:38,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1425.22 | bwd_inner_microstep: 1425.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289 [2024-06-10 22:11:39,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.42 | bwd_microstep: 972.28 | bwd_inner_microstep: 972.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587 [2024-06-10 22:11:41,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.10 | bwd_microstep: 1593.89 | bwd_inner_microstep: 1593.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3809 [2024-06-10 22:11:50,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.18 | optimizer_step: 6.61 [2024-06-10 22:11:50,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.18 | bwd_microstep: 7717.68 | bwd_inner_microstep: 1565.46 | bwd_allreduce_microstep: 6152.16 | step_microstep: 38.28 [2024-06-10 22:11:50,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15129.58 | bwd: 46796.51 | bwd_inner: 40643.44 | bwd_allreduce: 6152.40 | step: 39.74 {'loss': 1.2112, 'learning_rate': 7.5511760413148e-06, 'epoch': 0.72} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3388 [2024-06-10 22:11:51,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.21 | bwd_microstep: 1235.05 | bwd_inner_microstep: 1235.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-10 22:11:53,633] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 482.66 | bwd_microstep: 1280.93 | bwd_inner_microstep: 1280.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3834 [2024-06-10 22:11:55,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.15 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 22:11:57,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.65 | bwd_microstep: 1240.38 | bwd_inner_microstep: 1240.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-10 22:11:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1243.14 | bwd_inner_microstep: 1243.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2633 [2024-06-10 22:12:00,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.52 | bwd_microstep: 1066.24 | bwd_inner_microstep: 1066.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3571 [2024-06-10 22:12:02,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.12 | bwd_microstep: 1203.55 | bwd_inner_microstep: 1203.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 22:12:04,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1382.89 | bwd_inner_microstep: 1382.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2933 [2024-06-10 22:12:05,680] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.59 | bwd_microstep: 1177.72 | bwd_inner_microstep: 1177.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-10 22:12:07,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.17 | bwd_microstep: 1524.15 | bwd_inner_microstep: 1524.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3506 [2024-06-10 22:12:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.43 | bwd_microstep: 1331.54 | bwd_inner_microstep: 1331.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 22:12:11,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.03 | bwd_microstep: 1363.71 | bwd_inner_microstep: 1363.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635 [2024-06-10 22:12:13,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.87 | bwd_microstep: 1605.89 | bwd_inner_microstep: 1605.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724 [2024-06-10 22:12:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1627.60 | bwd_inner_microstep: 1627.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 22:12:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.64 | bwd_microstep: 1345.77 | bwd_inner_microstep: 1345.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic 
token length: 2094 [2024-06-10 22:12:19,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.87 | bwd_microstep: 880.05 | bwd_inner_microstep: 880.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3683 [2024-06-10 22:12:20,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.20 | bwd_microstep: 1423.91 | bwd_inner_microstep: 1423.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3638 [2024-06-10 22:12:23,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1574.15 | bwd_inner_microstep: 1574.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-10 22:12:24,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.66 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524 [2024-06-10 22:12:26,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.16 | bwd_microstep: 1422.31 | bwd_inner_microstep: 1422.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2610 [2024-06-10 22:12:28,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.37 | bwd_microstep: 1001.33 | bwd_inner_microstep: 1001.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 874 [2024-06-10 22:12:28,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 152.13 | bwd_microstep: 397.65 | bwd_inner_microstep: 397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, 
images per sample: 6.5, dynamic token length: 2947 [2024-06-10 22:12:30,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.15 | bwd_microstep: 1194.02 | bwd_inner_microstep: 1194.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3819 [2024-06-10 22:12:32,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.05 | bwd_microstep: 1594.81 | bwd_inner_microstep: 1594.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3819 [2024-06-10 22:12:34,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1407.83 | bwd_inner_microstep: 1407.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-10 22:12:36,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1253.04 | bwd_inner_microstep: 1253.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-10 22:12:38,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1606.90 | bwd_inner_microstep: 1606.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 22:12:40,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.02 | bwd_microstep: 1287.20 | bwd_inner_microstep: 1287.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 22:12:42,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.18 | bwd_microstep: 1373.42 | bwd_inner_microstep: 1373.39 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2078 [2024-06-10 22:12:43,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.16 | bwd_microstep: 916.87 | bwd_inner_microstep: 916.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 22:12:45,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.38 | bwd_microstep: 1750.44 | bwd_inner_microstep: 1750.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-10 22:12:49,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.17 | optimizer_step: 6.62 [2024-06-10 22:12:49,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.19 | bwd_microstep: 3441.67 | bwd_inner_microstep: 1749.82 | bwd_allreduce_microstep: 1691.80 | step_microstep: 38.98 [2024-06-10 22:12:49,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15686.71 | bwd: 43818.71 | bwd_inner: 42125.91 | bwd_allreduce: 1692.03 | step: 40.47 {'loss': 1.2234, 'learning_rate': 7.521821403400955e-06, 'epoch': 0.72} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488 [2024-06-10 22:12:52,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1473.58 | bwd_inner_microstep: 1473.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3994 [2024-06-10 22:12:54,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.25 | bwd_microstep: 1502.00 | bwd_inner_microstep: 1501.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 22:12:56,137] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1479.81 | bwd_inner_microstep: 1479.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440 [2024-06-10 22:12:58,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.41 | bwd_microstep: 1448.74 | bwd_inner_microstep: 1448.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 22:13:00,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.35 | bwd_microstep: 1551.48 | bwd_inner_microstep: 1551.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 22:13:01,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1243.84 | bwd_inner_microstep: 1243.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883 [2024-06-10 22:13:02,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.00 | bwd_microstep: 679.51 | bwd_inner_microstep: 679.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2100 [2024-06-10 22:13:04,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.38 | bwd_microstep: 823.16 | bwd_inner_microstep: 823.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-10 22:13:05,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.54 | bwd_microstep: 789.09 | bwd_inner_microstep: 789.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 
[2024-06-10 22:13:07,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3694 [2024-06-10 22:13:09,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.49 | bwd_microstep: 1627.09 | bwd_inner_microstep: 1627.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697 [2024-06-10 22:13:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.41 | bwd_microstep: 1528.30 | bwd_inner_microstep: 1528.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2133 [2024-06-10 22:13:12,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.53 | bwd_microstep: 971.55 | bwd_inner_microstep: 971.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 22:13:14,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1494.11 | bwd_inner_microstep: 1494.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-10 22:13:16,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.17 | bwd_microstep: 1379.22 | bwd_inner_microstep: 1379.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3863 [2024-06-10 22:13:19,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.85 | bwd_microstep: 1659.70 | bwd_inner_microstep: 1659.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per 
sample: 8.0, dynamic token length: 3459 [2024-06-10 22:13:21,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.40 | bwd_microstep: 1438.20 | bwd_inner_microstep: 1438.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 22:13:22,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1399.48 | bwd_inner_microstep: 1399.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-10 22:13:25,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1496.56 | bwd_inner_microstep: 1496.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-10 22:13:27,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1552.27 | bwd_inner_microstep: 1552.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 22:13:29,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1507.59 | bwd_inner_microstep: 1507.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 22:13:31,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.74 | bwd_microstep: 1646.51 | bwd_inner_microstep: 1646.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-10 22:13:33,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1497.52 | bwd_inner_microstep: 1497.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 22:13:35,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1504.81 | bwd_inner_microstep: 1504.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-10 22:13:37,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1391.89 | bwd_inner_microstep: 1391.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-10 22:13:39,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1251.88 | bwd_inner_microstep: 1251.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-10 22:13:41,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1499.81 | bwd_inner_microstep: 1499.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2891 [2024-06-10 22:13:43,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.73 | bwd_microstep: 1184.20 | bwd_inner_microstep: 1184.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3769 [2024-06-10 22:13:44,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.33 | bwd_microstep: 1376.85 | bwd_inner_microstep: 1376.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 22:13:46,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.86 | bwd_microstep: 1280.66 | bwd_inner_microstep: 1280.63 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3585 [2024-06-10 22:13:48,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.61 | bwd_microstep: 1526.76 | bwd_inner_microstep: 1526.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3799 [2024-06-10 22:13:51,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.06 | optimizer_step: 6.60 [2024-06-10 22:13:51,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1689.20 | bwd_inner_microstep: 1681.54 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.36 [2024-06-10 22:13:51,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16533.18 | bwd: 44283.85 | bwd_inner: 44275.34 | bwd_allreduce: 7.84 | step: 38.92 {'loss': 1.1818, 'learning_rate': 7.492510713567265e-06, 'epoch': 0.72} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470 [2024-06-10 22:13:53,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1402.62 | bwd_inner_microstep: 1402.45 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-10 22:13:54,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1145.27 | bwd_inner_microstep: 1145.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3394 [2024-06-10 22:13:56,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.35 | bwd_microstep: 1341.31 | bwd_inner_microstep: 1341.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 
22:13:58,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1380.16 | bwd_inner_microstep: 1380.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-10 22:14:00,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.09 | bwd_microstep: 1533.29 | bwd_inner_microstep: 1533.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-10 22:14:02,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1247.34 | bwd_inner_microstep: 1247.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 22:14:04,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1381.59 | bwd_inner_microstep: 1381.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2196 [2024-06-10 22:14:05,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.93 | bwd_microstep: 953.65 | bwd_inner_microstep: 953.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3421 [2024-06-10 22:14:07,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.15 | bwd_microstep: 1293.91 | bwd_inner_microstep: 1293.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1894 [2024-06-10 22:14:08,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.44 | bwd_microstep: 712.52 | bwd_inner_microstep: 712.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, 
dynamic token length: 3413 [2024-06-10 22:14:10,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.42 | bwd_microstep: 1276.66 | bwd_inner_microstep: 1276.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 22:14:11,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1243.91 | bwd_inner_microstep: 1243.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3505 [2024-06-10 22:14:13,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.06 | bwd_microstep: 1430.15 | bwd_inner_microstep: 1430.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1920 [2024-06-10 22:14:14,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.95 | bwd_microstep: 778.68 | bwd_inner_microstep: 778.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2728 [2024-06-10 22:14:16,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.13 | bwd_microstep: 1139.01 | bwd_inner_microstep: 1138.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2138 [2024-06-10 22:14:17,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.81 | bwd_microstep: 832.03 | bwd_inner_microstep: 832.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 22:14:19,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1555.03 | bwd_inner_microstep: 1555.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3563 [2024-06-10 22:14:21,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.12 | bwd_microstep: 1298.27 | bwd_inner_microstep: 1298.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 22:14:23,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1390.66 | bwd_inner_microstep: 1390.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 [2024-06-10 22:14:25,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.24 | bwd_microstep: 1432.75 | bwd_inner_microstep: 1432.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-10 22:14:27,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.31 | bwd_microstep: 1354.07 | bwd_inner_microstep: 1354.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997 [2024-06-10 22:14:28,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.82 | bwd_microstep: 706.35 | bwd_inner_microstep: 706.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-10 22:14:30,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1512.54 | bwd_inner_microstep: 1512.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 22:14:32,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.96 | bwd_microstep: 1279.85 | bwd_inner_microstep: 1279.83 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716 [2024-06-10 22:14:33,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.22 | bwd_microstep: 1337.01 | bwd_inner_microstep: 1336.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 22:14:35,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.32 | bwd_microstep: 1404.78 | bwd_inner_microstep: 1404.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 22:14:37,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1377.75 | bwd_inner_microstep: 1377.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-10 22:14:39,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.98 | bwd_microstep: 1453.79 | bwd_inner_microstep: 1453.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-10 22:14:41,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1550.12 | bwd_inner_microstep: 1550.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 22:14:43,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.19 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2201 [2024-06-10 22:14:45,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.44 | bwd_microstep: 985.79 | bwd_inner_microstep: 
985.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 22:14:53,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.56 [2024-06-10 22:14:53,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 7515.15 | bwd_inner_microstep: 1566.40 | bwd_allreduce_microstep: 5948.68 | step_microstep: 38.81 [2024-06-10 22:14:53,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15234.66 | bwd: 46621.49 | bwd_inner: 40671.75 | bwd_allreduce: 5948.99 | step: 40.34 {'loss': 1.1904, 'learning_rate': 7.463244075045815e-06, 'epoch': 0.72} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-10 22:14:55,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.58 | bwd_microstep: 1363.94 | bwd_inner_microstep: 1363.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3926 [2024-06-10 22:14:57,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.59 | bwd_microstep: 1587.21 | bwd_inner_microstep: 1587.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849 [2024-06-10 22:14:59,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.27 | bwd_microstep: 1557.35 | bwd_inner_microstep: 1557.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 22:15:01,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.01 | bwd_microstep: 1649.23 | bwd_inner_microstep: 1649.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843 
[2024-06-10 22:15:03,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.99 | bwd_microstep: 1463.84 | bwd_inner_microstep: 1463.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-10 22:15:05,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.88 | bwd_microstep: 1274.51 | bwd_inner_microstep: 1274.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3487 [2024-06-10 22:15:07,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.60 | bwd_microstep: 1216.45 | bwd_inner_microstep: 1216.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 22:15:09,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1382.60 | bwd_inner_microstep: 1382.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3702 [2024-06-10 22:15:11,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1457.87 | bwd_inner_microstep: 1457.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3711 [2024-06-10 22:15:13,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.55 | bwd_microstep: 1655.89 | bwd_inner_microstep: 1655.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-10 22:15:15,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.32 | bwd_microstep: 1288.60 | bwd_inner_microstep: 1288.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3678 [2024-06-10 22:15:17,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.70 | bwd_microstep: 1619.83 | bwd_inner_microstep: 1619.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3542 [2024-06-10 22:15:19,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.31 | bwd_microstep: 1561.57 | bwd_inner_microstep: 1561.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 22:15:21,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1286.80 | bwd_inner_microstep: 1286.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3657 [2024-06-10 22:15:23,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.15 | bwd_microstep: 1662.01 | bwd_inner_microstep: 1661.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-10 22:15:25,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.29 | bwd_microstep: 1595.07 | bwd_inner_microstep: 1595.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-10 22:15:27,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1341.53 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 22:15:29,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.77 | bwd_microstep: 1490.91 | bwd_inner_microstep: 1490.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-10 22:15:30,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.80 | bwd_microstep: 806.04 | bwd_inner_microstep: 806.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845 [2024-06-10 22:15:33,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.24 | bwd_microstep: 1666.90 | bwd_inner_microstep: 1666.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471 [2024-06-10 22:15:34,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.75 | bwd_microstep: 1184.18 | bwd_inner_microstep: 1184.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-10 22:15:36,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 975.34 | bwd_inner_microstep: 975.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3886 [2024-06-10 22:15:38,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.78 | bwd_microstep: 1782.50 | bwd_inner_microstep: 1782.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-10 22:15:40,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1398.80 | bwd_inner_microstep: 1398.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-10 22:15:43,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.38 | bwd_microstep: 1750.65 | bwd_inner_microstep: 1750.63 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-10 22:15:45,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.35 | bwd_microstep: 1645.64 | bwd_inner_microstep: 1645.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1968 [2024-06-10 22:15:46,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.72 | bwd_microstep: 703.53 | bwd_inner_microstep: 703.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2892 [2024-06-10 22:15:47,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.42 | bwd_microstep: 1025.40 | bwd_inner_microstep: 1025.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547 [2024-06-10 22:15:49,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.36 | bwd_microstep: 1329.71 | bwd_inner_microstep: 1329.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3565 [2024-06-10 22:15:51,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.60 | bwd_microstep: 1423.13 | bwd_inner_microstep: 1423.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-10 22:15:53,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1512.05 | bwd_inner_microstep: 1512.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2064 [2024-06-10 22:15:55,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | 
optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-10 22:15:55,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.59 | bwd_microstep: 1958.15 | bwd_inner_microstep: 918.86 | bwd_allreduce_microstep: 1039.25 | step_microstep: 38.06 [2024-06-10 22:15:55,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16626.29 | bwd: 45617.26 | bwd_inner: 44577.12 | bwd_allreduce: 1039.48 | step: 39.51 {'loss': 1.1749, 'learning_rate': 7.434021590913573e-06, 'epoch': 0.72} 72%|███████▏ | 1246/1726 [21:33:24<8:12:52, 61.61s/it] 72%|███████▏ | 1247/1726 [21:34:26<8:13:23, 61.80s/it] 72%|███████▏ | 1248/1726 [21:35:26<8:07:39, 61.21s/it] 72%|███████▏ | 1249/1726 [21:36:27<8:06:29, 61.19s/it] 72%|███████▏ | 1250/1726 [21:37:30<8:07:49, 61.49s/it] 72%|███████▏ | 1251/1726 [21:38:32<8:09:23, 61.82s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 22:15:57,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.16 | bwd_microstep: 1471.14 | bwd_inner_microstep: 1471.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3968 [2024-06-10 22:15:59,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.70 | bwd_microstep: 1430.30 | bwd_inner_microstep: 1430.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3880 [2024-06-10 22:16:02,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.70 | bwd_microstep: 1517.41 | bwd_inner_microstep: 1517.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3421 [2024-06-10 22:16:03,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.83 | bwd_microstep: 1347.19 | bwd_inner_microstep: 1346.03 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511 [2024-06-10 22:16:05,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1321.32 | bwd_inner_microstep: 1321.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3524 [2024-06-10 22:16:07,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1228.39 | bwd_inner_microstep: 1228.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-10 22:16:09,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1352.00 | bwd_inner_microstep: 1351.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-10 22:16:11,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1247.61 | bwd_inner_microstep: 1247.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440 [2024-06-10 22:16:12,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.07 | bwd_microstep: 1155.02 | bwd_inner_microstep: 1155.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3500 [2024-06-10 22:16:14,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.34 | bwd_microstep: 1247.98 | bwd_inner_microstep: 1247.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3449 [2024-06-10 22:16:16,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.90 | bwd_microstep: 1259.74 | bwd_inner_microstep: 1259.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 22:16:17,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.58 | bwd_microstep: 1285.90 | bwd_inner_microstep: 1285.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3446 [2024-06-10 22:16:19,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1337.21 | bwd_inner_microstep: 1337.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 22:16:21,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1389.21 | bwd_inner_microstep: 1389.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 22:16:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.23 | bwd_microstep: 1380.99 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2157 [2024-06-10 22:16:24,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.60 | bwd_microstep: 1044.36 | bwd_inner_microstep: 1044.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-10 22:16:27,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.97 | bwd_microstep: 1655.37 | bwd_inner_microstep: 1655.35 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3620 [2024-06-10 22:16:29,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.51 | bwd_microstep: 1648.38 | bwd_inner_microstep: 1648.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-10 22:16:31,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.38 | bwd_microstep: 1601.48 | bwd_inner_microstep: 1601.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643 [2024-06-10 22:16:34,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.94 | bwd_microstep: 1710.72 | bwd_inner_microstep: 1710.57 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439 [2024-06-10 22:16:35,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1285.66 | bwd_inner_microstep: 1285.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 22:16:37,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.26 | bwd_microstep: 1292.54 | bwd_inner_microstep: 1292.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-10 22:16:39,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1378.26 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514 [2024-06-10 22:16:41,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.63 | bwd_microstep: 1195.84 | bwd_inner_microstep: 
1195.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-10 22:16:42,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.54 | bwd_microstep: 877.39 | bwd_inner_microstep: 877.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3460 [2024-06-10 22:16:44,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.90 | bwd_microstep: 1213.82 | bwd_inner_microstep: 1213.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3558 [2024-06-10 22:16:45,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.02 | bwd_microstep: 1207.74 | bwd_inner_microstep: 1207.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3804 [2024-06-10 22:16:48,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.04 | bwd_microstep: 1686.03 | bwd_inner_microstep: 1686.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2208 [2024-06-10 22:16:49,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.81 | bwd_microstep: 957.02 | bwd_inner_microstep: 956.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435 [2024-06-10 22:16:51,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.62 | bwd_microstep: 1447.08 | bwd_inner_microstep: 1447.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3806 [2024-06-10 22:16:53,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.84 | 
bwd_microstep: 1604.31 | bwd_inner_microstep: 1604.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3769 [2024-06-10 22:16:59,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.44 | optimizer_gradients: 4.27 | optimizer_step: 6.59 [2024-06-10 22:16:59,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.32 | bwd_microstep: 4989.67 | bwd_inner_microstep: 2202.87 | bwd_allreduce_microstep: 2786.74 | step_microstep: 39.12 [2024-06-10 22:16:59,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16271.60 | bwd: 46767.10 | bwd_inner: 43979.23 | bwd_allreduce: 2787.07 | step: 40.78 {'loss': 1.1265, 'learning_rate': 7.404843364091951e-06, 'epoch': 0.73} dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3467 [2024-06-10 22:17:01,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1514.02 | bwd_inner_microstep: 1514.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3920 [2024-06-10 22:17:03,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.70 | bwd_microstep: 1326.55 | bwd_inner_microstep: 1326.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798 [2024-06-10 22:17:05,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.68 | bwd_microstep: 1441.82 | bwd_inner_microstep: 1441.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1275 [2024-06-10 22:17:05,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 186.31 | bwd_microstep: 487.01 | bwd_inner_microstep: 486.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3760 [2024-06-10 22:17:07,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.42 | bwd_microstep: 1432.42 | bwd_inner_microstep: 1432.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 22:17:09,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1381.44 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-10 22:17:11,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.23 | bwd_microstep: 1184.63 | bwd_inner_microstep: 1184.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904 [2024-06-10 22:17:12,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.58 | bwd_microstep: 682.98 | bwd_inner_microstep: 682.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3756 [2024-06-10 22:17:14,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.39 | bwd_microstep: 1644.03 | bwd_inner_microstep: 1644.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3486 [2024-06-10 22:17:16,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.22 | bwd_microstep: 1438.86 | bwd_inner_microstep: 1438.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2884 [2024-06-10 22:17:18,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.71 | bwd_microstep: 1122.37 | bwd_inner_microstep: 1122.35 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695 [2024-06-10 22:17:20,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.03 | bwd_microstep: 1615.94 | bwd_inner_microstep: 1615.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1959 [2024-06-10 22:17:21,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.08 | bwd_microstep: 886.89 | bwd_inner_microstep: 886.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420 [2024-06-10 22:17:23,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1444.44 | bwd_inner_microstep: 1444.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3656 [2024-06-10 22:17:25,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.15 | bwd_microstep: 1443.55 | bwd_inner_microstep: 1443.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141 [2024-06-10 22:17:26,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.29 | bwd_microstep: 834.87 | bwd_inner_microstep: 834.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1921 [2024-06-10 22:17:27,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.73 | bwd_microstep: 726.71 | bwd_inner_microstep: 726.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 22:17:29,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.59 | bwd_microstep: 1280.15 | bwd_inner_microstep: 1280.13 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-10 22:17:31,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.89 | bwd_microstep: 1553.94 | bwd_inner_microstep: 1553.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-10 22:17:33,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3528 [2024-06-10 22:17:35,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.38 | bwd_microstep: 1198.96 | bwd_inner_microstep: 1198.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 22:17:37,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443 [2024-06-10 22:17:39,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1443.97 | bwd_inner_microstep: 1443.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 22:17:41,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.79 | bwd_microstep: 1506.03 | bwd_inner_microstep: 1506.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459 [2024-06-10 22:17:42,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 
1314.55 | bwd_inner_microstep: 1314.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849 [2024-06-10 22:17:45,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.61 | bwd_microstep: 1653.56 | bwd_inner_microstep: 1653.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818 [2024-06-10 22:17:47,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.23 | bwd_microstep: 1581.23 | bwd_inner_microstep: 1581.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2929 [2024-06-10 22:17:49,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.78 | bwd_microstep: 1190.36 | bwd_inner_microstep: 1190.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 22:17:50,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-10 22:17:52,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.56 | bwd_microstep: 1548.44 | bwd_inner_microstep: 1548.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-10 22:17:55,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.86 | bwd_microstep: 1532.70 | bwd_inner_microstep: 1532.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3613 [2024-06-10 22:18:02,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.58 [2024-06-10 22:18:02,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.56 | bwd_microstep: 6509.04 | bwd_inner_microstep: 1541.82 | bwd_allreduce_microstep: 4967.16 | step_microstep: 38.00 [2024-06-10 22:18:02,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15657.37 | bwd: 46840.23 | bwd_inner: 41872.18 | bwd_allreduce: 4967.39 | step: 39.50 {'loss': 1.1938, 'learning_rate': 7.37570949734653e-06, 'epoch': 0.73} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-10 22:18:04,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.57 | bwd_microstep: 1460.87 | bwd_inner_microstep: 1460.68 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3919 [2024-06-10 22:18:06,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.33 | bwd_microstep: 1517.80 | bwd_inner_microstep: 1517.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3880 [2024-06-10 22:18:08,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.45 | bwd_microstep: 1579.30 | bwd_inner_microstep: 1579.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-10 22:18:10,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.83 | bwd_microstep: 1178.84 | bwd_inner_microstep: 1178.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3752 [2024-06-10 22:18:11,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1335.57 | bwd_inner_microstep: 1335.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-10 22:18:12,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.49 | bwd_microstep: 787.23 | bwd_inner_microstep: 787.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3480 [2024-06-10 22:18:14,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.07 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-10 22:18:16,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.59 | bwd_microstep: 1247.37 | bwd_inner_microstep: 1247.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-10 22:18:18,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.46 | bwd_microstep: 1628.96 | bwd_inner_microstep: 1628.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 22:18:20,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.15 | bwd_microstep: 1487.58 | bwd_inner_microstep: 1487.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 22:18:22,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.17 | bwd_microstep: 1478.18 | bwd_inner_microstep: 1478.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3591 [2024-06-10 22:18:25,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.90 | bwd_microstep: 1702.48 | bwd_inner_microstep: 1702.45 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-10 22:18:26,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.71 | bwd_microstep: 1299.76 | bwd_inner_microstep: 1299.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2030 [2024-06-10 22:18:27,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.20 | bwd_microstep: 744.95 | bwd_inner_microstep: 744.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 22:18:29,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1380.94 | bwd_inner_microstep: 1380.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 22:18:31,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.51 | bwd_microstep: 1347.97 | bwd_inner_microstep: 1347.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3734 [2024-06-10 22:18:33,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.59 | bwd_microstep: 1336.33 | bwd_inner_microstep: 1336.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-10 22:18:35,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1281.72 | bwd_inner_microstep: 1281.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3615 [2024-06-10 22:18:37,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.66 | bwd_microstep: 
1342.68 | bwd_inner_microstep: 1342.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-10 22:18:39,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1509.47 | bwd_inner_microstep: 1509.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005 [2024-06-10 22:18:40,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.87 | bwd_microstep: 838.19 | bwd_inner_microstep: 838.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3535 [2024-06-10 22:18:42,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1228.39 | bwd_inner_microstep: 1228.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678 [2024-06-10 22:18:44,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1426.86 | bwd_inner_microstep: 1426.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463 [2024-06-10 22:18:45,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1314.28 | bwd_inner_microstep: 1314.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-10 22:18:47,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1341.93 | bwd_inner_microstep: 1341.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3543 [2024-06-10 22:18:49,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 573.62 | bwd_microstep: 1539.98 | bwd_inner_microstep: 1539.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3534 [2024-06-10 22:18:52,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.77 | bwd_microstep: 1586.77 | bwd_inner_microstep: 1586.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816 [2024-06-10 22:18:54,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.41 | bwd_microstep: 1756.13 | bwd_inner_microstep: 1756.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-10 22:18:56,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1647.01 | bwd_inner_microstep: 1646.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2015 [2024-06-10 22:18:58,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.17 | bwd_microstep: 930.88 | bwd_inner_microstep: 930.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2538 [2024-06-10 22:18:59,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.47 | bwd_microstep: 1089.79 | bwd_inner_microstep: 1089.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725 [2024-06-10 22:19:02,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.00 | optimizer_step: 6.57 [2024-06-10 22:19:02,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.94 | bwd_microstep: 2293.00 | bwd_inner_microstep: 1798.65 | 
bwd_allreduce_microstep: 494.30 | step_microstep: 37.28 [2024-06-10 22:19:02,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16188.30 | bwd: 43884.24 | bwd_inner: 43388.90 | bwd_allreduce: 494.61 | step: 38.98 {'loss': 1.1942, 'learning_rate': 7.3466200932866334e-06, 'epoch': 0.73} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 22:19:04,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.25 | bwd_microstep: 1374.48 | bwd_inner_microstep: 1374.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 22:19:06,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.64 | bwd_microstep: 1345.12 | bwd_inner_microstep: 1345.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-10 22:19:08,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1482.10 | bwd_inner_microstep: 1482.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 22:19:10,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.07 | bwd_microstep: 1246.18 | bwd_inner_microstep: 1246.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 22:19:11,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.44 | bwd_microstep: 1379.51 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3775 [2024-06-10 22:19:14,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.85 | bwd_microstep: 1541.37 | bwd_inner_microstep: 1541.34 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 22:19:15,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1345.65 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-10 22:19:17,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.96 | bwd_microstep: 1384.50 | bwd_inner_microstep: 1384.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1874 [2024-06-10 22:19:18,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 709.51 | bwd_inner_microstep: 709.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3404 [2024-06-10 22:19:20,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.86 | bwd_microstep: 1292.82 | bwd_inner_microstep: 1292.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2707 [2024-06-10 22:19:22,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.36 | bwd_microstep: 1208.05 | bwd_inner_microstep: 1208.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-10 22:19:24,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.66 | bwd_microstep: 1442.80 | bwd_inner_microstep: 1442.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3495 [2024-06-10 22:19:26,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.10 | bwd_microstep: 
1548.21 | bwd_inner_microstep: 1548.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596 [2024-06-10 22:19:28,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1510.61 | bwd_inner_microstep: 1510.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 22:19:30,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1390.25 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908 [2024-06-10 22:19:31,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.68 | bwd_microstep: 685.18 | bwd_inner_microstep: 685.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 22:19:33,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1256.33 | bwd_inner_microstep: 1256.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477 [2024-06-10 22:19:34,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.12 | bwd_microstep: 1311.29 | bwd_inner_microstep: 1311.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-10 22:19:37,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.21 | bwd_microstep: 1524.75 | bwd_inner_microstep: 1524.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-10 22:19:38,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 525.46 | bwd_microstep: 1410.96 | bwd_inner_microstep: 1410.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043 [2024-06-10 22:19:40,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.66 | bwd_microstep: 807.66 | bwd_inner_microstep: 807.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 22:19:41,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1351.01 | bwd_inner_microstep: 1350.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 22:19:43,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.77 | bwd_microstep: 1392.27 | bwd_inner_microstep: 1392.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-10 22:19:45,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.54 | bwd_microstep: 1529.90 | bwd_inner_microstep: 1529.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 22:19:48,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.40 | bwd_microstep: 1658.16 | bwd_inner_microstep: 1658.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 22:19:50,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1415.68 | bwd_inner_microstep: 1415.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2651 [2024-06-10 22:19:51,604] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.69 | bwd_microstep: 1006.77 | bwd_inner_microstep: 1006.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-10 22:19:53,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.65 | bwd_microstep: 1348.42 | bwd_inner_microstep: 1348.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-10 22:19:55,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1390.98 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-10 22:19:57,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1496.94 | bwd_inner_microstep: 1496.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766 [2024-06-10 22:19:59,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.93 | bwd_microstep: 1444.22 | bwd_inner_microstep: 1444.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3587 [2024-06-10 22:20:03,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-10 22:20:03,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.40 | bwd_microstep: 3754.59 | bwd_inner_microstep: 1729.07 | bwd_allreduce_microstep: 2025.47 | step_microstep: 37.94 [2024-06-10 22:20:03,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15986.16 | bwd: 44986.27 | bwd_inner: 42959.90 | bwd_allreduce: 2025.69 | step: 39.36 {'loss': 1.2394, 'learning_rate': 
7.31757525436499e-06, 'epoch': 0.73} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-10 22:20:05,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1365.57 | bwd_inner_microstep: 1365.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 22:20:07,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1376.06 | bwd_inner_microstep: 1376.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 22:20:09,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.68 | bwd_microstep: 1275.14 | bwd_inner_microstep: 1275.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3877 [2024-06-10 22:20:11,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.60 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867 [2024-06-10 22:20:12,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.35 | bwd_microstep: 675.62 | bwd_inner_microstep: 675.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-10 22:20:13,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.85 | bwd_microstep: 708.66 | bwd_inner_microstep: 708.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-10 22:20:15,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.58 | bwd_microstep: 1499.55 | 
bwd_inner_microstep: 1499.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-10 22:20:17,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.62 | bwd_microstep: 1345.62 | bwd_inner_microstep: 1345.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4084 [2024-06-10 22:20:19,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.62 | bwd_microstep: 1555.25 | bwd_inner_microstep: 1555.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 22:20:21,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 22:20:23,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1285.42 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-10 22:20:25,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.60 | bwd_microstep: 1388.74 | bwd_inner_microstep: 1388.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 22:20:26,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.53 | bwd_microstep: 1287.54 | bwd_inner_microstep: 1287.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-10 22:20:28,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 331.20 | bwd_microstep: 893.35 | bwd_inner_microstep: 893.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2393 [2024-06-10 22:20:29,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.94 | bwd_microstep: 1028.44 | bwd_inner_microstep: 1028.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 22:20:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.89 | bwd_microstep: 1484.54 | bwd_inner_microstep: 1484.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3645 [2024-06-10 22:20:33,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.27 | bwd_microstep: 1647.98 | bwd_inner_microstep: 1647.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 22:20:35,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.70 | bwd_microstep: 1286.93 | bwd_inner_microstep: 1286.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-10 22:20:37,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.18 | bwd_microstep: 1289.66 | bwd_inner_microstep: 1289.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647 [2024-06-10 22:20:39,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.28 | bwd_microstep: 1516.46 | bwd_inner_microstep: 1516.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-10 22:20:41,515] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.42 | bwd_microstep: 1494.28 | bwd_inner_microstep: 1494.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472 [2024-06-10 22:20:43,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.71 | bwd_microstep: 1405.71 | bwd_inner_microstep: 1405.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2060 [2024-06-10 22:20:44,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.79 | bwd_microstep: 916.69 | bwd_inner_microstep: 916.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941 [2024-06-10 22:20:45,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.47 | bwd_microstep: 731.74 | bwd_inner_microstep: 731.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 [2024-06-10 22:20:47,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1495.42 | bwd_inner_microstep: 1495.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-10 22:20:49,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1293.24 | bwd_inner_microstep: 1293.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3552 [2024-06-10 22:20:51,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.20 | bwd_microstep: 1358.48 | bwd_inner_microstep: 1358.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 
[2024-06-10 22:20:53,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.28 | bwd_microstep: 1505.41 | bwd_inner_microstep: 1505.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-10 22:20:55,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.42 | bwd_microstep: 1550.83 | bwd_inner_microstep: 1550.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725 [2024-06-10 22:20:57,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1432.70 | bwd_inner_microstep: 1432.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2054 [2024-06-10 22:20:58,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.63 | bwd_microstep: 911.38 | bwd_inner_microstep: 911.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804 [2024-06-10 22:21:04,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.15 | optimizer_step: 6.57 [2024-06-10 22:21:04,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.64 | bwd_microstep: 4649.35 | bwd_inner_microstep: 1639.51 | bwd_allreduce_microstep: 3009.78 | step_microstep: 39.27 [2024-06-10 22:21:04,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15498.41 | bwd: 44526.91 | bwd_inner: 41516.19 | bwd_allreduce: 3010.01 | step: 40.79 {'loss': 1.1534, 'learning_rate': 7.2885750828773694e-06, 'epoch': 0.73} 1251/1726 [21:38:32<8:09:23, 61.82s/it] 73%|███████▎ | 1252/1726 [21:39:36<8:12:04, 62.29s/it] 73%|███████▎ | 1253/1726 [21:40:38<8:12:17, 62.45s/it] 73%|███████▎ | 1254/1726 [21:41:39<8:06:27, 61.84s/it] 73%|███████▎ | 1255/1726 [21:42:40<8:04:10, 61.68s/it] 73%|███████▎ | 1256/1726 [21:43:40<8:00:02, 61.28s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 22:21:06,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1343.59 | bwd_inner_microstep: 1343.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 22:21:07,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.15 | bwd_microstep: 1241.29 | bwd_inner_microstep: 1241.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 22:21:09,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1342.33 | bwd_inner_microstep: 1342.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3799 [2024-06-10 22:21:11,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.21 | bwd_microstep: 1256.27 | bwd_inner_microstep: 1256.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3759 [2024-06-10 22:21:13,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.55 | bwd_microstep: 1638.82 | bwd_inner_microstep: 1638.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 22:21:15,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.75 | bwd_microstep: 
1384.38 | bwd_inner_microstep: 1384.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 22:21:17,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.37 | bwd_microstep: 1286.47 | bwd_inner_microstep: 1286.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 22:21:19,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1382.81 | bwd_inner_microstep: 1382.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-10 22:21:21,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.54 | bwd_microstep: 1379.66 | bwd_inner_microstep: 1379.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3628 [2024-06-10 22:21:23,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.57 | bwd_microstep: 1564.70 | bwd_inner_microstep: 1564.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-10 22:21:25,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1473.09 | bwd_inner_microstep: 1473.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3677 [2024-06-10 22:21:27,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.99 | bwd_microstep: 1719.70 | bwd_inner_microstep: 1719.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-10 22:21:29,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 503.08 | bwd_microstep: 1343.16 | bwd_inner_microstep: 1343.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3454 [2024-06-10 22:21:31,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.71 | bwd_microstep: 1313.87 | bwd_inner_microstep: 1313.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3672 [2024-06-10 22:21:33,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1375.34 | bwd_inner_microstep: 1375.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3522 [2024-06-10 22:21:35,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1458.67 | bwd_inner_microstep: 1458.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639 [2024-06-10 22:21:37,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.49 | bwd_microstep: 1409.01 | bwd_inner_microstep: 1408.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529 [2024-06-10 22:21:38,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.16 | bwd_microstep: 1196.37 | bwd_inner_microstep: 1196.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-10 22:21:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.65 | bwd_microstep: 1280.50 | bwd_inner_microstep: 1280.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 22:21:42,544] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1382.82 | bwd_inner_microstep: 1382.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-10 22:21:44,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1285.46 | bwd_inner_microstep: 1285.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 22:21:46,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1256.47 | bwd_inner_microstep: 1256.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2177 [2024-06-10 22:21:47,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.83 | bwd_microstep: 857.61 | bwd_inner_microstep: 857.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3806 [2024-06-10 22:21:49,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.63 | bwd_microstep: 1290.28 | bwd_inner_microstep: 1290.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-10 22:21:50,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.70 | bwd_microstep: 1299.15 | bwd_inner_microstep: 1299.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-10 22:21:52,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.88 | bwd_microstep: 1307.02 | bwd_inner_microstep: 1306.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
2060 [2024-06-10 22:21:53,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.84 | bwd_microstep: 850.75 | bwd_inner_microstep: 850.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821 [2024-06-10 22:21:55,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.94 | bwd_microstep: 1389.18 | bwd_inner_microstep: 1389.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2171 [2024-06-10 22:21:57,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.54 | bwd_microstep: 1012.64 | bwd_inner_microstep: 1012.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3815 [2024-06-10 22:21:59,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.44 | bwd_microstep: 1817.88 | bwd_inner_microstep: 1817.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595 [2024-06-10 22:22:01,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1603.45 | bwd_inner_microstep: 1603.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-10 22:22:04,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.13 | optimizer_step: 6.64 [2024-06-10 22:22:04,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.96 | bwd_microstep: 1700.85 | bwd_inner_microstep: 1684.68 | bwd_allreduce_microstep: 16.12 | step_microstep: 39.14 [2024-06-10 22:22:04,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16253.62 | bwd: 43443.61 | bwd_inner: 43426.58 | bwd_allreduce: 16.35 | step: 40.71 
{'loss': 1.2284, 'learning_rate': 7.259619680962222e-06, 'epoch': 0.73} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-10 22:22:06,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.16 | bwd_microstep: 1379.99 | bwd_inner_microstep: 1379.89 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893 [2024-06-10 22:22:07,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.52 | bwd_microstep: 712.96 | bwd_inner_microstep: 712.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3894 [2024-06-10 22:22:09,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1587.26 | bwd_inner_microstep: 1587.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4208 [2024-06-10 22:22:11,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.19 | bwd_microstep: 1563.07 | bwd_inner_microstep: 1563.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776 [2024-06-10 22:22:13,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.53 | bwd_microstep: 1438.94 | bwd_inner_microstep: 1438.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-10 22:22:15,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.45 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-10 22:22:17,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.42 | 
bwd_microstep: 1654.56 | bwd_inner_microstep: 1654.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3613 [2024-06-10 22:22:19,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.36 | bwd_microstep: 1217.77 | bwd_inner_microstep: 1217.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3416 [2024-06-10 22:22:20,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.50 | bwd_microstep: 1214.21 | bwd_inner_microstep: 1214.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601 [2024-06-10 22:22:22,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1510.88 | bwd_inner_microstep: 1510.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3678 [2024-06-10 22:22:24,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.43 | bwd_microstep: 1366.83 | bwd_inner_microstep: 1366.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3504 [2024-06-10 22:22:26,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1399.24 | bwd_inner_microstep: 1399.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-10 22:22:28,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.23 | bwd_microstep: 1338.10 | bwd_inner_microstep: 1338.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-10 22:22:30,487] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1339.25 | bwd_inner_microstep: 1339.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2107 [2024-06-10 22:22:31,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.47 | bwd_microstep: 1018.19 | bwd_inner_microstep: 1018.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 22:22:33,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.18 | bwd_microstep: 1343.48 | bwd_inner_microstep: 1343.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3638 [2024-06-10 22:22:35,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.08 | bwd_microstep: 1571.79 | bwd_inner_microstep: 1571.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3508 [2024-06-10 22:22:38,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.87 | bwd_microstep: 1557.90 | bwd_inner_microstep: 1557.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-10 22:22:40,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1428.18 | bwd_inner_microstep: 1428.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984 [2024-06-10 22:22:41,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.65 | bwd_microstep: 734.51 | bwd_inner_microstep: 734.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-10 22:22:42,825] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.43 | bwd_microstep: 1289.68 | bwd_inner_microstep: 1289.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-10 22:22:44,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1396.21 | bwd_inner_microstep: 1396.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-10 22:22:45,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.13 | bwd_microstep: 797.87 | bwd_inner_microstep: 797.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-10 22:22:47,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076 [2024-06-10 22:22:48,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.83 | bwd_microstep: 917.71 | bwd_inner_microstep: 917.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 22:22:50,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.17 | bwd_microstep: 1304.01 | bwd_inner_microstep: 1303.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2032 [2024-06-10 22:22:51,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.84 | bwd_microstep: 839.41 | bwd_inner_microstep: 839.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3820 [2024-06-10 22:22:53,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1451.76 | bwd_inner_microstep: 1451.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2054 [2024-06-10 22:22:55,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.24 | bwd_microstep: 874.51 | bwd_inner_microstep: 874.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022 [2024-06-10 22:22:56,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.70 | bwd_microstep: 808.03 | bwd_inner_microstep: 808.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2990 [2024-06-10 22:22:57,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.69 | bwd_microstep: 1139.93 | bwd_inner_microstep: 1139.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-10 22:23:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-10 22:23:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.02 | bwd_microstep: 8170.92 | bwd_inner_microstep: 1741.20 | bwd_allreduce_microstep: 6429.66 | step_microstep: 38.11 [2024-06-10 22:23:06,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15121.65 | bwd: 46947.33 | bwd_inner: 40516.67 | bwd_allreduce: 6429.94 | step: 39.64 {'loss': 1.211, 'learning_rate': 7.2307091506003325e-06, 'epoch': 0.73} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 22:23:08,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.29 | bwd_microstep: 1336.47 | 
bwd_inner_microstep: 1336.39 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.11 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 3125 [2024-06-10 22:23:09,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.57 | bwd_microstep: 1014.15 | bwd_inner_microstep: 1014.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2321 [2024-06-10 22:23:11,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.02 | bwd_microstep: 882.39 | bwd_inner_microstep: 882.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3911 [2024-06-10 22:23:13,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.30 | bwd_microstep: 1587.41 | bwd_inner_microstep: 1587.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810 [2024-06-10 22:23:15,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.11 | bwd_microstep: 1350.45 | bwd_inner_microstep: 1350.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 22:23:17,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.82 | bwd_microstep: 1376.21 | bwd_inner_microstep: 1376.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 22:23:18,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.66 | bwd_microstep: 1149.92 | bwd_inner_microstep: 1149.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 749 [2024-06-10 22:23:19,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
118.54 | bwd_microstep: 301.01 | bwd_inner_microstep: 300.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948 [2024-06-10 22:23:20,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.03 | bwd_microstep: 793.26 | bwd_inner_microstep: 793.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-10 22:23:21,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.67 | bwd_microstep: 795.30 | bwd_inner_microstep: 795.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 22:23:23,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.67 | bwd_microstep: 1388.05 | bwd_inner_microstep: 1388.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3492 [2024-06-10 22:23:25,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.88 | bwd_microstep: 1329.65 | bwd_inner_microstep: 1329.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1934 [2024-06-10 22:23:26,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.17 | bwd_microstep: 820.78 | bwd_inner_microstep: 820.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-10 22:23:28,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1493.17 | bwd_inner_microstep: 1493.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 22:23:30,288] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 551.25 | bwd_microstep: 1473.94 | bwd_inner_microstep: 1473.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472 [2024-06-10 22:23:32,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.94 | bwd_microstep: 1344.28 | bwd_inner_microstep: 1344.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3430 [2024-06-10 22:23:34,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 934 [2024-06-10 22:23:34,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.93 | bwd_microstep: 378.23 | bwd_inner_microstep: 378.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3634 [2024-06-10 22:23:36,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1343.88 | bwd_inner_microstep: 1343.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828 [2024-06-10 22:23:38,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.56 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3619 [2024-06-10 22:23:40,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.71 | bwd_microstep: 1453.83 | bwd_inner_microstep: 1453.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055 [2024-06-10 22:23:41,764] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.53 | bwd_microstep: 909.89 | bwd_inner_microstep: 909.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450 [2024-06-10 22:23:43,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.82 | bwd_microstep: 1158.00 | bwd_inner_microstep: 1157.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3589 [2024-06-10 22:23:45,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.05 | bwd_microstep: 1438.02 | bwd_inner_microstep: 1437.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-10 22:23:47,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.06 | bwd_microstep: 1297.60 | bwd_inner_microstep: 1297.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871 [2024-06-10 22:23:49,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.47 | bwd_microstep: 1662.58 | bwd_inner_microstep: 1662.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1991 [2024-06-10 22:23:50,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.42 | bwd_microstep: 833.47 | bwd_inner_microstep: 833.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3740 [2024-06-10 22:23:52,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.05 | bwd_microstep: 1537.89 | bwd_inner_microstep: 1537.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token 
length: 3622 [2024-06-10 22:23:54,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.46 | bwd_microstep: 1537.25 | bwd_inner_microstep: 1537.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3805 [2024-06-10 22:23:57,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.44 | bwd_microstep: 1685.32 | bwd_inner_microstep: 1685.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-10 22:23:58,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1315.93 | bwd_inner_microstep: 1315.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085 [2024-06-10 22:24:08,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.29 | optimizer_step: 6.60 [2024-06-10 22:24:08,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.15 | bwd_microstep: 9094.04 | bwd_inner_microstep: 972.01 | bwd_allreduce_microstep: 8121.97 | step_microstep: 39.28 [2024-06-10 22:24:08,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14527.09 | bwd: 46957.98 | bwd_inner: 38835.04 | bwd_allreduce: 8122.24 | step: 40.82 {'loss': 1.2119, 'learning_rate': 7.201843593614428e-06, 'epoch': 0.73} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-10 22:24:10,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1330.46 | bwd_inner_microstep: 1330.30 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 22:24:12,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1478.06 | 
bwd_inner_microstep: 1478.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-10 22:24:14,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.33 | bwd_microstep: 1470.81 | bwd_inner_microstep: 1470.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-10 22:24:16,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.38 | bwd_microstep: 1540.44 | bwd_inner_microstep: 1540.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 22:24:18,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1242.35 | bwd_inner_microstep: 1242.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-10 22:24:20,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750 [2024-06-10 22:24:22,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.26 | bwd_microstep: 1538.31 | bwd_inner_microstep: 1538.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420 [2024-06-10 22:24:23,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.45 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3435 [2024-06-10 22:24:25,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 487.97 | bwd_microstep: 1298.97 | bwd_inner_microstep: 1298.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500 [2024-06-10 22:24:27,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.57 | bwd_microstep: 1189.31 | bwd_inner_microstep: 1189.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3995 [2024-06-10 22:24:29,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.91 | bwd_microstep: 1708.02 | bwd_inner_microstep: 1707.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-10 22:24:31,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.50 | bwd_microstep: 1251.47 | bwd_inner_microstep: 1251.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-10 22:24:33,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1478.69 | bwd_inner_microstep: 1478.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569 [2024-06-10 22:24:35,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1428.58 | bwd_inner_microstep: 1428.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-10 22:24:37,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1544.21 | bwd_inner_microstep: 1544.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3643 [2024-06-10 22:24:39,668] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.46 | bwd_microstep: 1448.17 | bwd_inner_microstep: 1448.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446 [2024-06-10 22:24:41,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.91 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 22:24:43,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1351.50 | bwd_inner_microstep: 1351.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-10 22:24:45,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.34 | bwd_microstep: 1293.43 | bwd_inner_microstep: 1293.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3632 [2024-06-10 22:24:47,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.14 | bwd_microstep: 1343.10 | bwd_inner_microstep: 1343.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-10 22:24:49,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.61 | bwd_microstep: 1529.69 | bwd_inner_microstep: 1529.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3535 [2024-06-10 22:24:51,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.78 | bwd_microstep: 1417.08 | bwd_inner_microstep: 1417.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
1976 [2024-06-10 22:24:52,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.77 | bwd_microstep: 829.49 | bwd_inner_microstep: 829.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2187 [2024-06-10 22:24:53,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.64 | bwd_microstep: 956.20 | bwd_inner_microstep: 956.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676 [2024-06-10 22:24:55,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.27 | bwd_microstep: 1552.85 | bwd_inner_microstep: 1552.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 22:24:57,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1279.72 | bwd_inner_microstep: 1279.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-10 22:24:59,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1501.31 | bwd_inner_microstep: 1501.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3469 [2024-06-10 22:25:01,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.10 | bwd_microstep: 1317.29 | bwd_inner_microstep: 1317.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3597 [2024-06-10 22:25:03,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.85 | bwd_microstep: 1508.64 | bwd_inner_microstep: 1508.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per 
sample: 11.5, dynamic token length: 3588 [2024-06-10 22:25:05,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.69 | bwd_microstep: 1703.74 | bwd_inner_microstep: 1703.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3433 [2024-06-10 22:25:07,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.47 | bwd_microstep: 1374.25 | bwd_inner_microstep: 1374.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 22:25:10,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-10 22:25:10,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.37 | bwd_microstep: 1983.47 | bwd_inner_microstep: 1686.42 | bwd_allreduce_microstep: 296.99 | step_microstep: 38.70 [2024-06-10 22:25:10,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16649.40 | bwd: 44929.71 | bwd_inner: 44631.68 | bwd_allreduce: 297.29 | step: 40.29 {'loss': 1.1763, 'learning_rate': 7.173023111668868e-06, 'epoch': 0.73} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461 [2024-06-10 22:25:12,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.60 | bwd_microstep: 1571.71 | bwd_inner_microstep: 1571.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 22:25:14,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1242.59 | bwd_inner_microstep: 1242.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-10 22:25:16,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.17 
| bwd_microstep: 1653.85 | bwd_inner_microstep: 1653.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-10 22:25:18,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.64 | bwd_microstep: 1437.24 | bwd_inner_microstep: 1437.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 22:25:20,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1280.32 | bwd_inner_microstep: 1280.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-10 22:25:22,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1384.86 | bwd_inner_microstep: 1384.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 22:25:23,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.11 | bwd_microstep: 1281.22 | bwd_inner_microstep: 1281.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3751 [2024-06-10 22:25:26,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.62 | bwd_microstep: 1504.75 | bwd_inner_microstep: 1504.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 22:25:27,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.04 | bwd_microstep: 1386.80 | bwd_inner_microstep: 1386.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3520 [2024-06-10 22:25:29,761] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 486.22 | bwd_microstep: 1288.84 | bwd_inner_microstep: 1288.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3490 [2024-06-10 22:25:31,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.37 | bwd_microstep: 1507.74 | bwd_inner_microstep: 1507.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502 [2024-06-10 22:25:34,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.38 | bwd_microstep: 1577.01 | bwd_inner_microstep: 1576.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3441 [2024-06-10 22:25:35,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1397.92 | bwd_inner_microstep: 1397.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2651 [2024-06-10 22:25:37,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.22 | bwd_microstep: 1212.26 | bwd_inner_microstep: 1212.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 22:25:39,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.22 | bwd_microstep: 1250.31 | bwd_inner_microstep: 1250.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508 [2024-06-10 22:25:41,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.65 | bwd_microstep: 1318.42 | bwd_inner_microstep: 1318.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3442 [2024-06-10 
22:25:42,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.96 | bwd_microstep: 1186.96 | bwd_inner_microstep: 1186.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3504 [2024-06-10 22:25:44,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.92 | bwd_microstep: 1222.82 | bwd_inner_microstep: 1222.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-10 22:25:46,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1509.79 | bwd_inner_microstep: 1509.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 22:25:48,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.78 | bwd_microstep: 1259.44 | bwd_inner_microstep: 1259.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-10 22:25:50,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.92 | bwd_microstep: 1558.26 | bwd_inner_microstep: 1558.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472 [2024-06-10 22:25:52,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.83 | bwd_microstep: 1185.52 | bwd_inner_microstep: 1185.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 22:25:53,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1285.17 | bwd_inner_microstep: 1285.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3497 [2024-06-10 22:25:55,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.86 | bwd_microstep: 1388.88 | bwd_inner_microstep: 1388.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-10 22:25:57,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1556.55 | bwd_inner_microstep: 1556.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3833 [2024-06-10 22:26:00,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.90 | bwd_microstep: 1489.14 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-10 22:26:02,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.49 | bwd_microstep: 1451.95 | bwd_inner_microstep: 1451.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-10 22:26:04,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1489.29 | bwd_inner_microstep: 1489.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3550 [2024-06-10 22:26:06,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.88 | bwd_microstep: 1421.07 | bwd_inner_microstep: 1421.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3570 [2024-06-10 22:26:08,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.78 | bwd_microstep: 1695.89 | bwd_inner_microstep: 1695.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 22:26:10,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.99 | bwd_microstep: 1646.38 | bwd_inner_microstep: 1646.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3388 [2024-06-10 22:26:14,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.18 | optimizer_step: 6.63 [2024-06-10 22:26:14,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.91 | bwd_microstep: 3103.43 | bwd_inner_microstep: 1556.34 | bwd_allreduce_microstep: 1547.03 | step_microstep: 39.43 [2024-06-10 22:26:14,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16847.21 | bwd: 46746.39 | bwd_inner: 45198.45 | bwd_allreduce: 1547.26 | step: 41.26 {'loss': 1.2022, 'learning_rate': 7.1442478062692135e-06, 'epoch': 0.73} 73%|███████▎ | 1256/1726 [21:43:40<8:00:02, 61.28s/it] 73%|███████▎ | 1257/1726 [21:44:40<7:56:06, 60.91s/it] 73%|███████▎ | 1258/1726 [21:45:43<7:58:34, 61.36s/it] 73%|███████▎ | 1259/1726 [21:46:45<7:58:38, 61.49s/it] 73%|███████▎ | 1260/1726 [21:47:47<7:58:36, 61.62s/it] 73%|███████▎ | 1261/1726 [21:48:51<8:02:59, 62.32s/it] dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3482 [2024-06-10 22:26:16,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.06 | bwd_microstep: 1571.04 | bwd_inner_microstep: 1571.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3923 [2024-06-10 22:26:18,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.68
| bwd_microstep: 1391.85 | bwd_inner_microstep: 1391.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3879 [2024-06-10 22:26:20,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1586.65 | bwd_inner_microstep: 1586.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3902 [2024-06-10 22:26:22,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1585.21 | bwd_inner_microstep: 1585.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 22:26:24,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.80 | bwd_microstep: 1245.68 | bwd_inner_microstep: 1245.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2924 [2024-06-10 22:26:26,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.25 | bwd_microstep: 1188.54 | bwd_inner_microstep: 1188.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-10 22:26:27,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.25 | bwd_microstep: 1301.32 | bwd_inner_microstep: 1301.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-10 22:26:29,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.24 | bwd_microstep: 1403.50 | bwd_inner_microstep: 1403.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 22:26:31,949] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 1484.16 | bwd_inner_microstep: 1484.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 22:26:33,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.45 | bwd_microstep: 1194.01 | bwd_inner_microstep: 1193.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3497 [2024-06-10 22:26:35,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1446.55 | bwd_inner_microstep: 1446.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-10 22:26:37,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1485.98 | bwd_inner_microstep: 1485.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 22:26:39,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1345.38 | bwd_inner_microstep: 1345.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-10 22:26:41,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.41 | bwd_microstep: 1603.56 | bwd_inner_microstep: 1603.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647 [2024-06-10 22:26:43,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1412.73 | bwd_inner_microstep: 1412.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3544 [2024-06-10 22:26:45,371] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.53 | bwd_microstep: 1234.11 | bwd_inner_microstep: 1234.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633 [2024-06-10 22:26:47,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1409.71 | bwd_inner_microstep: 1409.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 22:26:49,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1401.23 | bwd_inner_microstep: 1401.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-10 22:26:51,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1356.16 | bwd_inner_microstep: 1356.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3742 [2024-06-10 22:26:52,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.41 | bwd_microstep: 1341.26 | bwd_inner_microstep: 1341.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-10 22:26:54,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.63 | bwd_microstep: 1389.52 | bwd_inner_microstep: 1389.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-10 22:26:56,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1412.44 | bwd_inner_microstep: 1412.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3458 [2024-06-10 22:26:58,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.29 | bwd_microstep: 1182.77 | bwd_inner_microstep: 1182.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3533 [2024-06-10 22:27:00,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.90 | bwd_microstep: 1201.15 | bwd_inner_microstep: 1201.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3575 [2024-06-10 22:27:02,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.15 | bwd_microstep: 1457.97 | bwd_inner_microstep: 1457.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3920 [2024-06-10 22:27:04,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.94 | bwd_microstep: 1333.62 | bwd_inner_microstep: 1333.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2684 [2024-06-10 22:27:05,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.89 | bwd_microstep: 1220.05 | bwd_inner_microstep: 1220.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-10 22:27:07,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1457.17 | bwd_inner_microstep: 1457.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3850 [2024-06-10 22:27:09,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.49 | bwd_microstep: 1491.34 | bwd_inner_microstep: 1491.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, 
images per sample: 8.0, dynamic token length: 3467 [2024-06-10 22:27:11,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.83 | bwd_microstep: 1441.30 | bwd_inner_microstep: 1441.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633 [2024-06-10 22:27:13,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.12 | bwd_microstep: 1407.16 | bwd_inner_microstep: 1407.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2890 [2024-06-10 22:27:17,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.32 | optimizer_step: 6.60 [2024-06-10 22:27:17,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.64 | bwd_microstep: 3054.72 | bwd_inner_microstep: 1127.58 | bwd_allreduce_microstep: 1927.08 | step_microstep: 39.11 [2024-06-10 22:27:17,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16521.24 | bwd: 46037.86 | bwd_inner: 44109.85 | bwd_allreduce: 1927.31 | step: 40.61 {'loss': 1.19, 'learning_rate': 7.115517778761963e-06, 'epoch': 0.73} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512 [2024-06-10 22:27:19,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1412.52 | bwd_inner_microstep: 1412.44 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2893 [2024-06-10 22:27:20,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.69 | bwd_microstep: 998.90 | bwd_inner_microstep: 998.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 22:27:22,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
503.36 | bwd_microstep: 1341.80 | bwd_inner_microstep: 1341.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795 [2024-06-10 22:27:24,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.21 | bwd_microstep: 1545.03 | bwd_inner_microstep: 1545.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 22:27:26,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1379.93 | bwd_inner_microstep: 1379.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-10 22:27:28,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.29 | bwd_microstep: 1341.49 | bwd_inner_microstep: 1341.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-10 22:27:30,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1538.54 | bwd_inner_microstep: 1538.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489 [2024-06-10 22:27:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.13 | bwd_microstep: 1188.45 | bwd_inner_microstep: 1188.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710 [2024-06-10 22:27:34,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1526.85 | bwd_inner_microstep: 1526.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-10 22:27:36,242] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1491.32 | bwd_inner_microstep: 1491.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1996 [2024-06-10 22:27:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.18 | bwd_microstep: 830.16 | bwd_inner_microstep: 830.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689 [2024-06-10 22:27:39,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.49 | bwd_microstep: 1519.71 | bwd_inner_microstep: 1519.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3521 [2024-06-10 22:27:41,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.19 | bwd_microstep: 1303.63 | bwd_inner_microstep: 1303.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3507 [2024-06-10 22:27:43,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.35 | bwd_microstep: 1336.87 | bwd_inner_microstep: 1336.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3785 [2024-06-10 22:27:45,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.51 | bwd_microstep: 1353.29 | bwd_inner_microstep: 1353.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-10 22:27:46,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.90 | bwd_microstep: 1342.07 | bwd_inner_microstep: 1342.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3502 [2024-06-10 22:27:48,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.69 | bwd_microstep: 1380.23 | bwd_inner_microstep: 1380.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 [2024-06-10 22:27:51,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.32 | bwd_microstep: 1658.55 | bwd_inner_microstep: 1658.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 22:27:53,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.33 | bwd_microstep: 1414.83 | bwd_inner_microstep: 1414.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-10 22:27:54,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.60 | bwd_microstep: 1286.26 | bwd_inner_microstep: 1286.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719 [2024-06-10 22:27:57,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.71 | bwd_microstep: 1634.63 | bwd_inner_microstep: 1634.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-10 22:27:58,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1399.40 | bwd_inner_microstep: 1399.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807 [2024-06-10 22:28:01,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1479.12 | bwd_inner_microstep: 1479.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3826 [2024-06-10 22:28:03,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.18 | bwd_microstep: 1459.03 | bwd_inner_microstep: 1459.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 22:28:04,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.92 | bwd_microstep: 1375.39 | bwd_inner_microstep: 1375.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3477 [2024-06-10 22:28:07,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.64 | bwd_microstep: 2056.60 | bwd_inner_microstep: 2056.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-10 22:28:09,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.08 | bwd_microstep: 1351.04 | bwd_inner_microstep: 1351.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-10 22:28:11,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.57 | bwd_microstep: 1472.96 | bwd_inner_microstep: 1472.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3578 [2024-06-10 22:28:13,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.43 | bwd_microstep: 1593.62 | bwd_inner_microstep: 1593.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-10 22:28:15,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.17 | bwd_microstep: 1448.83 | bwd_inner_microstep: 1448.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-10 22:28:17,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.71 | bwd_microstep: 1643.63 | bwd_inner_microstep: 1643.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3430 [2024-06-10 22:28:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-10 22:28:19,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.87 | bwd_microstep: 1513.37 | bwd_inner_microstep: 1505.63 | bwd_allreduce_microstep: 7.70 | step_microstep: 38.14 [2024-06-10 22:28:19,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16745.85 | bwd: 45618.09 | bwd_inner: 45609.43 | bwd_allreduce: 7.96 | step: 39.74 {'loss': 1.2217, 'learning_rate': 7.086833130334107e-06, 'epoch': 0.73} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471 [2024-06-10 22:28:21,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.04 | bwd_microstep: 1430.76 | bwd_inner_microstep: 1430.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-10 22:28:23,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.02 | bwd_microstep: 1476.87 | bwd_inner_microstep: 1476.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 22:28:26,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1548.26 | bwd_inner_microstep: 1548.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954 [2024-06-10 22:28:27,102] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 279.02 | bwd_microstep: 732.12 | bwd_inner_microstep: 732.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-10 22:28:28,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.14 | bwd_microstep: 1155.38 | bwd_inner_microstep: 1155.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-10 22:28:29,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.78 | bwd_microstep: 791.77 | bwd_inner_microstep: 791.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 22:28:31,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 22:28:33,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.27 | bwd_microstep: 1383.77 | bwd_inner_microstep: 1383.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1990 [2024-06-10 22:28:34,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.27 | bwd_microstep: 846.56 | bwd_inner_microstep: 846.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3516 [2024-06-10 22:28:36,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.16 | bwd_microstep: 1433.45 | bwd_inner_microstep: 1433.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3669 [2024-06-10 
22:28:38,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.49 | bwd_microstep: 1617.54 | bwd_inner_microstep: 1617.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649 [2024-06-10 22:28:40,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.08 | bwd_microstep: 1415.73 | bwd_inner_microstep: 1415.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3484 [2024-06-10 22:28:42,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.12 | bwd_microstep: 1506.59 | bwd_inner_microstep: 1506.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3406 [2024-06-10 22:28:44,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.72 | bwd_microstep: 1370.65 | bwd_inner_microstep: 1370.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 22:28:46,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1556.14 | bwd_inner_microstep: 1556.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 22:28:48,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.94 | bwd_microstep: 1374.49 | bwd_inner_microstep: 1374.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 22:28:50,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.42 | bwd_microstep: 1280.47 | bwd_inner_microstep: 1280.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3475 [2024-06-10 22:28:52,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.57 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-10 22:28:54,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.70 | bwd_microstep: 1282.16 | bwd_inner_microstep: 1282.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 22:28:55,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1277.98 | bwd_inner_microstep: 1277.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-10 22:28:56,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.42 | bwd_microstep: 710.28 | bwd_inner_microstep: 710.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-10 22:28:59,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.62 | bwd_microstep: 1659.92 | bwd_inner_microstep: 1659.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2290 [2024-06-10 22:29:00,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.13 | bwd_microstep: 855.30 | bwd_inner_microstep: 855.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-10 22:29:02,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1556.31 | bwd_inner_microstep: 1556.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3677 [2024-06-10 22:29:04,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1329.26 | bwd_inner_microstep: 1329.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471 [2024-06-10 22:29:06,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.31 | bwd_microstep: 1405.45 | bwd_inner_microstep: 1405.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3560 [2024-06-10 22:29:07,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.08 | bwd_microstep: 1204.45 | bwd_inner_microstep: 1204.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3812 [2024-06-10 22:29:10,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1517.67 | bwd_inner_microstep: 1517.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3568 [2024-06-10 22:29:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1554.77 | bwd_inner_microstep: 1554.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3420 [2024-06-10 22:29:14,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.33 | bwd_microstep: 1374.08 | bwd_inner_microstep: 1374.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779 [2024-06-10 22:29:16,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.97 | bwd_microstep: 1639.62 | bwd_inner_microstep: 1639.59 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-10 22:29:21,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.09 | optimizer_step: 6.61 [2024-06-10 22:29:21,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.63 | bwd_microstep: 4976.50 | bwd_inner_microstep: 1861.69 | bwd_allreduce_microstep: 3114.76 | step_microstep: 38.03 [2024-06-10 22:29:21,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15929.42 | bwd: 45790.13 | bwd_inner: 42674.47 | bwd_allreduce: 3114.99 | step: 39.56 {'loss': 1.1961, 'learning_rate': 7.0581939620128515e-06, 'epoch': 0.73} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-10 22:29:23,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.67 | bwd_microstep: 1334.06 | bwd_inner_microstep: 1334.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3907 [2024-06-10 22:29:26,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.16 | bwd_microstep: 1691.97 | bwd_inner_microstep: 1691.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3905 [2024-06-10 22:29:28,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1390.24 | bwd_inner_microstep: 1390.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3786 [2024-06-10 22:29:29,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1344.87 | bwd_inner_microstep: 1344.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-10 22:29:31,941] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.92 | bwd_microstep: 1453.44 | bwd_inner_microstep: 1453.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3743 [2024-06-10 22:29:34,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1532.21 | bwd_inner_microstep: 1532.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1941 [2024-06-10 22:29:35,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.58 | bwd_microstep: 823.92 | bwd_inner_microstep: 823.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-10 22:29:36,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.14 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574 [2024-06-10 22:29:38,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1301.29 | bwd_inner_microstep: 1301.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3699 [2024-06-10 22:29:40,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.29 | bwd_microstep: 1477.24 | bwd_inner_microstep: 1477.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 22:29:42,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.73 | bwd_microstep: 1284.66 | bwd_inner_microstep: 1284.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3573 [2024-06-10 22:29:44,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1428.54 | bwd_inner_microstep: 1428.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3508 [2024-06-10 22:29:46,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.53 | bwd_microstep: 1251.88 | bwd_inner_microstep: 1251.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 22:29:48,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2112 [2024-06-10 22:29:49,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.93 | bwd_microstep: 1020.25 | bwd_inner_microstep: 1020.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685 [2024-06-10 22:29:51,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.67 | bwd_microstep: 1516.95 | bwd_inner_microstep: 1516.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390 [2024-06-10 22:29:53,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.74 | bwd_microstep: 1243.90 | bwd_inner_microstep: 1243.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 22:29:55,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.26 | bwd_microstep: 1381.21 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 3525 [2024-06-10 22:29:56,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.94 | bwd_microstep: 1322.25 | bwd_inner_microstep: 1322.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-10 22:29:59,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1559.16 | bwd_inner_microstep: 1559.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-10 22:30:00,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1356.13 | bwd_inner_microstep: 1356.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 22:30:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.40 | bwd_microstep: 1287.81 | bwd_inner_microstep: 1287.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1962 [2024-06-10 22:30:03,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.55 | bwd_microstep: 732.51 | bwd_inner_microstep: 732.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-10 22:30:05,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1398.16 | bwd_inner_microstep: 1398.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 22:30:07,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1557.10 | bwd_inner_microstep: 1557.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563 [2024-06-10 22:30:09,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.09 | bwd_microstep: 1297.82 | bwd_inner_microstep: 1297.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726 [2024-06-10 22:30:11,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 1633.40 | bwd_inner_microstep: 1633.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1891 [2024-06-10 22:30:12,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 265.10 | bwd_microstep: 688.64 | bwd_inner_microstep: 688.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3815 [2024-06-10 22:30:15,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 662.42 | bwd_microstep: 1823.88 | bwd_inner_microstep: 1823.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2713 [2024-06-10 22:30:16,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.56 | bwd_microstep: 1096.59 | bwd_inner_microstep: 1096.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569 [2024-06-10 22:30:19,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.02 | bwd_microstep: 1591.50 | bwd_inner_microstep: 1591.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3390 [2024-06-10 22:30:24,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.10 | optimizer_step: 6.59 [2024-06-10 
22:30:24,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.96 | bwd_microstep: 5185.15 | bwd_inner_microstep: 1558.98 | bwd_allreduce_microstep: 3626.11 | step_microstep: 37.79 [2024-06-10 22:30:24,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15982.22 | bwd: 46535.71 | bwd_inner: 42908.69 | bwd_allreduce: 3626.34 | step: 39.28 {'loss': 1.1922, 'learning_rate': 7.029600374665171e-06, 'epoch': 0.73} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 22:30:26,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.58 | bwd_microstep: 1238.24 | bwd_inner_microstep: 1238.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 22:30:28,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.82 | bwd_microstep: 1244.02 | bwd_inner_microstep: 1243.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441 [2024-06-10 22:30:30,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.28 | bwd_microstep: 1445.68 | bwd_inner_microstep: 1445.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 22:30:32,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1448.33 | bwd_inner_microstep: 1448.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3789 [2024-06-10 22:30:34,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.79 | bwd_microstep: 1252.98 | bwd_inner_microstep: 1252.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-10 
22:30:36,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.91 | bwd_microstep: 1545.18 | bwd_inner_microstep: 1545.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4174 [2024-06-10 22:30:38,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.78 | bwd_microstep: 1549.21 | bwd_inner_microstep: 1549.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770 [2024-06-10 22:30:40,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 1437.89 | bwd_inner_microstep: 1437.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-10 22:30:42,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1344.76 | bwd_inner_microstep: 1344.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-10 22:30:43,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.65 | bwd_microstep: 1251.70 | bwd_inner_microstep: 1251.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889 [2024-06-10 22:30:44,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.31 | bwd_microstep: 680.71 | bwd_inner_microstep: 680.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3413 [2024-06-10 22:30:46,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.09 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3662 [2024-06-10 22:30:48,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1520.94 | bwd_inner_microstep: 1520.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3496 [2024-06-10 22:30:50,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.58 | bwd_microstep: 1551.10 | bwd_inner_microstep: 1551.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2060 [2024-06-10 22:30:51,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.89 | bwd_microstep: 724.16 | bwd_inner_microstep: 724.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3641 [2024-06-10 22:30:54,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.45 | bwd_microstep: 1571.29 | bwd_inner_microstep: 1571.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647 [2024-06-10 22:30:55,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.06 | bwd_microstep: 1409.03 | bwd_inner_microstep: 1409.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-10 22:30:57,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.17 | bwd_microstep: 1337.23 | bwd_inner_microstep: 1337.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3969 [2024-06-10 22:30:59,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1528.69 | bwd_inner_microstep: 1528.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-10 22:31:01,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1452.78 | bwd_inner_microstep: 1452.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291 [2024-06-10 22:31:03,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.10 | bwd_microstep: 975.14 | bwd_inner_microstep: 975.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-10 22:31:05,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.01 | bwd_microstep: 1351.13 | bwd_inner_microstep: 1351.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3632 [2024-06-10 22:31:06,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.69 | bwd_microstep: 1247.41 | bwd_inner_microstep: 1247.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 22:31:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1354.42 | bwd_inner_microstep: 1354.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 22:31:10,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.49 | bwd_microstep: 1556.07 | bwd_inner_microstep: 1556.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-10 22:31:12,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1440.20 | bwd_inner_microstep: 1440.17 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-10 22:31:14,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1412.48 | bwd_inner_microstep: 1412.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-10 22:31:17,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.60 | bwd_microstep: 1657.73 | bwd_inner_microstep: 1657.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3800 [2024-06-10 22:31:18,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.11 | bwd_microstep: 1358.98 | bwd_inner_microstep: 1358.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3825 [2024-06-10 22:31:21,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.76 | bwd_microstep: 1581.71 | bwd_inner_microstep: 1581.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446 [2024-06-10 22:31:23,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1442.77 | bwd_inner_microstep: 1442.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3767 [2024-06-10 22:31:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.07 | optimizer_step: 6.64 [2024-06-10 22:31:28,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.15 | bwd_microstep: 4464.57 | bwd_inner_microstep: 1659.91 | bwd_allreduce_microstep: 2804.61 | step_microstep: 37.61 [2024-06-10 22:31:28,194] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 16341.86 | bwd: 46684.83 | bwd_inner: 43879.30 | bwd_allreduce: 2804.84 | step: 39.11 {'loss': 1.2325, 'learning_rate': 7.001052468997551e-06, 'epoch': 0.73} 73%|███████▎ | 1261/1726 [21:48:51<8:02:59, 62.32s/it] 73%|███████▎ | 1262/1726 [21:49:53<8:03:18, 62.50s/it] 73%|███████▎ | 1263/1726 [21:50:56<8:02:45, 62.56s/it] 73%|███████▎ | 1264/1726 [21:51:58<8:00:33, 62.41s/it] 73%|███████▎ | 1265/1726 [21:53:01<8:00:32, 62.54s/it] 73%|███████▎ | 1266/1726 [21:54:04<8:01:22, 62.79s/it] dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425 [2024-06-10 22:31:29,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.77 | bwd_microstep: 1143.29 | bwd_inner_microstep: 1143.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-10 22:31:30,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.03 | bwd_microstep: 788.10 | bwd_inner_microstep: 788.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3851 [2024-06-10 22:31:33,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.45 | bwd_microstep: 1623.47 | bwd_inner_microstep: 1623.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3848 [2024-06-10 22:31:35,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.85 | bwd_microstep: 1657.02 | bwd_inner_microstep: 1656.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-10 22:31:37,307]
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.72 | bwd_microstep: 1376.38 | bwd_inner_microstep: 1376.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2969 [2024-06-10 22:31:38,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.63 | bwd_microstep: 1009.25 | bwd_inner_microstep: 1009.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-10 22:31:40,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1285.51 | bwd_inner_microstep: 1285.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-10 22:31:42,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1246.45 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 22:31:43,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1284.94 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-10 22:31:45,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.13 | bwd_microstep: 1258.19 | bwd_inner_microstep: 1258.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014 [2024-06-10 22:31:46,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.26 | bwd_microstep: 896.35 | bwd_inner_microstep: 896.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3439 [2024-06-10 22:31:48,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.60 | bwd_microstep: 1345.78 | bwd_inner_microstep: 1345.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447 [2024-06-10 22:31:50,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.58 | bwd_microstep: 1441.65 | bwd_inner_microstep: 1441.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-10 22:31:52,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1487.06 | bwd_inner_microstep: 1487.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3644 [2024-06-10 22:31:55,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.36 | bwd_microstep: 1637.31 | bwd_inner_microstep: 1637.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3637 [2024-06-10 22:31:57,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.13 | bwd_microstep: 1810.15 | bwd_inner_microstep: 1810.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2550 [2024-06-10 22:31:59,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.88 | bwd_microstep: 1059.26 | bwd_inner_microstep: 1059.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 22:32:00,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.87 | bwd_microstep: 1292.63 | bwd_inner_microstep: 1292.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
28, images per sample: 7.0, dynamic token length: 3815 [2024-06-10 22:32:02,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1458.88 | bwd_inner_microstep: 1458.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1401 [2024-06-10 22:32:03,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.84 | bwd_microstep: 525.44 | bwd_inner_microstep: 525.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2302 [2024-06-10 22:32:04,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 882.51 | bwd_inner_microstep: 882.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-10 22:32:06,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1559.17 | bwd_inner_microstep: 1559.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-10 22:32:08,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.18 | bwd_microstep: 798.00 | bwd_inner_microstep: 797.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677 [2024-06-10 22:32:09,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.44 | bwd_microstep: 1328.53 | bwd_inner_microstep: 1328.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-10 22:32:11,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1393.31 | bwd_inner_microstep: 1393.28 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-10 22:32:13,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2272 [2024-06-10 22:32:15,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.99 | bwd_microstep: 782.37 | bwd_inner_microstep: 782.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495 [2024-06-10 22:32:16,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1417.74 | bwd_inner_microstep: 1417.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-10 22:32:18,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1295.16 | bwd_inner_microstep: 1295.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3565 [2024-06-10 22:32:21,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.71 | bwd_microstep: 1667.21 | bwd_inner_microstep: 1667.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3605 [2024-06-10 22:32:23,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.68 | bwd_microstep: 1640.74 | bwd_inner_microstep: 1640.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-10 22:32:29,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.15 | optimizer_step: 
6.60 [2024-06-10 22:32:29,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 5391.73 | bwd_inner_microstep: 1705.36 | bwd_allreduce_microstep: 3686.31 | step_microstep: 38.00 [2024-06-10 22:32:29,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15496.24 | bwd: 45293.48 | bwd_inner: 41606.27 | bwd_allreduce: 3686.54 | step: 39.42 {'loss': 1.1277, 'learning_rate': 6.97255034555556e-06, 'epoch': 0.73} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-10 22:32:31,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1466.49 | bwd_inner_microstep: 1466.35 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-10 22:32:33,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.94 | bwd_microstep: 1482.93 | bwd_inner_microstep: 1482.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 22:32:35,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.92 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 22:32:36,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.13 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3773 [2024-06-10 22:32:38,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.73 | bwd_microstep: 1501.60 | bwd_inner_microstep: 1501.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 
[2024-06-10 22:32:40,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.59 | bwd_microstep: 1385.81 | bwd_inner_microstep: 1385.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-10 22:32:41,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.29 | bwd_microstep: 794.28 | bwd_inner_microstep: 794.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-10 22:32:44,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.64 | bwd_microstep: 1629.66 | bwd_inner_microstep: 1629.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867 [2024-06-10 22:32:45,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.91 | bwd_microstep: 677.72 | bwd_inner_microstep: 677.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701 [2024-06-10 22:32:47,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.94 | bwd_microstep: 1526.54 | bwd_inner_microstep: 1526.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3517 [2024-06-10 22:32:48,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.51 | bwd_microstep: 1191.11 | bwd_inner_microstep: 1191.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3479 [2024-06-10 22:32:50,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.70 | bwd_microstep: 1327.14 | bwd_inner_microstep: 1327.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2205 [2024-06-10 22:32:52,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.62 | bwd_microstep: 956.26 | bwd_inner_microstep: 956.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960 [2024-06-10 22:32:53,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.71 | bwd_microstep: 882.95 | bwd_inner_microstep: 882.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-10 22:32:55,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.44 | bwd_microstep: 1482.68 | bwd_inner_microstep: 1482.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3655 [2024-06-10 22:32:57,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.31 | bwd_microstep: 1716.75 | bwd_inner_microstep: 1716.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1896 [2024-06-10 22:32:58,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.95 | bwd_microstep: 775.17 | bwd_inner_microstep: 775.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-10 22:33:00,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1502.37 | bwd_inner_microstep: 1502.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-10 22:33:02,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.66 | bwd_microstep: 1377.02 | bwd_inner_microstep: 1376.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-10 22:33:04,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 22:33:06,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.18 | bwd_microstep: 1256.39 | bwd_inner_microstep: 1256.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3863 [2024-06-10 22:33:08,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.44 | bwd_microstep: 1736.40 | bwd_inner_microstep: 1736.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-10 22:33:10,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1356.62 | bwd_inner_microstep: 1356.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090 [2024-06-10 22:33:11,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.90 | bwd_microstep: 917.53 | bwd_inner_microstep: 917.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-10 22:33:13,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.50 | bwd_microstep: 1432.70 | bwd_inner_microstep: 1432.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059 [2024-06-10 22:33:14,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.91 | bwd_microstep: 813.71 | bwd_inner_microstep: 813.69 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2852 [2024-06-10 22:33:16,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.34 | bwd_microstep: 1095.18 | bwd_inner_microstep: 1095.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-10 22:33:18,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.84 | bwd_microstep: 1350.81 | bwd_inner_microstep: 1350.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-10 22:33:20,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.72 | bwd_microstep: 1554.41 | bwd_inner_microstep: 1554.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772 [2024-06-10 22:33:22,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.14 | bwd_microstep: 1542.93 | bwd_inner_microstep: 1542.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 22:33:24,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.30 | bwd_microstep: 1281.35 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-10 22:33:30,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-10 22:33:30,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 5758.72 | bwd_inner_microstep: 1567.32 | bwd_allreduce_microstep: 4191.35 | step_microstep: 37.98 [2024-06-10 22:33:30,595] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15413.85 | bwd: 45548.72 | bwd_inner: 41356.35 | bwd_allreduce: 4191.64 | step: 39.47 {'loss': 1.1898, 'learning_rate': 6.94409410472352e-06, 'epoch': 0.73} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1929 [2024-06-10 22:33:31,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.80 | bwd_microstep: 874.28 | bwd_inner_microstep: 874.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-10 22:33:33,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.75 | bwd_microstep: 1344.04 | bwd_inner_microstep: 1344.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-10 22:33:35,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 1551.84 | bwd_inner_microstep: 1551.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4238 [2024-06-10 22:33:38,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.75 | bwd_microstep: 1662.78 | bwd_inner_microstep: 1662.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788 [2024-06-10 22:33:40,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.84 | bwd_microstep: 1647.88 | bwd_inner_microstep: 1647.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419 [2024-06-10 22:33:42,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.99 | bwd_microstep: 1182.91 | bwd_inner_microstep: 1182.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3699 [2024-06-10 22:33:44,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1523.32 | bwd_inner_microstep: 1523.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-10 22:33:46,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.34 | bwd_microstep: 1546.25 | bwd_inner_microstep: 1546.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1875 [2024-06-10 22:33:47,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.68 | bwd_microstep: 744.91 | bwd_inner_microstep: 744.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-10 22:33:48,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.02 | bwd_microstep: 802.34 | bwd_inner_microstep: 802.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3444 [2024-06-10 22:33:50,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.75 | bwd_microstep: 1283.36 | bwd_inner_microstep: 1283.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450 [2024-06-10 22:33:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1413.15 | bwd_inner_microstep: 1413.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3398 [2024-06-10 22:33:53,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.05 | bwd_microstep: 1365.79 | bwd_inner_microstep: 1365.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3401 [2024-06-10 22:33:55,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.82 | bwd_microstep: 1434.92 | bwd_inner_microstep: 1434.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3644 [2024-06-10 22:33:58,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.65 | bwd_microstep: 1814.26 | bwd_inner_microstep: 1814.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 22:33:59,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.02 | bwd_microstep: 791.16 | bwd_inner_microstep: 791.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666 [2024-06-10 22:34:01,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.61 | bwd_microstep: 1717.96 | bwd_inner_microstep: 1717.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562 [2024-06-10 22:34:03,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.75 | bwd_microstep: 1427.96 | bwd_inner_microstep: 1427.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650 [2024-06-10 22:34:06,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.77 | bwd_microstep: 1582.02 | bwd_inner_microstep: 1581.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-10 22:34:07,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.54 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3677 [2024-06-10 22:34:09,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.67 | bwd_microstep: 1326.23 | bwd_inner_microstep: 1326.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-10 22:34:11,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.77 | bwd_microstep: 1395.64 | bwd_inner_microstep: 1395.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692 [2024-06-10 22:34:13,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.86 | bwd_microstep: 1331.13 | bwd_inner_microstep: 1331.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-10 22:34:15,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.70 | bwd_microstep: 1376.77 | bwd_inner_microstep: 1376.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-10 22:34:17,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.60 | bwd_microstep: 1648.44 | bwd_inner_microstep: 1648.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2412 [2024-06-10 22:34:19,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 936.97 | bwd_inner_microstep: 936.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 22:34:21,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1399.35 | 
bwd_inner_microstep: 1399.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-10 22:34:23,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.83 | bwd_microstep: 1648.92 | bwd_inner_microstep: 1648.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3591 [2024-06-10 22:34:25,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1460.54 | bwd_inner_microstep: 1460.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3591 [2024-06-10 22:34:27,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 1566.97 | bwd_inner_microstep: 1566.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3532 [2024-06-10 22:34:29,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1451.06 | bwd_inner_microstep: 1451.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-10 22:34:31,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-10 22:34:31,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.72 | bwd_microstep: 1539.74 | bwd_inner_microstep: 1532.10 | bwd_allreduce_microstep: 7.59 | step_microstep: 37.53 [2024-06-10 22:34:31,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16476.20 | bwd: 44198.46 | bwd_inner: 44189.98 | bwd_allreduce: 7.81 | step: 39.02 {'loss': 1.201, 'learning_rate': 6.915683846724188e-06, 'epoch': 0.74} dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3398 [2024-06-10 22:34:33,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1336.31 | bwd_inner_microstep: 1336.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4119 [2024-06-10 22:34:35,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 648.38 | bwd_microstep: 1736.34 | bwd_inner_microstep: 1736.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 22:34:37,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.06 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2239 [2024-06-10 22:34:38,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.17 | bwd_microstep: 863.14 | bwd_inner_microstep: 863.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-10 22:34:41,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1485.08 | bwd_inner_microstep: 1485.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-10 22:34:42,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.16 | bwd_microstep: 1391.23 | bwd_inner_microstep: 1391.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-10 22:34:44,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1386.19 | bwd_inner_microstep: 1386.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
34, images per sample: 8.5, dynamic token length: 3709 [2024-06-10 22:34:46,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.73 | bwd_microstep: 1527.27 | bwd_inner_microstep: 1527.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3752 [2024-06-10 22:34:49,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1566.23 | bwd_inner_microstep: 1566.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3684 [2024-06-10 22:34:51,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.12 | bwd_microstep: 1721.48 | bwd_inner_microstep: 1721.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-10 22:34:53,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1248.74 | bwd_inner_microstep: 1248.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2458 [2024-06-10 22:34:54,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.27 | bwd_microstep: 994.92 | bwd_inner_microstep: 994.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 22:34:56,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1350.16 | bwd_inner_microstep: 1350.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-10 22:34:58,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.51 | bwd_microstep: 1449.28 | bwd_inner_microstep: 1449.25 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3508 [2024-06-10 22:35:00,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.94 | bwd_microstep: 1446.08 | bwd_inner_microstep: 1446.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3519 [2024-06-10 22:35:02,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.23 | bwd_microstep: 1553.30 | bwd_inner_microstep: 1553.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3501 [2024-06-10 22:35:04,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1432.53 | bwd_inner_microstep: 1432.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-10 22:35:06,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.22 | bwd_microstep: 1522.07 | bwd_inner_microstep: 1522.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3615 [2024-06-10 22:35:08,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.19 | bwd_microstep: 1468.04 | bwd_inner_microstep: 1468.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-10 22:35:10,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1499.15 | bwd_inner_microstep: 1499.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-10 22:35:12,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.31 | bwd_microstep: 1491.25 | bwd_inner_microstep: 
1491.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 22:35:14,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.15 | bwd_microstep: 1477.73 | bwd_inner_microstep: 1477.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 22:35:16,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.43 | bwd_microstep: 1452.81 | bwd_inner_microstep: 1452.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 22:35:19,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.02 | bwd_microstep: 1660.13 | bwd_inner_microstep: 1660.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2220 [2024-06-10 22:35:20,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.00 | bwd_microstep: 960.62 | bwd_inner_microstep: 960.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474 [2024-06-10 22:35:22,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.47 | bwd_microstep: 1184.37 | bwd_inner_microstep: 1184.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-10 22:35:24,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.36 | bwd_microstep: 1514.93 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597 [2024-06-10 22:35:26,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | 
bwd_microstep: 1437.17 | bwd_inner_microstep: 1437.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 22:35:28,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1383.15 | bwd_inner_microstep: 1383.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 22:35:30,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.03 | bwd_microstep: 1557.35 | bwd_inner_microstep: 1557.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-10 22:35:32,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.15 | bwd_microstep: 1357.57 | bwd_inner_microstep: 1357.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 22:35:34,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-10 22:35:34,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.61 | bwd_microstep: 1418.84 | bwd_inner_microstep: 1411.16 | bwd_allreduce_microstep: 7.63 | step_microstep: 37.54 [2024-06-10 22:35:34,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16881.40 | bwd: 45256.00 | bwd_inner: 45247.46 | bwd_allreduce: 7.85 | step: 38.99 {'loss': 1.1756, 'learning_rate': 6.887319671618315e-06, 'epoch': 0.74} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452 [2024-06-10 22:35:36,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1448.01 | bwd_inner_microstep: 1447.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3945 [2024-06-10 22:35:38,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.01 | bwd_microstep: 1702.63 | bwd_inner_microstep: 1702.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-10 22:35:40,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1383.37 | bwd_inner_microstep: 1383.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-10 22:35:42,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.74 | bwd_inner_microstep: 1556.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 22:35:44,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.52 | bwd_microstep: 1249.16 | bwd_inner_microstep: 1249.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3742 [2024-06-10 22:35:46,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1497.66 | bwd_inner_microstep: 1497.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421 [2024-06-10 22:35:47,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.90 | bwd_microstep: 1154.14 | bwd_inner_microstep: 1154.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-10 22:35:49,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.22 | bwd_microstep: 1532.70 | bwd_inner_microstep: 1532.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711 [2024-06-10 22:35:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.48 | bwd_microstep: 1531.29 | bwd_inner_microstep: 1531.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 748 [2024-06-10 22:35:52,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.19 | bwd_microstep: 300.55 | bwd_inner_microstep: 300.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-10 22:35:54,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1284.89 | bwd_inner_microstep: 1284.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3439 [2024-06-10 22:35:56,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.07 | bwd_microstep: 1411.88 | bwd_inner_microstep: 1411.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-10 22:35:57,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1252.60 | bwd_inner_microstep: 1252.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-10 22:35:59,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.44 | bwd_microstep: 1340.78 | bwd_inner_microstep: 1340.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921 [2024-06-10 22:36:00,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.66 | bwd_microstep: 788.69 | bwd_inner_microstep: 788.66 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 22:36:02,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1373.99 | bwd_inner_microstep: 1373.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514 [2024-06-10 22:36:04,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.59 | bwd_microstep: 1446.73 | bwd_inner_microstep: 1446.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988 [2024-06-10 22:36:06,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.68 | bwd_microstep: 897.33 | bwd_inner_microstep: 897.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-10 22:36:07,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1383.02 | bwd_inner_microstep: 1383.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691 [2024-06-10 22:36:10,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.72 | bwd_microstep: 1526.15 | bwd_inner_microstep: 1526.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3986 [2024-06-10 22:36:12,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.32 | bwd_microstep: 1814.23 | bwd_inner_microstep: 1814.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-10 22:36:14,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 
1411.40 | bwd_inner_microstep: 1411.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-10 22:36:16,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.43 | bwd_microstep: 1279.42 | bwd_inner_microstep: 1279.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 616 [2024-06-10 22:36:16,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.50 | bwd_microstep: 261.61 | bwd_inner_microstep: 261.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555 [2024-06-10 22:36:18,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.55 | bwd_microstep: 1200.66 | bwd_inner_microstep: 1200.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-10 22:36:20,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.98 | bwd_microstep: 1302.55 | bwd_inner_microstep: 1302.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600 [2024-06-10 22:36:21,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.22 | bwd_microstep: 1310.59 | bwd_inner_microstep: 1310.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-10 22:36:24,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.65 | bwd_microstep: 1502.71 | bwd_inner_microstep: 1502.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818 [2024-06-10 22:36:26,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 587.05 | bwd_microstep: 1585.68 | bwd_inner_microstep: 1585.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3781 [2024-06-10 22:36:28,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.78 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3577 [2024-06-10 22:36:30,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.96 | bwd_microstep: 1426.51 | bwd_inner_microstep: 1426.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4011 [2024-06-10 22:36:36,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.10 | optimizer_step: 6.61 [2024-06-10 22:36:36,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.32 | bwd_microstep: 5668.14 | bwd_inner_microstep: 1621.23 | bwd_allreduce_microstep: 4046.86 | step_microstep: 38.16 [2024-06-10 22:36:36,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15754.40 | bwd: 46210.41 | bwd_inner: 42162.65 | bwd_allreduce: 4047.09 | step: 39.63 {'loss': 1.1764, 'learning_rate': 6.859001679304398e-06, 'epoch': 0.74} | 1266/1726 [21:54:04<8:01:22, 62.79s/it] 73%|███████▎ | 1267/1726 [21:55:06<7:56:29, 62.29s/it] 73%|███████▎ | 1267/1726 [21:55:06<7:56:29, 62.29s/it] 73%|███████▎ | 1268/1726 [21:56:07<7:53:09, 61.99s/it] 73%|███████▎ | 1268/1726 [21:56:07<7:53:09, 61.99s/it] 74%|███████▎ | 1269/1726 [21:57:08<7:49:53, 61.69s/it] 74%|███████▎ | 1269/1726 [21:57:08<7:49:53, 61.69s/it] 74%|███████▎ | 1270/1726 [21:58:10<7:50:39, 61.93s/it] 74%|███████▎ | 1270/1726 [21:58:10<7:50:39, 61.93s/it] 74%|███████▎ | 1271/1726 [21:59:13<7:50:27, 62.04s/it] 74%|███████▎ | dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1961 [2024-06-10 22:36:37,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 787.04 | bwd_inner_microstep: 786.92 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4494 [2024-06-10 22:36:39,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.28 | bwd_microstep: 1641.09 | bwd_inner_microstep: 1641.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-10 22:36:41,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.67 | bwd_microstep: 1397.70 | bwd_inner_microstep: 1397.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3401 [2024-06-10 22:36:43,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.41 | bwd_microstep: 1277.71 | bwd_inner_microstep: 1277.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 22:36:45,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1545.61 | bwd_inner_microstep: 1545.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-10 22:36:47,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1430.21 | bwd_inner_microstep: 1430.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217 [2024-06-10 22:36:48,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.20 | bwd_microstep: 956.75 | bwd_inner_microstep: 956.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708 [2024-06-10 22:36:51,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.05 | bwd_microstep: 1628.52 | bwd_inner_microstep: 1628.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-10 22:36:53,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.51 | bwd_microstep: 1525.27 | bwd_inner_microstep: 1525.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663 [2024-06-10 22:36:55,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.53 | bwd_microstep: 1427.10 | bwd_inner_microstep: 1427.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-10 22:36:57,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1438.77 | bwd_inner_microstep: 1438.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920 [2024-06-10 22:36:58,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.15 | bwd_microstep: 720.70 | bwd_inner_microstep: 720.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3674 [2024-06-10 22:37:00,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.08 | bwd_microstep: 1549.04 | bwd_inner_microstep: 1549.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3504 [2024-06-10 22:37:02,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.83 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 3424 [2024-06-10 22:37:03,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.16 | bwd_microstep: 1258.28 | bwd_inner_microstep: 1258.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-10 22:37:05,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1469.06 | bwd_inner_microstep: 1469.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-10 22:37:07,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1388.78 | bwd_inner_microstep: 1388.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2422 [2024-06-10 22:37:09,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.99 | bwd_microstep: 937.77 | bwd_inner_microstep: 937.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3639 [2024-06-10 22:37:11,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.12 | bwd_microstep: 1605.01 | bwd_inner_microstep: 1604.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3619 [2024-06-10 22:37:13,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.61 | bwd_microstep: 1374.32 | bwd_inner_microstep: 1374.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3503 [2024-06-10 22:37:14,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.08 | bwd_microstep: 
1235.25 | bwd_inner_microstep: 1235.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-10 22:37:16,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.40 | bwd_microstep: 1554.66 | bwd_inner_microstep: 1554.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-10 22:37:18,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.74 | bwd_microstep: 1283.72 | bwd_inner_microstep: 1283.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808 [2024-06-10 22:37:20,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1353.01 | bwd_inner_microstep: 1352.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 22:37:22,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.25 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3803 [2024-06-10 22:37:24,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3825 [2024-06-10 22:37:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1501.01 | bwd_inner_microstep: 1500.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 22:37:28,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 555.53 | bwd_microstep: 1496.90 | bwd_inner_microstep: 1496.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2258 [2024-06-10 22:37:29,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.71 | bwd_microstep: 968.95 | bwd_inner_microstep: 968.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3596 [2024-06-10 22:37:31,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.37 | bwd_microstep: 1432.31 | bwd_inner_microstep: 1432.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576 [2024-06-10 22:37:34,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.36 | bwd_microstep: 1700.93 | bwd_inner_microstep: 1700.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-10 22:37:36,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.92 | optimizer_gradients: 4.02 | optimizer_step: 6.60 [2024-06-10 22:37:36,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1866.03 | bwd_inner_microstep: 1442.55 | bwd_allreduce_microstep: 423.43 | step_microstep: 37.56 [2024-06-10 22:37:36,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16190.29 | bwd: 43769.05 | bwd_inner: 43344.63 | bwd_allreduce: 423.70 | step: 39.04 {'loss': 1.1987, 'learning_rate': 6.830729969518246e-06, 'epoch': 0.74} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-10 22:37:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.38 | bwd_microstep: 1371.82 | bwd_inner_microstep: 1371.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 1913 [2024-06-10 22:37:39,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.11 | bwd_microstep: 776.20 | bwd_inner_microstep: 776.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-10 22:37:41,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.38 | bwd_microstep: 1555.70 | bwd_inner_microstep: 1555.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-10 22:37:43,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1240.69 | bwd_inner_microstep: 1240.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4071 [2024-06-10 22:37:45,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.97 | bwd_microstep: 1624.15 | bwd_inner_microstep: 1624.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 22:37:47,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.62 | bwd_microstep: 1481.17 | bwd_inner_microstep: 1481.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 22:37:49,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1250.17 | bwd_inner_microstep: 1250.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-10 22:37:51,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1244.74 | bwd_inner_microstep: 1244.71 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 22:37:52,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1246.62 | bwd_inner_microstep: 1246.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-10 22:37:54,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.70 | bwd_microstep: 1351.45 | bwd_inner_microstep: 1351.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-10 22:37:56,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.70 | bwd_microstep: 1407.95 | bwd_inner_microstep: 1407.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3490 [2024-06-10 22:37:58,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1445.64 | bwd_inner_microstep: 1445.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 22:38:00,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.97 | bwd_microstep: 1372.16 | bwd_inner_microstep: 1372.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-10 22:38:02,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.03 | bwd_microstep: 1252.29 | bwd_inner_microstep: 1252.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502 [2024-06-10 22:38:04,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.86 | bwd_microstep: 1584.09 | bwd_inner_microstep: 
1584.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2153 [2024-06-10 22:38:05,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.13 | bwd_microstep: 976.60 | bwd_inner_microstep: 976.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-10 22:38:07,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.74 | bwd_microstep: 791.69 | bwd_inner_microstep: 791.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 22:38:09,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1556.12 | bwd_inner_microstep: 1556.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954 [2024-06-10 22:38:10,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.94 | bwd_microstep: 700.21 | bwd_inner_microstep: 700.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-10 22:38:11,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1248.69 | bwd_inner_microstep: 1248.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-10 22:38:13,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1390.38 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 22:38:15,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 
1350.88 | bwd_inner_microstep: 1350.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520 [2024-06-10 22:38:17,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1416.71 | bwd_inner_microstep: 1416.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566 [2024-06-10 22:38:19,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.03 | bwd_microstep: 1597.24 | bwd_inner_microstep: 1597.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3768 [2024-06-10 22:38:21,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.74 | bwd_microstep: 1465.28 | bwd_inner_microstep: 1465.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009 [2024-06-10 22:38:22,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.64 | bwd_microstep: 802.21 | bwd_inner_microstep: 802.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2055 [2024-06-10 22:38:24,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.53 | bwd_microstep: 1009.83 | bwd_inner_microstep: 1009.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3775 [2024-06-10 22:38:26,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.04 | bwd_microstep: 1646.83 | bwd_inner_microstep: 1646.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-10 22:38:28,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 577.58 | bwd_microstep: 1556.76 | bwd_inner_microstep: 1556.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935 [2024-06-10 22:38:29,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.68 | bwd_microstep: 728.05 | bwd_inner_microstep: 728.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2029 [2024-06-10 22:38:31,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.04 | bwd_microstep: 913.01 | bwd_inner_microstep: 912.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3776 [2024-06-10 22:38:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-10 22:38:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.67 | bwd_microstep: 6291.98 | bwd_inner_microstep: 1705.18 | bwd_allreduce_microstep: 4586.75 | step_microstep: 37.93 [2024-06-10 22:38:37,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15277.27 | bwd: 45647.30 | bwd_inner: 41059.64 | bwd_allreduce: 4586.97 | step: 39.46 {'loss': 1.2205, 'learning_rate': 6.80250464183269e-06, 'epoch': 0.74} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456 [2024-06-10 22:38:40,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.64 | bwd_microstep: 1543.42 | bwd_inner_microstep: 1543.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-10 22:38:41,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.44 | bwd_microstep: 1379.10 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3931 [2024-06-10 22:38:44,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.90 | bwd_microstep: 1590.66 | bwd_inner_microstep: 1590.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3780 [2024-06-10 22:38:46,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.34 | bwd_microstep: 1476.89 | bwd_inner_microstep: 1476.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732 [2024-06-10 22:38:48,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.07 | bwd_microstep: 1529.55 | bwd_inner_microstep: 1529.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-10 22:38:50,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.89 | bwd_microstep: 1274.59 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 22:38:51,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1280.92 | bwd_inner_microstep: 1280.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3705 [2024-06-10 22:38:53,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.67 | bwd_microstep: 1455.15 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491 [2024-06-10 22:38:55,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.59 | bwd_microstep: 1404.70 | bwd_inner_microstep: 1404.67 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 22:38:57,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.15 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498 [2024-06-10 22:38:59,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1442.73 | bwd_inner_microstep: 1442.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-10 22:39:01,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.39 | bwd_microstep: 1339.82 | bwd_inner_microstep: 1339.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3663 [2024-06-10 22:39:03,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.02 | bwd_microstep: 1466.55 | bwd_inner_microstep: 1466.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3068 [2024-06-10 22:39:05,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.89 | bwd_microstep: 1237.60 | bwd_inner_microstep: 1237.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-10 22:39:07,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.11 | bwd_microstep: 1451.44 | bwd_inner_microstep: 1451.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-10 22:39:08,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.57 | bwd_microstep: 697.76 | 
bwd_inner_microstep: 697.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-10 22:39:10,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.76 | bwd_microstep: 1455.19 | bwd_inner_microstep: 1455.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2070 [2024-06-10 22:39:11,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.46 | bwd_microstep: 915.31 | bwd_inner_microstep: 915.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 22:39:13,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1249.80 | bwd_inner_microstep: 1249.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-10 22:39:15,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1398.41 | bwd_inner_microstep: 1398.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-10 22:39:16,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1398.64 | bwd_inner_microstep: 1398.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3504 [2024-06-10 22:39:18,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1248.53 | bwd_inner_microstep: 1248.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3874 [2024-06-10 22:39:20,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
597.65 | bwd_microstep: 1612.16 | bwd_inner_microstep: 1612.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-10 22:39:22,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.26 | bwd_microstep: 1502.59 | bwd_inner_microstep: 1502.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-10 22:39:24,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.92 | bwd_microstep: 1445.80 | bwd_inner_microstep: 1445.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944 [2024-06-10 22:39:25,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.70 | bwd_microstep: 702.62 | bwd_inner_microstep: 702.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280 [2024-06-10 22:39:27,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.07 | bwd_microstep: 878.59 | bwd_inner_microstep: 878.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3453 [2024-06-10 22:39:29,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.31 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3617 [2024-06-10 22:39:31,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.91 | bwd_microstep: 1245.53 | bwd_inner_microstep: 1245.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2043 [2024-06-10 22:39:32,042] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 275.44 | bwd_microstep: 716.49 | bwd_inner_microstep: 716.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-10 22:39:34,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.36 | bwd_microstep: 1542.50 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585 [2024-06-10 22:39:36,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.03 | optimizer_step: 6.58 [2024-06-10 22:39:36,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1959.91 | bwd_inner_microstep: 1482.79 | bwd_allreduce_microstep: 477.07 | step_microstep: 37.58 [2024-06-10 22:39:36,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15743.27 | bwd: 42665.05 | bwd_inner: 42187.04 | bwd_allreduce: 477.30 | step: 39.10 {'loss': 1.1478, 'learning_rate': 6.774325795657175e-06, 'epoch': 0.74} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-10 22:39:38,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.04 | bwd_microstep: 1577.57 | bwd_inner_microstep: 1577.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-10 22:39:40,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.75 | bwd_microstep: 1382.71 | bwd_inner_microstep: 1382.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3885 [2024-06-10 22:39:43,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.26 | bwd_microstep: 1684.27 | bwd_inner_microstep: 1684.25 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-10 22:39:44,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476 [2024-06-10 22:39:46,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.51 | bwd_microstep: 1343.74 | bwd_inner_microstep: 1343.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-10 22:39:48,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.65 | bwd_microstep: 1481.22 | bwd_inner_microstep: 1481.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 22:39:50,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1247.10 | bwd_inner_microstep: 1247.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-10 22:39:52,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1383.70 | bwd_inner_microstep: 1383.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-10 22:39:54,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.05 | bwd_microstep: 1386.33 | bwd_inner_microstep: 1386.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-10 22:39:56,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1247.16 | bwd_inner_microstep: 
1247.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 22:39:57,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.23 | bwd_microstep: 1247.90 | bwd_inner_microstep: 1247.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-10 22:40:00,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.88 | bwd_microstep: 1618.48 | bwd_inner_microstep: 1618.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3573 [2024-06-10 22:40:02,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.73 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-10 22:40:03,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1284.01 | bwd_inner_microstep: 1283.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 22:40:05,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3675 [2024-06-10 22:40:07,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1554.47 | bwd_inner_microstep: 1554.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 22:40:09,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.79 | 
bwd_microstep: 1388.67 | bwd_inner_microstep: 1388.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-10 22:40:11,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.83 | bwd_microstep: 1277.00 | bwd_inner_microstep: 1276.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-10 22:40:13,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1450.52 | bwd_inner_microstep: 1450.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-10 22:40:15,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1606.91 | bwd_inner_microstep: 1606.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 22:40:17,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1389.47 | bwd_inner_microstep: 1389.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3620 [2024-06-10 22:40:19,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1341.64 | bwd_inner_microstep: 1341.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1977 [2024-06-10 22:40:20,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.50 | bwd_microstep: 705.30 | bwd_inner_microstep: 705.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555 [2024-06-10 22:40:22,327] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 508.17 | bwd_microstep: 1333.05 | bwd_inner_microstep: 1333.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593 [2024-06-10 22:40:24,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 1511.26 | bwd_inner_microstep: 1511.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-10 22:40:26,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.97 | bwd_microstep: 1556.19 | bwd_inner_microstep: 1556.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560 [2024-06-10 22:40:28,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.67 | bwd_microstep: 1429.03 | bwd_inner_microstep: 1429.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3435 [2024-06-10 22:40:30,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.19 | bwd_microstep: 1215.60 | bwd_inner_microstep: 1215.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-10 22:40:31,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.42 | bwd_microstep: 875.64 | bwd_inner_microstep: 875.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-10 22:40:32,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.55 | bwd_microstep: 973.62 | bwd_inner_microstep: 973.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-10 22:40:34,840] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.50 | bwd_microstep: 1491.93 | bwd_inner_microstep: 1491.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802 [2024-06-10 22:40:38,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.04 | optimizer_step: 6.60 [2024-06-10 22:40:38,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.24 | bwd_microstep: 2674.96 | bwd_inner_microstep: 1749.63 | bwd_allreduce_microstep: 925.28 | step_microstep: 37.37 [2024-06-10 22:40:38,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16356.75 | bwd: 44762.16 | bwd_inner: 43835.98 | bwd_allreduce: 925.51 | step: 38.96 {'loss': 1.1573, 'learning_rate': 6.746193530237457e-06, 'epoch': 0.74} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 22:40:39,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.65 | bwd_microstep: 1274.87 | bwd_inner_microstep: 1274.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-10 22:40:41,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1379.16 | bwd_inner_microstep: 1379.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3891 [2024-06-10 22:40:43,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1482.17 | bwd_inner_microstep: 1482.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 22:40:45,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1342.36 | bwd_inner_microstep: 1342.33 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796 [2024-06-10 22:40:47,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.30 | bwd_microstep: 1550.25 | bwd_inner_microstep: 1550.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-10 22:40:49,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.77 | bwd_microstep: 1479.85 | bwd_inner_microstep: 1479.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 22:40:51,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.64 | bwd_microstep: 1391.88 | bwd_inner_microstep: 1391.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-10 22:40:53,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.48 | bwd_microstep: 1405.03 | bwd_inner_microstep: 1405.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 22:40:55,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.01 | bwd_microstep: 1384.56 | bwd_inner_microstep: 1384.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 22:40:57,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1247.24 | bwd_inner_microstep: 1247.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654 [2024-06-10 22:40:59,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.35 | bwd_microstep: 
1620.30 | bwd_inner_microstep: 1620.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-10 22:41:01,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.00 | bwd_microstep: 1381.99 | bwd_inner_microstep: 1381.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3692 [2024-06-10 22:41:03,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.40 | bwd_microstep: 1617.88 | bwd_inner_microstep: 1617.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3656 [2024-06-10 22:41:05,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1542.88 | bwd_inner_microstep: 1542.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3623 [2024-06-10 22:41:07,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1373.48 | bwd_inner_microstep: 1373.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 22:41:09,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.32 | bwd_microstep: 1287.48 | bwd_inner_microstep: 1287.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-10 22:41:11,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1287.54 | bwd_inner_microstep: 1287.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-10 22:41:13,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 548.82 | bwd_microstep: 1485.22 | bwd_inner_microstep: 1485.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-10 22:41:15,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.03 | bwd_microstep: 1410.08 | bwd_inner_microstep: 1410.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 22:41:17,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.39 | bwd_microstep: 1403.90 | bwd_inner_microstep: 1403.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-10 22:41:19,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.58 | bwd_microstep: 1291.04 | bwd_inner_microstep: 1291.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-10 22:41:21,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.57 | bwd_microstep: 1459.02 | bwd_inner_microstep: 1459.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-10 22:41:22,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1288.90 | bwd_inner_microstep: 1288.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2577 [2024-06-10 22:41:24,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.36 | bwd_microstep: 1068.93 | bwd_inner_microstep: 1068.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613 [2024-06-10 22:41:26,550] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.98 | bwd_microstep: 1605.81 | bwd_inner_microstep: 1605.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-10 22:41:28,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.22 | bwd_microstep: 1255.37 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3747 [2024-06-10 22:41:30,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.52 | bwd_microstep: 1563.39 | bwd_inner_microstep: 1563.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-10 22:41:32,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.02 | bwd_microstep: 1403.07 | bwd_inner_microstep: 1403.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849 [2024-06-10 22:41:34,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.35 | bwd_microstep: 1656.99 | bwd_inner_microstep: 1656.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575 [2024-06-10 22:41:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.33 | bwd_microstep: 1592.56 | bwd_inner_microstep: 1592.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3590 [2024-06-10 22:41:39,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.85 | bwd_microstep: 1670.49 | bwd_inner_microstep: 1670.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3781 [2024-06-10 22:41:41,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.04 | optimizer_step: 6.62 [2024-06-10 22:41:41,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.60 | bwd_microstep: 1488.08 | bwd_inner_microstep: 1480.46 | bwd_allreduce_microstep: 7.58 | step_microstep: 37.38 [2024-06-10 22:41:41,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17061.04 | bwd: 45691.78 | bwd_inner: 45683.31 | bwd_allreduce: 7.81 | step: 38.88 {'loss': 1.2352, 'learning_rate': 6.7181079446552165e-06, 'epoch': 0.74} 1271/1726 [21:59:13<7:50:27, 62.04s/it] 74%|███████▎ | 1272/1726 [22:00:13<7:45:27, 61.51s/it] 74%|███████▍ | 1273/1726 [22:01:14<7:43:51, 61.44s/it] 74%|███████▍ | 1274/1726 [22:02:13<7:36:44, 60.63s/it] 74%|███████▍ | 1275/1726 [22:03:14<7:37:36, 60.88s/it] 74%|███████▍ | 1276/1726 [22:04:17<7:41:33, 61.54s/it] dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2629 [2024-06-10 22:41:42,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.30 | bwd_microstep: 1011.12 | bwd_inner_microstep: 1011.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4667 [2024-06-10 22:41:44,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.44 | bwd_microstep: 1674.46 | bwd_inner_microstep: 1674.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-10 22:41:46,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1343.11 | bwd_inner_microstep: 1343.09 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-10 22:41:48,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.01 | bwd_microstep: 1293.80 | bwd_inner_microstep: 1293.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761 [2024-06-10 22:41:50,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.02 | bwd_microstep: 1442.59 | bwd_inner_microstep: 1442.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 22:41:52,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.25 | bwd_microstep: 1388.15 | bwd_inner_microstep: 1388.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-10 22:41:54,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.79 | bwd_microstep: 1149.03 | bwd_inner_microstep: 1149.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-10 22:41:55,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.33 | bwd_microstep: 1341.10 | bwd_inner_microstep: 1341.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3514 [2024-06-10 22:41:57,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.84 | bwd_microstep: 1433.33 | bwd_inner_microstep: 1433.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4124 [2024-06-10 22:42:00,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.09 | bwd_microstep: 
1729.34 | bwd_inner_microstep: 1729.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509 [2024-06-10 22:42:02,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.59 | bwd_microstep: 1448.78 | bwd_inner_microstep: 1448.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2138 [2024-06-10 22:42:03,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.56 | bwd_microstep: 926.15 | bwd_inner_microstep: 926.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960 [2024-06-10 22:42:04,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.26 | bwd_microstep: 886.95 | bwd_inner_microstep: 886.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519 [2024-06-10 22:42:06,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.05 | bwd_microstep: 1324.61 | bwd_inner_microstep: 1324.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3525 [2024-06-10 22:42:08,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1322.12 | bwd_inner_microstep: 1322.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-10 22:42:10,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.61 | bwd_microstep: 1493.61 | bwd_inner_microstep: 1493.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2188 [2024-06-10 22:42:11,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 296.66 | bwd_microstep: 763.87 | bwd_inner_microstep: 763.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2180 [2024-06-10 22:42:12,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.00 | bwd_microstep: 857.08 | bwd_inner_microstep: 857.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1928 [2024-06-10 22:42:13,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.16 | bwd_microstep: 697.64 | bwd_inner_microstep: 697.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3655 [2024-06-10 22:42:15,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1322.91 | bwd_inner_microstep: 1322.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-10 22:42:17,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1410.93 | bwd_inner_microstep: 1410.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-10 22:42:19,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.64 | bwd_microstep: 1506.19 | bwd_inner_microstep: 1506.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3621 [2024-06-10 22:42:21,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.23 | bwd_microstep: 1310.39 | bwd_inner_microstep: 1310.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-10 22:42:22,589] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.99 | bwd_microstep: 820.54 | bwd_inner_microstep: 820.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-10 22:42:24,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.27 | bwd_microstep: 1449.50 | bwd_inner_microstep: 1449.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3599 [2024-06-10 22:42:26,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.31 | bwd_microstep: 1341.41 | bwd_inner_microstep: 1341.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934 [2024-06-10 22:42:27,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.45 | bwd_microstep: 699.84 | bwd_inner_microstep: 699.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 22:42:29,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.75 | bwd_microstep: 1645.18 | bwd_inner_microstep: 1645.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3821 [2024-06-10 22:42:32,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.41 | bwd_microstep: 1708.31 | bwd_inner_microstep: 1708.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2287 [2024-06-10 22:42:33,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.38 | bwd_microstep: 1009.98 | bwd_inner_microstep: 1009.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
2057 [2024-06-10 22:42:34,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.49 | bwd_microstep: 941.35 | bwd_inner_microstep: 941.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3047 [2024-06-10 22:42:40,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.10 | optimizer_step: 6.59 [2024-06-10 22:42:40,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.24 | bwd_microstep: 5515.40 | bwd_inner_microstep: 1566.69 | bwd_allreduce_microstep: 3948.65 | step_microstep: 37.86 [2024-06-10 22:42:40,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15041.04 | bwd: 44208.78 | bwd_inner: 40259.23 | bwd_allreduce: 3948.87 | step: 39.31 {'loss': 1.188, 'learning_rate': 6.690069137827757e-06, 'epoch': 0.74} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1932 [2024-06-10 22:42:41,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.69 | bwd_microstep: 816.38 | bwd_inner_microstep: 816.31 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3888 [2024-06-10 22:42:43,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.59 | bwd_microstep: 1477.31 | bwd_inner_microstep: 1477.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4478 [2024-06-10 22:42:46,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.89 | bwd_microstep: 1633.36 | bwd_inner_microstep: 1633.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2930 [2024-06-10 22:42:47,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.55 | bwd_microstep: 1142.58 | 
bwd_inner_microstep: 1142.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756 [2024-06-10 22:42:49,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.14 | bwd_microstep: 1434.09 | bwd_inner_microstep: 1434.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751 [2024-06-10 22:42:51,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.51 | bwd_microstep: 1441.66 | bwd_inner_microstep: 1441.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743 [2024-06-10 22:42:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.51 | bwd_microstep: 1635.97 | bwd_inner_microstep: 1635.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2233 [2024-06-10 22:42:55,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.82 | bwd_microstep: 770.12 | bwd_inner_microstep: 770.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893 [2024-06-10 22:42:56,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.79 | bwd_microstep: 714.18 | bwd_inner_microstep: 714.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-10 22:42:57,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1280.42 | bwd_inner_microstep: 1280.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507 [2024-06-10 22:42:59,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
453.74 | bwd_microstep: 1191.57 | bwd_inner_microstep: 1191.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1959 [2024-06-10 22:43:00,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.52 | bwd_microstep: 827.15 | bwd_inner_microstep: 827.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3512 [2024-06-10 22:43:02,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.77 | bwd_microstep: 1515.89 | bwd_inner_microstep: 1515.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1969 [2024-06-10 22:43:03,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.00 | bwd_microstep: 889.41 | bwd_inner_microstep: 889.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-10 22:43:05,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1356.96 | bwd_inner_microstep: 1356.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098 [2024-06-10 22:43:07,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.36 | bwd_microstep: 917.91 | bwd_inner_microstep: 917.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789 [2024-06-10 22:43:09,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.80 | bwd_microstep: 1552.00 | bwd_inner_microstep: 1551.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-10 22:43:11,218] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1416.65 | bwd_inner_microstep: 1416.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2264 [2024-06-10 22:43:12,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.61 | bwd_microstep: 873.25 | bwd_inner_microstep: 873.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 22:43:14,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1286.77 | bwd_inner_microstep: 1286.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452 [2024-06-10 22:43:15,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.39 | bwd_microstep: 1159.04 | bwd_inner_microstep: 1159.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 22:43:17,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.27 | bwd_microstep: 1555.07 | bwd_inner_microstep: 1555.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592 [2024-06-10 22:43:19,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.16 | bwd_microstep: 1403.14 | bwd_inner_microstep: 1403.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-10 22:43:21,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1398.00 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-10 
22:43:23,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.22 | bwd_microstep: 1285.12 | bwd_inner_microstep: 1285.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-10 22:43:25,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.58 | bwd_microstep: 1499.72 | bwd_inner_microstep: 1499.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3678 [2024-06-10 22:43:27,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1672.64 | bwd_inner_microstep: 1672.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3824 [2024-06-10 22:43:30,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.95 | bwd_microstep: 1749.41 | bwd_inner_microstep: 1749.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3070 [2024-06-10 22:43:32,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.67 | bwd_microstep: 1298.29 | bwd_inner_microstep: 1298.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-10 22:43:34,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.06 | bwd_microstep: 1654.69 | bwd_inner_microstep: 1654.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2884 [2024-06-10 22:43:35,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.53 | bwd_microstep: 1086.41 | bwd_inner_microstep: 1086.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 
10.0, dynamic token length: 3563 [2024-06-10 22:43:43,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-10 22:43:43,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.91 | bwd_microstep: 6546.13 | bwd_inner_microstep: 1798.88 | bwd_allreduce_microstep: 4747.16 | step_microstep: 39.06 [2024-06-10 22:43:43,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15533.12 | bwd: 46481.32 | bwd_inner: 41733.15 | bwd_allreduce: 4747.46 | step: 40.53 {'loss': 1.1993, 'learning_rate': 6.662077208507603e-06, 'epoch': 0.74} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 5427 [2024-06-10 22:43:45,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 705.97 | bwd_microstep: 1867.74 | bwd_inner_microstep: 1867.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3958 [2024-06-10 22:43:47,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.36 | bwd_microstep: 1490.75 | bwd_inner_microstep: 1490.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2337 [2024-06-10 22:43:49,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.65 | bwd_microstep: 981.95 | bwd_inner_microstep: 981.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3869 [2024-06-10 22:43:51,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.91 | bwd_microstep: 1564.96 | bwd_inner_microstep: 1564.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 22:43:53,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | 
bwd_microstep: 1385.90 | bwd_inner_microstep: 1385.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-10 22:43:55,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.30 | bwd_microstep: 1481.44 | bwd_inner_microstep: 1481.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-10 22:43:56,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.25 | bwd_microstep: 677.41 | bwd_inner_microstep: 677.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 22:43:58,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1395.21 | bwd_inner_microstep: 1395.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-10 22:43:59,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.38 | bwd_microstep: 801.85 | bwd_inner_microstep: 801.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 22:44:01,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.80 | bwd_microstep: 1403.09 | bwd_inner_microstep: 1403.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495 [2024-06-10 22:44:02,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.16 | bwd_microstep: 1223.26 | bwd_inner_microstep: 1223.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 732 [2024-06-10 22:44:03,315] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 116.90 | bwd_microstep: 296.61 | bwd_inner_microstep: 296.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3483 [2024-06-10 22:44:05,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.42 | bwd_microstep: 1315.71 | bwd_inner_microstep: 1315.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 22:44:06,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1345.81 | bwd_inner_microstep: 1345.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2405 [2024-06-10 22:44:08,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.33 | bwd_microstep: 1031.85 | bwd_inner_microstep: 1031.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390 [2024-06-10 22:44:10,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.38 | bwd_microstep: 1339.69 | bwd_inner_microstep: 1339.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635 [2024-06-10 22:44:12,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.20 | bwd_microstep: 1604.88 | bwd_inner_microstep: 1604.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3836 [2024-06-10 22:44:14,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.23 | bwd_microstep: 1721.01 | bwd_inner_microstep: 1720.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-10 22:44:15,940] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 797.68 | bwd_inner_microstep: 797.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3920 [2024-06-10 22:44:18,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.22 | bwd_microstep: 1798.87 | bwd_inner_microstep: 1798.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-10 22:44:20,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.49 | bwd_microstep: 1390.88 | bwd_inner_microstep: 1390.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276 [2024-06-10 22:44:21,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.97 | bwd_microstep: 876.01 | bwd_inner_microstep: 875.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2519 [2024-06-10 22:44:23,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.55 | bwd_microstep: 1059.62 | bwd_inner_microstep: 1059.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-10 22:44:24,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.18 | bwd_microstep: 1435.45 | bwd_inner_microstep: 1435.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-10 22:44:27,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1533.41 | bwd_inner_microstep: 1533.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3771 [2024-06-10 22:44:29,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.59 | bwd_microstep: 1644.00 | bwd_inner_microstep: 1643.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-10 22:44:31,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.88 | bwd_microstep: 1492.58 | bwd_inner_microstep: 1492.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2534 [2024-06-10 22:44:33,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.32 | bwd_microstep: 1187.20 | bwd_inner_microstep: 1187.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3653 [2024-06-10 22:44:35,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1512.64 | bwd_inner_microstep: 1512.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400 [2024-06-10 22:44:36,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1300.59 | bwd_inner_microstep: 1300.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3575 [2024-06-10 22:44:38,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.50 | bwd_microstep: 1428.74 | bwd_inner_microstep: 1428.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-10 22:44:44,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.19 | optimizer_step: 6.60 [2024-06-10 22:44:44,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 520.36 | bwd_microstep: 5063.93 | bwd_inner_microstep: 1780.72 | bwd_allreduce_microstep: 3283.15 | step_microstep: 38.47 [2024-06-10 22:44:44,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15631.54 | bwd: 45450.74 | bwd_inner: 42166.67 | bwd_allreduce: 3283.39 | step: 39.90 {'loss': 1.1584, 'learning_rate': 6.634132255282182e-06, 'epoch': 0.74} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1930 [2024-06-10 22:44:45,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.72 | bwd_microstep: 809.19 | bwd_inner_microstep: 809.12 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-10 22:44:47,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1381.18 | bwd_inner_microstep: 1381.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3876 [2024-06-10 22:44:49,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.63 | bwd_microstep: 1478.84 | bwd_inner_microstep: 1478.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3831 [2024-06-10 22:44:51,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.34 | bwd_microstep: 1583.21 | bwd_inner_microstep: 1583.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 22:44:53,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1480.15 | bwd_inner_microstep: 1480.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-10 22:44:55,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 522.58 | bwd_microstep: 1385.17 | bwd_inner_microstep: 1385.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-10 22:44:57,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1252.01 | bwd_inner_microstep: 1251.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2246 [2024-06-10 22:44:58,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.89 | bwd_microstep: 964.20 | bwd_inner_microstep: 964.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-10 22:45:00,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.06 | bwd_microstep: 1392.10 | bwd_inner_microstep: 1392.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-10 22:45:02,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1253.33 | bwd_inner_microstep: 1253.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-10 22:45:04,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.59 | bwd_microstep: 1652.93 | bwd_inner_microstep: 1652.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-10 22:45:05,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.63 | bwd_microstep: 798.81 | bwd_inner_microstep: 798.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508 [2024-06-10 22:45:07,699] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.95 | bwd_microstep: 1318.77 | bwd_inner_microstep: 1318.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 22:45:09,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.42 | bwd_microstep: 1256.28 | bwd_inner_microstep: 1256.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3468 [2024-06-10 22:45:11,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.92 | bwd_microstep: 1311.45 | bwd_inner_microstep: 1311.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3926 [2024-06-10 22:45:13,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.94 | bwd_microstep: 1762.30 | bwd_inner_microstep: 1762.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-10 22:45:15,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1512.80 | bwd_inner_microstep: 1512.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3603 [2024-06-10 22:45:17,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.47 | bwd_microstep: 1473.74 | bwd_inner_microstep: 1473.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 22:45:19,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.98 | bwd_microstep: 1397.11 | bwd_inner_microstep: 1397.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3461 [2024-06-10 22:45:21,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1279.09 | bwd_inner_microstep: 1279.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2933 [2024-06-10 22:45:23,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.54 | bwd_microstep: 1196.35 | bwd_inner_microstep: 1196.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630 [2024-06-10 22:45:25,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.28 | bwd_microstep: 1615.97 | bwd_inner_microstep: 1615.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-10 22:45:27,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1557.81 | bwd_inner_microstep: 1557.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-10 22:45:29,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.51 | bwd_microstep: 1628.19 | bwd_inner_microstep: 1628.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-10 22:45:31,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1413.16 | bwd_inner_microstep: 1413.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242 [2024-06-10 22:45:33,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.40 | bwd_microstep: 967.12 | bwd_inner_microstep: 967.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3816 [2024-06-10 22:45:35,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.83 | bwd_microstep: 1661.95 | bwd_inner_microstep: 1661.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 22:45:37,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.57 | bwd_microstep: 1553.79 | bwd_inner_microstep: 1553.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-10 22:45:39,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1506.71 | bwd_inner_microstep: 1506.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3380 [2024-06-10 22:45:41,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1436.00 | bwd_inner_microstep: 1435.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-10 22:45:43,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.14 | bwd_microstep: 1500.77 | bwd_inner_microstep: 1500.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755 [2024-06-10 22:45:45,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.04 | optimizer_step: 6.65 [2024-06-10 22:45:45,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.85 | bwd_microstep: 1505.59 | bwd_inner_microstep: 1497.85 | bwd_allreduce_microstep: 7.69 | step_microstep: 37.61 [2024-06-10 22:45:45,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16516.46 | bwd: 44286.10 | bwd_inner: 
44277.46 | bwd_allreduce: 7.94 | step: 39.11 {'loss': 1.168, 'learning_rate': 6.6062343765734774e-06, 'epoch': 0.74} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-10 22:45:47,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.86 | bwd_microstep: 1472.40 | bwd_inner_microstep: 1472.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-10 22:45:49,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1342.63 | bwd_inner_microstep: 1342.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-10 22:45:51,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.44 | bwd_microstep: 1342.99 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301 [2024-06-10 22:45:52,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.39 | bwd_microstep: 975.34 | bwd_inner_microstep: 975.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-10 22:45:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.55 | bwd_microstep: 1346.22 | bwd_inner_microstep: 1346.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499 [2024-06-10 22:45:56,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.35 | bwd_microstep: 1190.78 | bwd_inner_microstep: 1190.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-10 22:45:58,171] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1353.84 | bwd_inner_microstep: 1353.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1949 [2024-06-10 22:45:59,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.04 | bwd_microstep: 727.62 | bwd_inner_microstep: 727.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522 [2024-06-10 22:46:00,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.90 | bwd_microstep: 1197.55 | bwd_inner_microstep: 1197.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3663 [2024-06-10 22:46:02,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1370.49 | bwd_inner_microstep: 1370.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1919 [2024-06-10 22:46:03,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.28 | bwd_microstep: 817.94 | bwd_inner_microstep: 817.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497 [2024-06-10 22:46:06,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.92 | bwd_microstep: 1584.71 | bwd_inner_microstep: 1584.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562 [2024-06-10 22:46:08,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1597.16 | bwd_inner_microstep: 1597.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3547 [2024-06-10 
22:46:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.64 | bwd_microstep: 1594.28 | bwd_inner_microstep: 1594.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3649 [2024-06-10 22:46:12,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.42 | bwd_microstep: 1483.59 | bwd_inner_microstep: 1483.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3697 [2024-06-10 22:46:14,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.45 | bwd_microstep: 1724.49 | bwd_inner_microstep: 1724.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-10 22:46:16,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.11 | bwd_microstep: 1487.78 | bwd_inner_microstep: 1487.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-10 22:46:18,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.75 | bwd_microstep: 1486.04 | bwd_inner_microstep: 1486.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-10 22:46:20,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1254.37 | bwd_inner_microstep: 1254.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384 [2024-06-10 22:46:22,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.32 | bwd_microstep: 1435.29 | bwd_inner_microstep: 1435.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 2188 [2024-06-10 22:46:23,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.03 | bwd_microstep: 860.71 | bwd_inner_microstep: 860.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 539 [2024-06-10 22:46:24,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.21 | bwd_microstep: 246.56 | bwd_inner_microstep: 246.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3526 [2024-06-10 22:46:26,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.92 | bwd_microstep: 1328.73 | bwd_inner_microstep: 1328.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654 [2024-06-10 22:46:28,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.77 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2230 [2024-06-10 22:46:29,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.62 | bwd_microstep: 867.34 | bwd_inner_microstep: 867.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3546 [2024-06-10 22:46:31,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1455.41 | bwd_inner_microstep: 1455.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-10 22:46:33,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1601.01 | bwd_inner_microstep: 1600.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 16, images per sample: 4.0, dynamic token length: 2279 [2024-06-10 22:46:34,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.32 | bwd_microstep: 878.50 | bwd_inner_microstep: 878.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2105 [2024-06-10 22:46:36,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.36 | bwd_microstep: 884.71 | bwd_inner_microstep: 884.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3566 [2024-06-10 22:46:37,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.17 | bwd_microstep: 1203.36 | bwd_inner_microstep: 1203.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823 [2024-06-10 22:46:39,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1514.92 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 22:46:47,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 20.91 | optimizer_gradients: 4.17 | optimizer_step: 6.61 [2024-06-10 22:46:47,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.64 | bwd_microstep: 7216.91 | bwd_inner_microstep: 1691.87 | bwd_allreduce_microstep: 5524.98 | step_microstep: 42.35 [2024-06-10 22:46:47,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15210.67 | bwd: 46367.58 | bwd_inner: 40841.69 | bwd_allreduce: 5525.22 | step: 43.85 {'loss': 1.1745, 'learning_rate': 6.578383670637662e-06, 'epoch': 0.74} /1726 [22:04:17<7:41:33, 61.54s/it] 74%|███████▍ | 1277/1726 [22:05:17<7:36:06, 60.95s/it] 74%|███████▍ | 
1278/1726 [22:06:19<7:38:13, 61.37s/it] 74%|███████▍ | 1279/1726 [22:07:21<7:37:17, 61.38s/it] 74%|███████▍ | 1280/1726 [22:08:22<7:35:43, 61.31s/it] 74%|███████▍ | 1281/1726 [22:09:24<7:36:02, 61.49s/it] dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3504 [2024-06-10 22:46:49,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1420.88 | bwd_inner_microstep: 1420.71 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-10 22:46:51,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.94 | bwd_microstep: 1340.72 | bwd_inner_microstep: 1340.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4185 [2024-06-10 22:46:53,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.93 | bwd_microstep: 1746.08 | bwd_inner_microstep: 1746.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-10 22:46:55,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.52 | bwd_microstep: 1547.69 | bwd_inner_microstep: 1547.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863 [2024-06-10 22:46:56,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.18 | bwd_microstep: 675.84 | bwd_inner_microstep: 675.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4025 [2024-06-10 22:46:59,251] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 629.15 | bwd_microstep: 1707.06 | bwd_inner_microstep: 1707.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075 [2024-06-10 22:47:00,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.46 | bwd_microstep: 818.11 | bwd_inner_microstep: 818.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3716 [2024-06-10 22:47:02,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.29 | bwd_microstep: 1630.05 | bwd_inner_microstep: 1630.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3733 [2024-06-10 22:47:04,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.94 | bwd_microstep: 1663.17 | bwd_inner_microstep: 1663.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1864 [2024-06-10 22:47:05,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.52 | bwd_microstep: 709.41 | bwd_inner_microstep: 709.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3409 [2024-06-10 22:47:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.79 | bwd_microstep: 1435.98 | bwd_inner_microstep: 1435.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 22:47:09,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1488.39 | bwd_inner_microstep: 1488.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408 [2024-06-10 22:47:11,912] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.42 | bwd_microstep: 1438.82 | bwd_inner_microstep: 1438.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3976 [2024-06-10 22:47:14,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.31 | bwd_microstep: 1799.04 | bwd_inner_microstep: 1799.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3703 [2024-06-10 22:47:16,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.89 | bwd_microstep: 1721.35 | bwd_inner_microstep: 1721.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 22:47:17,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 794.01 | bwd_inner_microstep: 793.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 22:47:19,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.33 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628 [2024-06-10 22:47:21,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.08 | bwd_microstep: 1413.51 | bwd_inner_microstep: 1413.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-10 22:47:23,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1388.45 | bwd_inner_microstep: 1388.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3832 [2024-06-10 22:47:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 22:47:27,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1383.43 | bwd_inner_microstep: 1383.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939 [2024-06-10 22:47:28,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.60 | bwd_microstep: 697.78 | bwd_inner_microstep: 697.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-10 22:47:30,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.67 | bwd_microstep: 1452.93 | bwd_inner_microstep: 1452.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590 [2024-06-10 22:47:32,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-10 22:47:34,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.48 | bwd_microstep: 1598.67 | bwd_inner_microstep: 1598.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4701 [2024-06-10 22:47:37,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.95 | bwd_microstep: 1685.50 | bwd_inner_microstep: 1685.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images 
per sample: 7.75, dynamic token length: 3553 [2024-06-10 22:47:39,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.17 | bwd_microstep: 1442.32 | bwd_inner_microstep: 1442.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-10 22:47:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.19 | bwd_microstep: 1437.32 | bwd_inner_microstep: 1437.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 22:47:43,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1556.63 | bwd_inner_microstep: 1556.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654 [2024-06-10 22:47:45,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.93 | bwd_microstep: 1519.49 | bwd_inner_microstep: 1519.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2713 [2024-06-10 22:47:46,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.01 | bwd_microstep: 1128.98 | bwd_inner_microstep: 1128.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3416 [2024-06-10 22:47:49,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-10 22:47:49,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1720.84 | bwd_inner_microstep: 1509.49 | bwd_allreduce_microstep: 211.31 | step_microstep: 37.53 [2024-06-10 22:47:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16570.38 | bwd: 44740.87 | bwd_inner: 
44528.54 | bwd_allreduce: 211.61 | step: 39.02 {'loss': 1.2216, 'learning_rate': 6.550580235564794e-06, 'epoch': 0.74} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-10 22:47:50,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1245.96 | bwd_inner_microstep: 1245.77 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1866 [2024-06-10 22:47:51,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.87 | bwd_microstep: 706.18 | bwd_inner_microstep: 706.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-10 22:47:53,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.68 | bwd_microstep: 1399.25 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3852 [2024-06-10 22:47:56,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.54 | bwd_microstep: 1659.01 | bwd_inner_microstep: 1658.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784 [2024-06-10 22:47:58,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1448.15 | bwd_inner_microstep: 1448.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-10 22:48:00,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.43 | bwd_microstep: 1481.50 | bwd_inner_microstep: 1481.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835 [2024-06-10 22:48:02,230] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.74 | bwd_microstep: 1451.03 | bwd_inner_microstep: 1451.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-10 22:48:03,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.31 | bwd_microstep: 1278.82 | bwd_inner_microstep: 1278.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751 [2024-06-10 22:48:06,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.53 | bwd_microstep: 1637.19 | bwd_inner_microstep: 1637.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-10 22:48:08,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.98 | bwd_microstep: 1411.28 | bwd_inner_microstep: 1411.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3525 [2024-06-10 22:48:10,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.49 | bwd_microstep: 1338.00 | bwd_inner_microstep: 1337.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379 [2024-06-10 22:48:11,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.62 | bwd_microstep: 1334.36 | bwd_inner_microstep: 1334.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-10 22:48:13,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1383.05 | bwd_inner_microstep: 1383.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3425 [2024-06-10 22:48:15,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.56 | bwd_microstep: 1339.78 | bwd_inner_microstep: 1339.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-10 22:48:17,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.45 | bwd_microstep: 1246.14 | bwd_inner_microstep: 1246.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632 [2024-06-10 22:48:19,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.55 | bwd_microstep: 1433.29 | bwd_inner_microstep: 1433.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3631 [2024-06-10 22:48:21,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.79 | bwd_microstep: 1741.62 | bwd_inner_microstep: 1741.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3831 [2024-06-10 22:48:23,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.97 | bwd_microstep: 1452.20 | bwd_inner_microstep: 1452.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-10 22:48:25,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.19 | bwd_microstep: 1253.16 | bwd_inner_microstep: 1253.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-10 22:48:27,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.58 | bwd_microstep: 1293.01 | bwd_inner_microstep: 1292.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3486 [2024-06-10 22:48:29,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1380.84 | bwd_inner_microstep: 1380.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2114 [2024-06-10 22:48:30,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.73 | bwd_microstep: 923.39 | bwd_inner_microstep: 923.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3608 [2024-06-10 22:48:32,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1340.83 | bwd_inner_microstep: 1340.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-10 22:48:34,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1506.31 | bwd_inner_microstep: 1506.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449 [2024-06-10 22:48:36,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1383.16 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3863 [2024-06-10 22:48:38,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1471.32 | bwd_inner_microstep: 1471.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3722 [2024-06-10 22:48:40,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.06 | bwd_microstep: 1240.09 | bwd_inner_microstep: 1240.06 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2416 [2024-06-10 22:48:41,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.54 | bwd_microstep: 842.92 | bwd_inner_microstep: 842.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2004 [2024-06-10 22:48:42,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.06 | bwd_microstep: 833.26 | bwd_inner_microstep: 833.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-10 22:48:44,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.86 | bwd_microstep: 1246.09 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-10 22:48:45,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075 [2024-06-10 22:48:49,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-10 22:48:49,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.41 | bwd_microstep: 3719.95 | bwd_inner_microstep: 941.01 | bwd_allreduce_microstep: 2778.88 | step_microstep: 37.82 [2024-06-10 22:48:49,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15678.76 | bwd: 44704.40 | bwd_inner: 41924.48 | bwd_allreduce: 2779.18 | step: 39.35 {'loss': 1.1341, 'learning_rate': 6.522824169278419e-06, 'epoch': 0.74} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3476 [2024-06-10 22:48:51,912] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.58 | bwd_microstep: 1401.16 | bwd_inner_microstep: 1401.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2367 [2024-06-10 22:48:53,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.21 | bwd_microstep: 893.05 | bwd_inner_microstep: 893.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3521 [2024-06-10 22:48:54,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1222.25 | bwd_inner_microstep: 1222.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-10 22:48:56,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1480.22 | bwd_inner_microstep: 1480.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3426 [2024-06-10 22:48:58,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.98 | bwd_microstep: 1309.27 | bwd_inner_microstep: 1309.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-10 22:49:00,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.99 | bwd_microstep: 1245.38 | bwd_inner_microstep: 1245.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-10 22:49:02,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.40 | bwd_microstep: 1376.15 | bwd_inner_microstep: 1376.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3672 [2024-06-10 22:49:04,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.09 | bwd_microstep: 1354.06 | bwd_inner_microstep: 1354.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-10 22:49:05,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.12 | bwd_microstep: 1287.08 | bwd_inner_microstep: 1287.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-10 22:49:07,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1249.10 | bwd_inner_microstep: 1249.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-10 22:49:09,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.10 | bwd_microstep: 1188.38 | bwd_inner_microstep: 1188.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3500 [2024-06-10 22:49:11,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.60 | bwd_microstep: 1351.92 | bwd_inner_microstep: 1351.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979 [2024-06-10 22:49:12,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.86 | bwd_microstep: 893.99 | bwd_inner_microstep: 893.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-10 22:49:14,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1393.78 | bwd_inner_microstep: 1393.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3469 [2024-06-10 22:49:16,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.20 | bwd_microstep: 1476.11 | bwd_inner_microstep: 1476.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3625 [2024-06-10 22:49:18,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1464.01 | bwd_inner_microstep: 1463.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-10 22:49:19,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 793.13 | bwd_inner_microstep: 793.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-10 22:49:21,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.77 | bwd_microstep: 1612.69 | bwd_inner_microstep: 1612.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3842 [2024-06-10 22:49:24,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.65 | bwd_microstep: 1765.84 | bwd_inner_microstep: 1765.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 22:49:25,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.76 | bwd_microstep: 1285.68 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3689 [2024-06-10 22:49:27,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1235.36 | bwd_inner_microstep: 1235.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-10 22:49:29,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.45 | bwd_microstep: 1285.93 | bwd_inner_microstep: 1285.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707 [2024-06-10 22:49:31,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.35 | bwd_microstep: 1433.83 | bwd_inner_microstep: 1433.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2041 [2024-06-10 22:49:32,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 906.30 | bwd_inner_microstep: 906.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3480 [2024-06-10 22:49:34,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.22 | bwd_microstep: 1217.09 | bwd_inner_microstep: 1217.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3722 [2024-06-10 22:49:36,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.35 | bwd_microstep: 1563.33 | bwd_inner_microstep: 1563.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551 [2024-06-10 22:49:38,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.44 | bwd_microstep: 1297.92 | bwd_inner_microstep: 1297.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819 [2024-06-10 22:49:40,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.34 | bwd_microstep: 1490.10 | bwd_inner_microstep: 1490.08 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-10 22:49:42,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.40 | bwd_microstep: 1547.39 | bwd_inner_microstep: 1547.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3802 [2024-06-10 22:49:44,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.05 | bwd_microstep: 1562.36 | bwd_inner_microstep: 1562.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3422 [2024-06-10 22:49:46,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.19 | bwd_microstep: 1544.33 | bwd_inner_microstep: 1544.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-10 22:49:51,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-10 22:49:51,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 3573.30 | bwd_inner_microstep: 1685.96 | bwd_allreduce_microstep: 1887.28 | step_microstep: 37.67 [2024-06-10 22:49:51,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16001.22 | bwd: 44700.49 | bwd_inner: 42812.31 | bwd_allreduce: 1887.51 | step: 39.19 {'loss': 1.1561, 'learning_rate': 6.4951155695352595e-06, 'epoch': 0.74} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-10 22:49:53,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.31 | bwd_microstep: 1470.67 | bwd_inner_microstep: 1470.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 
[2024-06-10 22:49:54,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1244.53 | bwd_inner_microstep: 1244.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2304 [2024-06-10 22:49:56,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 973.76 | bwd_inner_microstep: 973.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3013 [2024-06-10 22:49:57,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.67 | bwd_microstep: 1226.26 | bwd_inner_microstep: 1226.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782 [2024-06-10 22:50:00,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.81 | bwd_microstep: 1643.12 | bwd_inner_microstep: 1643.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 22:50:01,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.15 | bwd_microstep: 1278.94 | bwd_inner_microstep: 1278.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3793 [2024-06-10 22:50:03,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.37 | bwd_microstep: 1442.14 | bwd_inner_microstep: 1442.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400 [2024-06-10 22:50:05,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.09 | bwd_microstep: 1146.57 | bwd_inner_microstep: 1146.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 3401 [2024-06-10 22:50:07,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.74 | bwd_microstep: 1178.70 | bwd_inner_microstep: 1178.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3751 [2024-06-10 22:50:09,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.91 | bwd_microstep: 1442.52 | bwd_inner_microstep: 1442.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-10 22:50:10,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.89 | bwd_microstep: 1187.32 | bwd_inner_microstep: 1187.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2156 [2024-06-10 22:50:12,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.87 | bwd_microstep: 947.11 | bwd_inner_microstep: 947.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3439 [2024-06-10 22:50:13,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.72 | bwd_microstep: 1409.72 | bwd_inner_microstep: 1409.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3693 [2024-06-10 22:50:16,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.00 | bwd_microstep: 1720.50 | bwd_inner_microstep: 1720.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-10 22:50:18,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.32 | bwd_microstep: 1340.73 | bwd_inner_microstep: 1340.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-10 22:50:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.37 | bwd_microstep: 1490.22 | bwd_inner_microstep: 1490.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3687 [2024-06-10 22:50:22,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.09 | bwd_microstep: 1331.66 | bwd_inner_microstep: 1331.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806 [2024-06-10 22:50:24,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1450.29 | bwd_inner_microstep: 1450.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2933 [2024-06-10 22:50:25,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.05 | bwd_microstep: 1097.44 | bwd_inner_microstep: 1097.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-10 22:50:27,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.86 | bwd_microstep: 1558.37 | bwd_inner_microstep: 1558.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436 [2024-06-10 22:50:29,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.41 | bwd_microstep: 1256.70 | bwd_inner_microstep: 1256.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-10 22:50:30,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.98 | bwd_microstep: 697.05 | bwd_inner_microstep: 697.02 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-10 22:50:32,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.82 | bwd_microstep: 1286.04 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-10 22:50:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1299.08 | bwd_inner_microstep: 1299.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827 [2024-06-10 22:50:36,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.68 | bwd_microstep: 1491.58 | bwd_inner_microstep: 1491.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-10 22:50:37,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.92 | bwd_microstep: 1251.60 | bwd_inner_microstep: 1251.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806 [2024-06-10 22:50:39,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.86 | bwd_microstep: 1515.03 | bwd_inner_microstep: 1515.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-10 22:50:42,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.82 | bwd_microstep: 1546.50 | bwd_inner_microstep: 1546.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2268 [2024-06-10 22:50:43,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.33 | bwd_microstep: 
809.09 | bwd_inner_microstep: 809.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808 [2024-06-10 22:50:45,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1359.49 | bwd_inner_microstep: 1359.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801 [2024-06-10 22:50:47,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.86 | bwd_microstep: 1550.61 | bwd_inner_microstep: 1550.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 22:50:49,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.04 | optimizer_step: 6.60 [2024-06-10 22:50:49,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 1850.86 | bwd_inner_microstep: 1089.59 | bwd_allreduce_microstep: 761.23 | step_microstep: 37.50 [2024-06-10 22:50:49,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15634.05 | bwd: 42494.27 | bwd_inner: 41732.11 | bwd_allreduce: 761.46 | step: 38.97 {'loss': 1.1984, 'learning_rate': 6.46745453392485e-06, 'epoch': 0.74} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3462 [2024-06-10 22:50:51,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.75 | bwd_microstep: 1407.79 | bwd_inner_microstep: 1407.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 22:50:53,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.48 | bwd_microstep: 1379.46 | bwd_inner_microstep: 1379.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3887 [2024-06-10 22:50:55,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.22 | bwd_microstep: 1480.69 | bwd_inner_microstep: 1480.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-10 22:50:57,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.17 | bwd_microstep: 1244.55 | bwd_inner_microstep: 1244.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742 [2024-06-10 22:50:59,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1436.24 | bwd_inner_microstep: 1436.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-10 22:51:00,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 1380.88 | bwd_inner_microstep: 1380.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025 [2024-06-10 22:51:02,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.17 | bwd_microstep: 804.96 | bwd_inner_microstep: 804.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4019 [2024-06-10 22:51:04,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.74 | bwd_microstep: 1612.57 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4011 [2024-06-10 22:51:06,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 664.30 | bwd_microstep: 1815.80 | bwd_inner_microstep: 1815.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3550 [2024-06-10 22:51:08,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 1491.11 | bwd_inner_microstep: 1491.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412 [2024-06-10 22:51:10,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.90 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3673 [2024-06-10 22:51:12,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.04 | bwd_microstep: 1580.43 | bwd_inner_microstep: 1580.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2117 [2024-06-10 22:51:14,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.46 | bwd_microstep: 919.54 | bwd_inner_microstep: 919.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478 [2024-06-10 22:51:16,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.84 | bwd_microstep: 1576.07 | bwd_inner_microstep: 1576.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3454 [2024-06-10 22:51:18,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1398.95 | bwd_inner_microstep: 1398.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-10 22:51:20,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.68 | bwd_microstep: 1352.69 | bwd_inner_microstep: 1352.67 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487 [2024-06-10 22:51:22,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.76 | bwd_microstep: 1570.00 | bwd_inner_microstep: 1569.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1983 [2024-06-10 22:51:23,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.22 | bwd_microstep: 893.15 | bwd_inner_microstep: 893.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3629 [2024-06-10 22:51:25,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.36 | bwd_microstep: 1533.59 | bwd_inner_microstep: 1533.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-10 22:51:27,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1609.03 | bwd_inner_microstep: 1609.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 22:51:29,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1394.97 | bwd_inner_microstep: 1394.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3895 [2024-06-10 22:51:31,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1483.86 | bwd_inner_microstep: 1483.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463 [2024-06-10 22:51:33,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.69 | bwd_microstep: 1182.18 | 
bwd_inner_microstep: 1182.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3812 [2024-06-10 22:51:35,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.56 | bwd_microstep: 1414.46 | bwd_inner_microstep: 1414.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009 [2024-06-10 22:51:36,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.13 | bwd_microstep: 804.63 | bwd_inner_microstep: 804.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-10 22:51:38,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1256.34 | bwd_inner_microstep: 1256.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-10 22:51:39,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.00 | bwd_microstep: 1274.35 | bwd_inner_microstep: 1274.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-10 22:51:42,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.04 | bwd_microstep: 1606.68 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3580 [2024-06-10 22:51:44,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.20 | bwd_microstep: 1471.97 | bwd_inner_microstep: 1471.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584 [2024-06-10 22:51:46,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 497.66 | bwd_microstep: 1305.55 | bwd_inner_microstep: 1305.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3461 [2024-06-10 22:51:48,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.44 | bwd_microstep: 1567.70 | bwd_inner_microstep: 1567.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426 [2024-06-10 22:51:51,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.18 | optimizer_step: 6.62 [2024-06-10 22:51:51,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.15 | bwd_microstep: 3145.15 | bwd_inner_microstep: 1445.69 | bwd_allreduce_microstep: 1699.41 | step_microstep: 37.83 [2024-06-10 22:51:51,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16389.08 | bwd: 45675.92 | bwd_inner: 43975.62 | bwd_allreduce: 1699.64 | step: 39.29 {'loss': 1.1699, 'learning_rate': 6.439841159869233e-06, 'epoch': 0.75} 6 [22:09:24<7:36:02, 61.49s/it] 74%|███████▍ | 1282/1726 [22:10:25<7:35:22, 61.54s/it] 74%|███████▍ | 1283/1726 [22:11:26<7:32:31, 61.29s/it] 74%|███████▍ | 1284/1726 [22:12:27<7:30:57, 61.22s/it] 74%|███████▍ | 1285/1726 [22:13:26<7:23:50, 60.39s/it] 75%|███████▍ | 1286/1726 [22:14:28<7:27:16, 60.99s/it] dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3550 [2024-06-10 22:51:53,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.13 | bwd_microstep: 1246.95 | bwd_inner_microstep: 1246.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3897 [2024-06-10 22:51:55,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.59 | bwd_microstep: 1581.28 | bwd_inner_microstep: 1581.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2348 [2024-06-10 22:51:57,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.54 | bwd_microstep: 984.91 | bwd_inner_microstep: 984.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794 [2024-06-10 22:51:59,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.17 | bwd_microstep: 1544.23 | bwd_inner_microstep: 1544.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 22:52:01,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.97 | bwd_microstep: 1385.75 | bwd_inner_microstep: 1385.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 22:52:02,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-10 22:52:05,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.72 | bwd_microstep: 1640.69 | bwd_inner_microstep: 1640.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-10 22:52:07,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.44 | bwd_microstep: 1386.12 | bwd_inner_microstep: 1386.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-10 22:52:08,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1255.81 | bwd_inner_microstep: 1255.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-10 22:52:10,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1286.41 | bwd_inner_microstep: 1286.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1873 [2024-06-10 22:52:11,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.31 | bwd_microstep: 741.14 | bwd_inner_microstep: 741.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-10 22:52:13,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.01 | bwd_microstep: 1476.64 | bwd_inner_microstep: 1476.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3667 [2024-06-10 22:52:15,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.05 | bwd_microstep: 1548.96 | bwd_inner_microstep: 1548.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1949 [2024-06-10 22:52:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.42 | bwd_microstep: 890.32 | bwd_inner_microstep: 890.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648 [2024-06-10 22:52:19,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.37 | bwd_microstep: 1508.58 | bwd_inner_microstep: 1508.56 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-10 22:52:21,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1386.90 | bwd_inner_microstep: 1386.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-10 22:52:23,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.78 | bwd_microstep: 1393.63 | bwd_inner_microstep: 1393.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660 [2024-06-10 22:52:25,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.86 | bwd_microstep: 1620.31 | bwd_inner_microstep: 1620.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3596 [2024-06-10 22:52:27,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.83 | bwd_microstep: 1467.51 | bwd_inner_microstep: 1467.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3859 [2024-06-10 22:52:29,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.25 | bwd_microstep: 1569.77 | bwd_inner_microstep: 1569.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-10 22:52:31,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.85 | bwd_microstep: 1394.94 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664 [2024-06-10 22:52:33,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.72 | bwd_microstep: 
1423.80 | bwd_inner_microstep: 1423.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 22:52:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.97 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2652 [2024-06-10 22:52:37,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.02 | bwd_microstep: 1222.02 | bwd_inner_microstep: 1221.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-10 22:52:39,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.50 | bwd_microstep: 1662.13 | bwd_inner_microstep: 1662.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3550 [2024-06-10 22:52:41,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.72 | bwd_microstep: 1586.41 | bwd_inner_microstep: 1586.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2202 [2024-06-10 22:52:42,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.19 | bwd_microstep: 858.95 | bwd_inner_microstep: 858.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3726 [2024-06-10 22:52:45,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.58 | bwd_microstep: 1730.37 | bwd_inner_microstep: 1730.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3420 [2024-06-10 22:52:46,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 492.71 | bwd_microstep: 1313.60 | bwd_inner_microstep: 1313.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3563 [2024-06-10 22:52:49,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.40 | bwd_microstep: 1525.00 | bwd_inner_microstep: 1524.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3770 [2024-06-10 22:52:50,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.14 | bwd_microstep: 1356.58 | bwd_inner_microstep: 1356.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-10 22:52:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.05 | optimizer_step: 6.63 [2024-06-10 22:52:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.90 | bwd_microstep: 1998.99 | bwd_inner_microstep: 1749.42 | bwd_allreduce_microstep: 249.53 | step_microstep: 37.76 [2024-06-10 22:52:53,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16572.63 | bwd: 44766.09 | bwd_inner: 44515.67 | bwd_allreduce: 249.75 | step: 39.24 {'loss': 1.1843, 'learning_rate': 6.412275544622557e-06, 'epoch': 0.75} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1856 [2024-06-10 22:52:54,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.12 | bwd_microstep: 759.18 | bwd_inner_microstep: 759.11 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-10 22:52:56,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.82 | bwd_microstep: 1280.96 | bwd_inner_microstep: 1280.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 25, images per sample: 6.25, dynamic token length: 2691 [2024-06-10 22:52:57,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.87 | bwd_microstep: 1125.15 | bwd_inner_microstep: 1125.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3846 [2024-06-10 22:52:59,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.36 | bwd_microstep: 1466.07 | bwd_inner_microstep: 1466.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3479 [2024-06-10 22:53:01,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.39 | bwd_microstep: 1215.27 | bwd_inner_microstep: 1215.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4124 [2024-06-10 22:53:03,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.67 | bwd_microstep: 1595.15 | bwd_inner_microstep: 1595.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3738 [2024-06-10 22:53:06,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.79 | bwd_microstep: 1630.91 | bwd_inner_microstep: 1630.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1889 [2024-06-10 22:53:07,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.95 | bwd_microstep: 712.25 | bwd_inner_microstep: 712.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 22:53:08,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.55 | bwd_microstep: 1247.14 | bwd_inner_microstep: 1247.11 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2385 [2024-06-10 22:53:09,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.99 | bwd_microstep: 837.06 | bwd_inner_microstep: 837.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-10 22:53:11,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.91 | bwd_microstep: 1152.43 | bwd_inner_microstep: 1152.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-10 22:53:13,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.68 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-10 22:53:15,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1477.61 | bwd_inner_microstep: 1477.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-10 22:53:17,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.58 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954 [2024-06-10 22:53:18,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.99 | bwd_microstep: 850.72 | bwd_inner_microstep: 850.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3437 [2024-06-10 22:53:20,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1308.78 | bwd_inner_microstep: 
1308.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3829 [2024-06-10 22:53:22,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1387.34 | bwd_inner_microstep: 1387.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529 [2024-06-10 22:53:24,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.50 | bwd_microstep: 1411.49 | bwd_inner_microstep: 1411.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2135 [2024-06-10 22:53:25,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.46 | bwd_microstep: 833.24 | bwd_inner_microstep: 833.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3618 [2024-06-10 22:53:27,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 1434.31 | bwd_inner_microstep: 1434.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-10 22:53:29,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1457.76 | bwd_inner_microstep: 1457.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-10 22:53:31,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1281.39 | bwd_inner_microstep: 1281.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-10 22:53:33,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.71 | 
bwd_microstep: 1456.67 | bwd_inner_microstep: 1456.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1998 [2024-06-10 22:53:34,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.27 | bwd_microstep: 737.27 | bwd_inner_microstep: 737.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-10 22:53:36,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.66 | bwd_microstep: 1283.48 | bwd_inner_microstep: 1283.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3777 [2024-06-10 22:53:38,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.93 | bwd_microstep: 1473.16 | bwd_inner_microstep: 1473.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-10 22:53:40,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1507.81 | bwd_inner_microstep: 1507.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440 [2024-06-10 22:53:42,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1445.08 | bwd_inner_microstep: 1445.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3738 [2024-06-10 22:53:44,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.07 | bwd_microstep: 1457.06 | bwd_inner_microstep: 1457.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-10 22:53:46,486] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 600.73 | bwd_microstep: 1634.74 | bwd_inner_microstep: 1634.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595 [2024-06-10 22:53:48,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.70 | bwd_microstep: 1304.56 | bwd_inner_microstep: 1304.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-10 22:53:54,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.33 | optimizer_step: 6.63 [2024-06-10 22:53:54,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.08 | bwd_microstep: 5821.06 | bwd_inner_microstep: 1477.97 | bwd_allreduce_microstep: 4343.02 | step_microstep: 38.80 [2024-06-10 22:53:54,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15383.97 | bwd: 45415.06 | bwd_inner: 41071.07 | bwd_allreduce: 4343.30 | step: 40.26 {'loss': 1.1461, 'learning_rate': 6.384757785270777e-06, 'epoch': 0.75} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-10 22:53:55,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.25 | bwd_microstep: 797.51 | bwd_inner_microstep: 797.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-10 22:53:57,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.25 | bwd_microstep: 1242.47 | bwd_inner_microstep: 1242.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2817 [2024-06-10 22:53:59,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.02 | bwd_microstep: 1108.97 | bwd_inner_microstep: 1108.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1878 [2024-06-10 22:54:00,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.47 | bwd_microstep: 771.40 | bwd_inner_microstep: 771.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-10 22:54:01,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1339.46 | bwd_inner_microstep: 1339.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-10 22:54:03,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.27 | bwd_microstep: 1277.13 | bwd_inner_microstep: 1277.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1934 [2024-06-10 22:54:04,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.86 | bwd_microstep: 741.64 | bwd_inner_microstep: 741.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 877 [2024-06-10 22:54:05,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 151.24 | bwd_microstep: 398.19 | bwd_inner_microstep: 398.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483 [2024-06-10 22:54:06,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.60 | bwd_microstep: 1216.11 | bwd_inner_microstep: 1216.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3486 [2024-06-10 22:54:08,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.25 | bwd_microstep: 1431.07 | bwd_inner_microstep: 1431.04 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3590 [2024-06-10 22:54:10,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.58 | bwd_microstep: 1367.72 | bwd_inner_microstep: 1367.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664 [2024-06-10 22:54:12,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.35 | bwd_microstep: 1416.29 | bwd_inner_microstep: 1416.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3490 [2024-06-10 22:54:14,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.47 | bwd_microstep: 1528.72 | bwd_inner_microstep: 1528.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3827 [2024-06-10 22:54:17,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.08 | bwd_microstep: 1583.96 | bwd_inner_microstep: 1583.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-10 22:54:19,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-10 22:54:20,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.89 | bwd_microstep: 799.85 | bwd_inner_microstep: 799.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3904 [2024-06-10 22:54:22,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.72 | bwd_microstep: 
1492.95 | bwd_inner_microstep: 1492.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3445 [2024-06-10 22:54:23,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1284.35 | bwd_inner_microstep: 1284.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4054 [2024-06-10 22:54:26,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.65 | bwd_microstep: 1822.18 | bwd_inner_microstep: 1822.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-10 22:54:28,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1553.69 | bwd_inner_microstep: 1553.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-10 22:54:30,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.21 | bwd_microstep: 1613.44 | bwd_inner_microstep: 1613.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-10 22:54:32,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.23 | bwd_microstep: 1402.10 | bwd_inner_microstep: 1402.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-10 22:54:34,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.26 | bwd_microstep: 1284.84 | bwd_inner_microstep: 1284.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-10 22:54:36,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 503.41 | bwd_microstep: 1347.88 | bwd_inner_microstep: 1347.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 866 [2024-06-10 22:54:36,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 140.97 | bwd_microstep: 366.26 | bwd_inner_microstep: 366.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3817 [2024-06-10 22:54:38,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1401.30 | bwd_inner_microstep: 1401.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700 [2024-06-10 22:54:41,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.55 | bwd_microstep: 1624.71 | bwd_inner_microstep: 1624.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3232 [2024-06-10 22:54:43,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.14 | bwd_microstep: 1424.32 | bwd_inner_microstep: 1424.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992 [2024-06-10 22:54:44,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.56 | bwd_microstep: 897.04 | bwd_inner_microstep: 897.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3544 [2024-06-10 22:54:46,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.71 | bwd_microstep: 1543.89 | bwd_inner_microstep: 1543.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449 [2024-06-10 22:54:48,295] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.12 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-10 22:54:55,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.22 | optimizer_step: 6.63 [2024-06-10 22:54:55,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.22 | bwd_microstep: 6173.26 | bwd_inner_microstep: 1803.63 | bwd_allreduce_microstep: 4369.56 | step_microstep: 38.57 [2024-06-10 22:54:55,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15108.27 | bwd: 45016.08 | bwd_inner: 40645.57 | bwd_allreduce: 4369.81 | step: 40.05 {'loss': 1.2173, 'learning_rate': 6.357287978731292e-06, 'epoch': 0.75} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411 [2024-06-10 22:54:56,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1300.88 | bwd_inner_microstep: 1300.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2102 [2024-06-10 22:54:58,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.59 | bwd_microstep: 821.07 | bwd_inner_microstep: 821.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867 [2024-06-10 22:55:00,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.87 | bwd_microstep: 1658.86 | bwd_inner_microstep: 1658.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-10 22:55:02,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.97 | bwd_microstep: 1309.99 | bwd_inner_microstep: 1309.97 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3583 [2024-06-10 22:55:03,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1264.87 | bwd_inner_microstep: 1264.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946 [2024-06-10 22:55:04,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.75 | bwd_microstep: 728.02 | bwd_inner_microstep: 728.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3498 [2024-06-10 22:55:06,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.24 | bwd_microstep: 1187.69 | bwd_inner_microstep: 1187.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-10 22:55:08,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.85 | bwd_microstep: 1386.27 | bwd_inner_microstep: 1386.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-10 22:55:10,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.98 | bwd_microstep: 1427.26 | bwd_inner_microstep: 1427.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-10 22:55:12,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1398.95 | bwd_inner_microstep: 1398.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-10 22:55:14,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.34 | bwd_microstep: 
1283.60 | bwd_inner_microstep: 1283.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1896 [2024-06-10 22:55:15,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.96 | bwd_microstep: 683.47 | bwd_inner_microstep: 683.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-10 22:55:17,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1390.63 | bwd_inner_microstep: 1390.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3707 [2024-06-10 22:55:19,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.38 | bwd_microstep: 1466.83 | bwd_inner_microstep: 1466.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662 [2024-06-10 22:55:21,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.25 | bwd_microstep: 1614.53 | bwd_inner_microstep: 1614.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446 [2024-06-10 22:55:23,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.51 | bwd_microstep: 1281.12 | bwd_inner_microstep: 1281.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3447 [2024-06-10 22:55:25,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1476.75 | bwd_inner_microstep: 1476.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1932 [2024-06-10 22:55:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 268.70 | bwd_microstep: 696.50 | bwd_inner_microstep: 696.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484 [2024-06-10 22:55:28,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.39 | bwd_microstep: 1403.97 | bwd_inner_microstep: 1403.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3628 [2024-06-10 22:55:29,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.90 | bwd_microstep: 1216.47 | bwd_inner_microstep: 1216.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2076 [2024-06-10 22:55:30,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.62 | bwd_microstep: 913.70 | bwd_inner_microstep: 913.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-10 22:55:33,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-10 22:55:35,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.94 | bwd_microstep: 1556.38 | bwd_inner_microstep: 1556.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608 [2024-06-10 22:55:37,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.16 | bwd_microstep: 1502.60 | bwd_inner_microstep: 1502.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2531 [2024-06-10 22:55:38,826] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 423.93 | bwd_microstep: 1150.12 | bwd_inner_microstep: 1150.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3535 [2024-06-10 22:55:40,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1389.41 | bwd_inner_microstep: 1389.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-10 22:55:42,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1491.57 | bwd_inner_microstep: 1491.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-10 22:55:44,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.64 | bwd_microstep: 1455.99 | bwd_inner_microstep: 1455.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2287 [2024-06-10 22:55:46,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.26 | bwd_microstep: 975.19 | bwd_inner_microstep: 975.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3780 [2024-06-10 22:55:48,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.31 | bwd_microstep: 1692.31 | bwd_inner_microstep: 1692.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228 [2024-06-10 22:55:49,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.35 | bwd_microstep: 961.82 | bwd_inner_microstep: 961.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3466 [2024-06-10 22:55:57,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-10 22:55:57,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 7411.73 | bwd_inner_microstep: 1661.19 | bwd_allreduce_microstep: 5750.48 | step_microstep: 37.73 [2024-06-10 22:55:57,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15382.19 | bwd: 47008.51 | bwd_inner: 41257.12 | bwd_allreduce: 5750.71 | step: 39.22 {'loss': 1.2659, 'learning_rate': 6.3298662217526315e-06, 'epoch': 0.75} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3488 [2024-06-10 22:55:59,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1433.30 | bwd_inner_microstep: 1433.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-10 22:56:01,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.36 | bwd_microstep: 1278.74 | bwd_inner_microstep: 1278.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2330 [2024-06-10 22:56:02,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.72 | bwd_microstep: 914.26 | bwd_inner_microstep: 914.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787 [2024-06-10 22:56:05,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.85 | bwd_microstep: 1640.71 | bwd_inner_microstep: 1640.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 22:56:06,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.46 | bwd_microstep: 1295.73 | 
bwd_inner_microstep: 1295.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 22:56:08,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.11 | bwd_microstep: 1338.12 | bwd_inner_microstep: 1338.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-10 22:56:10,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.70 | bwd_microstep: 1430.08 | bwd_inner_microstep: 1430.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 22:56:12,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1377.17 | bwd_inner_microstep: 1377.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 879 [2024-06-10 22:56:13,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 141.61 | bwd_microstep: 367.22 | bwd_inner_microstep: 367.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1907 [2024-06-10 22:56:14,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.09 | bwd_microstep: 715.47 | bwd_inner_microstep: 715.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551 [2024-06-10 22:56:15,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.82 | bwd_microstep: 1301.27 | bwd_inner_microstep: 1301.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473 [2024-06-10 22:56:17,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
506.44 | bwd_microstep: 1338.55 | bwd_inner_microstep: 1338.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3660 [2024-06-10 22:56:19,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.81 | bwd_microstep: 1577.71 | bwd_inner_microstep: 1577.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-10 22:56:21,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 1346.53 | bwd_inner_microstep: 1346.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3497 [2024-06-10 22:56:24,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.47 | bwd_microstep: 1678.85 | bwd_inner_microstep: 1678.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-10 22:56:26,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.11 | bwd_microstep: 1491.41 | bwd_inner_microstep: 1491.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-10 22:56:28,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.00 | bwd_microstep: 1554.13 | bwd_inner_microstep: 1554.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-10 22:56:30,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.74 | bwd_microstep: 1378.64 | bwd_inner_microstep: 1378.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-10 22:56:32,033] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1283.66 | bwd_inner_microstep: 1283.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 877 [2024-06-10 22:56:32,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 151.44 | bwd_microstep: 397.62 | bwd_inner_microstep: 397.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3477 [2024-06-10 22:56:34,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1419.83 | bwd_inner_microstep: 1419.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-10 22:56:36,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1395.75 | bwd_inner_microstep: 1395.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-10 22:56:38,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1399.01 | bwd_inner_microstep: 1398.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-10 22:56:40,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1391.36 | bwd_inner_microstep: 1391.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-10 22:56:41,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.08 | bwd_microstep: 1159.36 | bwd_inner_microstep: 1159.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3452 [2024-06-10 22:56:43,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1348.11 | bwd_inner_microstep: 1348.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3556 [2024-06-10 22:56:45,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.22 | bwd_microstep: 1293.74 | bwd_inner_microstep: 1293.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-10 22:56:47,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1491.40 | bwd_inner_microstep: 1491.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3562 [2024-06-10 22:56:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.45 | bwd_microstep: 1440.72 | bwd_inner_microstep: 1440.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-10 22:56:51,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.31 | bwd_microstep: 1500.52 | bwd_inner_microstep: 1500.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2658 [2024-06-10 22:56:53,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.75 | bwd_microstep: 1020.71 | bwd_inner_microstep: 1020.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2235 [2024-06-10 22:56:59,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.09 | optimizer_step: 6.63 [2024-06-10 22:56:59,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 357.08 | bwd_microstep: 5678.51 | bwd_inner_microstep: 1087.67 | bwd_allreduce_microstep: 4590.79 | step_microstep: 37.81 [2024-06-10 22:56:59,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15384.94 | bwd: 45678.21 | bwd_inner: 41086.52 | bwd_allreduce: 4591.02 | step: 39.40 {'loss': 1.1452, 'learning_rate': 6.3024926109140725e-06, 'epoch': 0.75} 75%|███████▍ | 1287/1726 [22:15:30<7:27:46, 61.20s/it] 75%|███████▍ | 1288/1726 [22:16:31<7:26:34, 61.18s/it] 75%|███████▍ | 1289/1726 [22:17:31<7:23:58, 60.96s/it] 75%|███████▍ | 1290/1726 [22:18:34<7:26:47, 61.49s/it] 75%|███████▍ | 1291/1726 [22:19:35<7:25:35, 61.46s/it] dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3039 [2024-06-10 22:57:00,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.01 | bwd_microstep: 1272.50 | bwd_inner_microstep: 1272.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-10 22:57:02,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.94 | bwd_microstep: 1242.52 | bwd_inner_microstep: 1242.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3860 [2024-06-10 22:57:04,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1455.96 | bwd_inner_microstep: 1455.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3886 [2024-06-10 22:57:06,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.41 | 
bwd_microstep: 1489.82 | bwd_inner_microstep: 1489.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-10 22:57:08,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.14 | bwd_microstep: 1543.41 | bwd_inner_microstep: 1543.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-10 22:57:10,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.82 | bwd_microstep: 1341.68 | bwd_inner_microstep: 1341.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-10 22:57:12,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.50 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938 [2024-06-10 22:57:13,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.69 | bwd_microstep: 725.90 | bwd_inner_microstep: 725.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 22:57:15,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.03 | bwd_microstep: 1283.55 | bwd_inner_microstep: 1283.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3427 [2024-06-10 22:57:16,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.22 | bwd_microstep: 1183.95 | bwd_inner_microstep: 1183.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644 [2024-06-10 22:57:18,930] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 526.97 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3492 [2024-06-10 22:57:20,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1442.12 | bwd_inner_microstep: 1442.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-10 22:57:22,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1479.11 | bwd_inner_microstep: 1479.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3657 [2024-06-10 22:57:25,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.12 | bwd_microstep: 1655.04 | bwd_inner_microstep: 1655.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1934 [2024-06-10 22:57:26,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.71 | bwd_microstep: 821.95 | bwd_inner_microstep: 821.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3679 [2024-06-10 22:57:28,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.79 | bwd_microstep: 1449.31 | bwd_inner_microstep: 1449.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1963 [2024-06-10 22:57:29,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.08 | bwd_microstep: 854.76 | bwd_inner_microstep: 854.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3381 [2024-06-10 22:57:31,308] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.91 | bwd_microstep: 1271.81 | bwd_inner_microstep: 1271.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-10 22:57:33,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1345.37 | bwd_inner_microstep: 1345.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3619 [2024-06-10 22:57:35,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.56 | bwd_microstep: 1570.70 | bwd_inner_microstep: 1570.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-10 22:57:37,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.88 | bwd_microstep: 1479.61 | bwd_inner_microstep: 1479.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3646 [2024-06-10 22:57:39,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1412.72 | bwd_inner_microstep: 1412.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 22:57:41,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1536.66 | bwd_inner_microstep: 1536.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-10 22:57:43,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1460.42 | bwd_inner_microstep: 1460.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3558 [2024-06-10 22:57:45,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.14 | bwd_microstep: 1360.27 | bwd_inner_microstep: 1360.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-10 22:57:47,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.81 | bwd_microstep: 1289.65 | bwd_inner_microstep: 1289.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-10 22:57:49,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.14 | bwd_microstep: 1508.70 | bwd_inner_microstep: 1508.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3606 [2024-06-10 22:57:51,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.53 | bwd_microstep: 1540.64 | bwd_inner_microstep: 1540.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3801 [2024-06-10 22:57:53,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1555.68 | bwd_inner_microstep: 1555.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-10 22:57:54,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.66 | bwd_microstep: 977.29 | bwd_inner_microstep: 977.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-10 22:57:56,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.16 | bwd_microstep: 1505.13 | bwd_inner_microstep: 1505.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 2276 [2024-06-10 22:58:00,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-10 22:58:00,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.46 | bwd_microstep: 3397.79 | bwd_inner_microstep: 1139.93 | bwd_allreduce_microstep: 2257.81 | step_microstep: 38.02 [2024-06-10 22:58:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15981.76 | bwd: 45155.83 | bwd_inner: 42897.12 | bwd_allreduce: 2258.04 | step: 39.49 {'loss': 1.1915, 'learning_rate': 6.275167242625331e-06, 'epoch': 0.75} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-10 22:58:02,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.52 | bwd_microstep: 1472.12 | bwd_inner_microstep: 1471.93 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927 [2024-06-10 22:58:03,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.17 | bwd_microstep: 695.91 | bwd_inner_microstep: 695.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915 [2024-06-10 22:58:05,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1585.46 | bwd_inner_microstep: 1585.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2330 [2024-06-10 22:58:07,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.38 | bwd_microstep: 920.27 | bwd_inner_microstep: 920.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-10 22:58:09,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | 
bwd_microstep: 1490.20 | bwd_inner_microstep: 1490.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-10 22:58:10,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 793.92 | bwd_inner_microstep: 793.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-10 22:58:12,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-10 22:58:13,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1388.82 | bwd_inner_microstep: 1388.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411 [2024-06-10 22:58:15,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.78 | bwd_microstep: 1308.03 | bwd_inner_microstep: 1308.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 22:58:17,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1388.79 | bwd_inner_microstep: 1388.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3518 [2024-06-10 22:58:19,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.10 | bwd_microstep: 1511.61 | bwd_inner_microstep: 1511.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1964 [2024-06-10 22:58:20,973] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 319.45 | bwd_microstep: 853.88 | bwd_inner_microstep: 853.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430 [2024-06-10 22:58:22,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.47 | bwd_microstep: 1284.96 | bwd_inner_microstep: 1284.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3518 [2024-06-10 22:58:24,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.68 | bwd_microstep: 1434.32 | bwd_inner_microstep: 1434.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-10 22:58:26,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1386.26 | bwd_inner_microstep: 1386.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-10 22:58:28,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.70 | bwd_microstep: 1387.95 | bwd_inner_microstep: 1387.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472 [2024-06-10 22:58:30,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.09 | bwd_microstep: 1314.24 | bwd_inner_microstep: 1314.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2390 [2024-06-10 22:58:31,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.34 | bwd_microstep: 906.81 | bwd_inner_microstep: 906.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143 [2024-06-10 22:58:32,932] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.11 | bwd_microstep: 930.73 | bwd_inner_microstep: 930.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595 [2024-06-10 22:58:34,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.23 | bwd_microstep: 1311.97 | bwd_inner_microstep: 1311.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-10 22:58:36,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.10 | bwd_microstep: 1293.59 | bwd_inner_microstep: 1293.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-10 22:58:38,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.86 | bwd_microstep: 1460.66 | bwd_inner_microstep: 1460.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-10 22:58:40,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.85 | bwd_microstep: 1654.63 | bwd_inner_microstep: 1654.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-10 22:58:42,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.76 | bwd_microstep: 1510.55 | bwd_inner_microstep: 1510.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-10 22:58:44,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1498.17 | bwd_inner_microstep: 1498.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3809 [2024-06-10 22:58:46,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.50 | bwd_microstep: 1414.76 | bwd_inner_microstep: 1414.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3586 [2024-06-10 22:58:48,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1436.08 | bwd_inner_microstep: 1436.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3765 [2024-06-10 22:58:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.56 | bwd_microstep: 1469.20 | bwd_inner_microstep: 1469.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3612 [2024-06-10 22:58:53,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.49 | bwd_microstep: 1512.40 | bwd_inner_microstep: 1512.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-10 22:58:55,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.34 | bwd_microstep: 1636.07 | bwd_inner_microstep: 1636.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-10 22:58:57,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1538.52 | bwd_inner_microstep: 1538.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-10 22:59:00,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.04 | optimizer_step: 6.63 [2024-06-10 22:59:00,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 593.47 | bwd_microstep: 2537.21 | bwd_inner_microstep: 1818.58 | bwd_allreduce_microstep: 718.58 | step_microstep: 37.69 [2024-06-10 22:59:00,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15953.58 | bwd: 43575.26 | bwd_inner: 42855.65 | bwd_allreduce: 718.88 | step: 39.35 {'loss': 1.1535, 'learning_rate': 6.247890213126213e-06, 'epoch': 0.75} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-10 22:59:02,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.39 | bwd_microstep: 1273.43 | bwd_inner_microstep: 1273.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 22:59:04,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1379.27 | bwd_inner_microstep: 1379.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405 [2024-06-10 22:59:05,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.29 | bwd_microstep: 1180.14 | bwd_inner_microstep: 1180.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-10 22:59:07,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.54 | bwd_microstep: 1378.03 | bwd_inner_microstep: 1378.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4303 [2024-06-10 22:59:10,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.28 | bwd_microstep: 1781.89 | bwd_inner_microstep: 1781.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-10 22:59:12,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 502.42 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-10 22:59:13,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.58 | bwd_microstep: 1342.66 | bwd_inner_microstep: 1342.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-10 22:59:15,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.36 | bwd_microstep: 1246.30 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744 [2024-06-10 22:59:17,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.04 | bwd_microstep: 1534.04 | bwd_inner_microstep: 1534.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4020 [2024-06-10 22:59:19,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.52 | bwd_microstep: 1519.78 | bwd_inner_microstep: 1519.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2108 [2024-06-10 22:59:20,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.28 | bwd_microstep: 732.26 | bwd_inner_microstep: 732.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1924 [2024-06-10 22:59:21,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.31 | bwd_microstep: 728.15 | bwd_inner_microstep: 728.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3707 [2024-06-10 22:59:24,057] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.22 | bwd_microstep: 1545.92 | bwd_inner_microstep: 1545.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-10 22:59:25,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.53 | bwd_microstep: 1380.61 | bwd_inner_microstep: 1380.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523 [2024-06-10 22:59:28,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.73 | bwd_microstep: 1585.76 | bwd_inner_microstep: 1585.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2331 [2024-06-10 22:59:29,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.89 | bwd_microstep: 919.02 | bwd_inner_microstep: 918.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-10 22:59:31,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.51 | bwd_microstep: 1386.54 | bwd_inner_microstep: 1386.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3523 [2024-06-10 22:59:33,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.85 | bwd_microstep: 1196.90 | bwd_inner_microstep: 1196.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931 [2024-06-10 22:59:33,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.63 | bwd_microstep: 696.77 | bwd_inner_microstep: 696.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3832 [2024-06-10 22:59:35,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.04 | bwd_microstep: 1454.60 | bwd_inner_microstep: 1454.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-10 22:59:37,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1353.22 | bwd_inner_microstep: 1353.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-10 22:59:39,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1432.77 | bwd_inner_microstep: 1432.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821 [2024-06-10 22:59:41,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.03 | bwd_microstep: 1359.09 | bwd_inner_microstep: 1359.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-10 22:59:43,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1289.97 | bwd_inner_microstep: 1289.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458 [2024-06-10 22:59:45,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.95 | bwd_microstep: 1216.11 | bwd_inner_microstep: 1216.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3558 [2024-06-10 22:59:46,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.10 | bwd_microstep: 1233.39 | bwd_inner_microstep: 1233.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3831 [2024-06-10 22:59:49,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.25 | bwd_microstep: 1518.02 | bwd_inner_microstep: 1517.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3432 [2024-06-10 22:59:50,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1426.17 | bwd_inner_microstep: 1426.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-10 22:59:52,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 1339.77 | bwd_inner_microstep: 1339.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3595 [2024-06-10 22:59:55,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.33 | bwd_microstep: 1596.70 | bwd_inner_microstep: 1596.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2084 [2024-06-10 22:59:56,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.55 | bwd_microstep: 1012.24 | bwd_inner_microstep: 1012.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-10 23:00:00,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.09 | optimizer_step: 6.61 [2024-06-10 23:00:00,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.75 | bwd_microstep: 3524.11 | bwd_inner_microstep: 1855.47 | bwd_allreduce_microstep: 1668.59 | step_microstep: 37.68 [2024-06-10 23:00:00,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15774.91 | bwd: 43908.91 | bwd_inner: 
42239.42 | bwd_allreduce: 1668.82 | step: 39.20 {'loss': 1.1959, 'learning_rate': 6.220661618486268e-06, 'epoch': 0.75} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933 [2024-06-10 23:00:01,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.14 | bwd_microstep: 817.20 | bwd_inner_microstep: 817.07 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3928 [2024-06-10 23:00:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 688.20 | bwd_microstep: 1894.76 | bwd_inner_microstep: 1894.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3896 [2024-06-10 23:00:06,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.46 | bwd_microstep: 1512.13 | bwd_inner_microstep: 1512.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-10 23:00:07,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.08 | bwd_microstep: 791.08 | bwd_inner_microstep: 791.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3474 [2024-06-10 23:00:09,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.15 | bwd_microstep: 1212.00 | bwd_inner_microstep: 1211.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-10 23:00:11,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.43 | bwd_microstep: 1403.87 | bwd_inner_microstep: 1403.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-10 23:00:13,046] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1388.25 | bwd_inner_microstep: 1388.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3715 [2024-06-10 23:00:15,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.61 | bwd_microstep: 1464.44 | bwd_inner_microstep: 1464.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-10 23:00:16,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1249.06 | bwd_inner_microstep: 1249.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2202 [2024-06-10 23:00:18,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.18 | bwd_microstep: 987.66 | bwd_inner_microstep: 987.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-10 23:00:19,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.18 | bwd_microstep: 803.14 | bwd_inner_microstep: 803.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-10 23:00:21,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.14 | bwd_microstep: 1295.29 | bwd_inner_microstep: 1295.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506 [2024-06-10 23:00:22,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1347.46 | bwd_inner_microstep: 1347.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-10 
23:00:24,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1350.63 | bwd_inner_microstep: 1350.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-10 23:00:26,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.00 | bwd_microstep: 1390.58 | bwd_inner_microstep: 1390.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3529 [2024-06-10 23:00:28,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1553.54 | bwd_inner_microstep: 1553.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3533 [2024-06-10 23:00:30,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1518.50 | bwd_inner_microstep: 1518.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3537 [2024-06-10 23:00:33,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.72 | bwd_microstep: 1535.08 | bwd_inner_microstep: 1535.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3465 [2024-06-10 23:00:34,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.63 | bwd_microstep: 1244.41 | bwd_inner_microstep: 1244.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3819 [2024-06-10 23:00:36,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.37 | bwd_microstep: 1511.94 | bwd_inner_microstep: 1511.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, 
dynamic token length: 3832 [2024-06-10 23:00:39,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.33 | bwd_microstep: 1584.69 | bwd_inner_microstep: 1584.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 23:00:40,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1377.20 | bwd_inner_microstep: 1377.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3527 [2024-06-10 23:00:42,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.19 | bwd_microstep: 1356.36 | bwd_inner_microstep: 1356.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-10 23:00:44,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.73 | bwd_microstep: 1558.37 | bwd_inner_microstep: 1558.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-10 23:00:46,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.30 | bwd_microstep: 1272.52 | bwd_inner_microstep: 1272.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-10 23:00:48,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1448.68 | bwd_inner_microstep: 1448.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 23:00:50,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1256.01 | bwd_inner_microstep: 1255.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 30, images per sample: 7.5, dynamic token length: 3769 [2024-06-10 23:00:52,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1474.68 | bwd_inner_microstep: 1474.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-10 23:00:54,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.60 | bwd_microstep: 1372.19 | bwd_inner_microstep: 1372.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3382 [2024-06-10 23:00:56,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.54 | bwd_microstep: 1271.87 | bwd_inner_microstep: 1271.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3809 [2024-06-10 23:00:58,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.41 | bwd_microstep: 1478.64 | bwd_inner_microstep: 1478.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-10 23:01:01,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.03 | optimizer_step: 6.59 [2024-06-10 23:01:01,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 2449.36 | bwd_inner_microstep: 1734.86 | bwd_allreduce_microstep: 714.46 | step_microstep: 37.58 [2024-06-10 23:01:01,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16218.18 | bwd: 44171.64 | bwd_inner: 43456.18 | bwd_allreduce: 714.73 | step: 39.10 {'loss': 1.1778, 'learning_rate': 6.1934815546044765e-06, 'epoch': 0.75} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-10 23:01:03,050] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 473.20 | bwd_microstep: 1252.98 | bwd_inner_microstep: 1252.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893 [2024-06-10 23:01:05,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1680.71 | bwd_inner_microstep: 1680.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-10 23:01:07,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1300.95 | bwd_inner_microstep: 1300.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4138 [2024-06-10 23:01:09,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.38 | bwd_microstep: 1442.47 | bwd_inner_microstep: 1442.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3469 [2024-06-10 23:01:11,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.72 | bwd_microstep: 1329.87 | bwd_inner_microstep: 1329.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758 [2024-06-10 23:01:12,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.90 | bwd_microstep: 1437.50 | bwd_inner_microstep: 1437.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-10 23:01:13,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.06 | bwd_microstep: 678.73 | bwd_inner_microstep: 678.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-10 23:01:15,529] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.55 | bwd_microstep: 1148.16 | bwd_inner_microstep: 1148.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-10 23:01:17,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.33 | bwd_microstep: 1383.90 | bwd_inner_microstep: 1383.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3695 [2024-06-10 23:01:19,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.56 | bwd_microstep: 1624.56 | bwd_inner_microstep: 1624.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-10 23:01:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.20 | bwd_microstep: 1282.71 | bwd_inner_microstep: 1282.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2054 [2024-06-10 23:01:22,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.16 | bwd_microstep: 864.09 | bwd_inner_microstep: 864.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1964 [2024-06-10 23:01:23,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.60 | bwd_microstep: 826.68 | bwd_inner_microstep: 826.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666 [2024-06-10 23:01:26,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.23 | bwd_microstep: 1719.51 | bwd_inner_microstep: 1719.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3453 [2024-06-10 23:01:28,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.86 | bwd_microstep: 1349.75 | bwd_inner_microstep: 1349.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-10 23:01:29,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 1379.95 | bwd_inner_microstep: 1379.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3516 [2024-06-10 23:01:31,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1318.46 | bwd_inner_microstep: 1318.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-10 23:01:33,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.07 | bwd_microstep: 1518.51 | bwd_inner_microstep: 1518.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3467 [2024-06-10 23:01:35,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.34 | bwd_microstep: 1455.45 | bwd_inner_microstep: 1455.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3706 [2024-06-10 23:01:37,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.09 | bwd_microstep: 1493.51 | bwd_inner_microstep: 1493.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3513 [2024-06-10 23:01:39,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1505.03 | bwd_inner_microstep: 1505.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 3614 [2024-06-10 23:01:41,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1339.47 | bwd_inner_microstep: 1339.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-10 23:01:43,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.37 | bwd_microstep: 1493.70 | bwd_inner_microstep: 1493.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680 [2024-06-10 23:01:45,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.18 | bwd_microstep: 1429.16 | bwd_inner_microstep: 1429.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718 [2024-06-10 23:01:47,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.43 | bwd_microstep: 1535.14 | bwd_inner_microstep: 1535.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-10 23:01:49,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.88 | bwd_microstep: 1416.28 | bwd_inner_microstep: 1416.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599 [2024-06-10 23:01:52,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.12 | bwd_microstep: 1606.31 | bwd_inner_microstep: 1606.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-10 23:01:54,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 1409.33 | bwd_inner_microstep: 1409.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644 [2024-06-10 23:01:56,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1612.48 | bwd_inner_microstep: 1612.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3453 [2024-06-10 23:01:57,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.32 | bwd_microstep: 1188.81 | bwd_inner_microstep: 1188.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3680 [2024-06-10 23:01:59,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.63 | bwd_microstep: 1306.78 | bwd_inner_microstep: 1306.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-10 23:02:02,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.04 | optimizer_step: 6.61 [2024-06-10 23:02:02,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.55 | bwd_microstep: 1876.16 | bwd_inner_microstep: 1588.85 | bwd_allreduce_microstep: 287.27 | step_microstep: 37.49 [2024-06-10 23:02:02,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16392.42 | bwd: 44207.11 | bwd_inner: 43918.95 | bwd_allreduce: 287.50 | step: 38.93 {'loss': 1.2053, 'learning_rate': 6.1663501172088726e-06, 'epoch': 0.75} 
75%|███████▍ | 1292/1726 [22:20:37<7:24:35, 61.46s/it] 75%|███████▍ | 1293/1726 [22:21:37<7:20:05, 60.98s/it] 75%|███████▍ | 1294/1726 [22:22:37<7:16:59, 60.69s/it] 75%|███████▌ | 1295/1726 [22:23:38<7:16:03, 60.70s/it] 75%|███████▌ | 1296/1726 [22:24:38<7:15:32, 60.77s/it] 75%|███████▌ | 1296/1726 [22:24:38> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400 [INFO|configuration_utils.py:473] 2024-06-11 00:51:26,030 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/config.json [INFO|configuration_utils.py:594] 2024-06-11 00:51:26,033 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-11 00:51:34,047 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-11 00:51:34,065 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-11 00:51:34,067 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-11 00:51:34,068 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/added_tokens.json [2024-06-11 00:51:34,477] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1400 is about to be saved! [2024-06-11 00:51:34,488] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/mp_rank_00_model_states.pt [2024-06-11 00:51:34,488] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/mp_rank_00_model_states.pt... 
[2024-06-11 00:51:43,074] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/mp_rank_00_model_states.pt. [2024-06-11 00:51:43,086] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-11 00:51:54,643] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-06-11 00:51:54,656] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1400/global_step1400/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-11 00:51:54,656] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1400 is ready now! 
[INFO|trainer.py:3028] 2024-06-11 00:51:54,881 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-800] due to args.save_total_limit dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 00:51:57,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.51 | bwd_microstep: 1269.01 | bwd_inner_microstep: 1268.91 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 00:51:59,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.75 | bwd_microstep: 1372.14 | bwd_inner_microstep: 1372.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2643 [2024-06-11 00:52:00,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.94 | bwd_microstep: 1109.36 | bwd_inner_microstep: 1109.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862 [2024-06-11 00:52:02,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1460.57 | bwd_inner_microstep: 1460.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 00:52:04,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1243.47 | bwd_inner_microstep: 1243.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-11 00:52:05,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.71 | bwd_microstep: 790.96 | bwd_inner_microstep: 790.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic 
token length: 4134 [2024-06-11 00:52:07,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.20 | bwd_microstep: 1736.00 | bwd_inner_microstep: 1735.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-11 00:52:10,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.43 | bwd_microstep: 1621.02 | bwd_inner_microstep: 1621.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406 [2024-06-11 00:52:17,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.31 | bwd_microstep: 1271.62 | bwd_inner_microstep: 1271.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2907 [2024-06-11 00:52:19,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.15 | bwd_microstep: 1149.51 | bwd_inner_microstep: 1149.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 00:52:28,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.89 | bwd_microstep: 1276.07 | bwd_inner_microstep: 1276.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-11 00:52:30,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.90 | bwd_microstep: 1473.67 | bwd_inner_microstep: 1473.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501 [2024-06-11 00:52:32,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1572.70 | bwd_inner_microstep: 1572.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 40, images per sample: 10.0, dynamic token length: 3668 [2024-06-11 00:52:41,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.55 | bwd_microstep: 1604.02 | bwd_inner_microstep: 1603.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-11 00:52:42,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.77 | bwd_microstep: 1381.12 | bwd_inner_microstep: 1381.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-11 00:52:54,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.17 | bwd_microstep: 1391.82 | bwd_inner_microstep: 1391.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420 [2024-06-11 00:52:56,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.93 | bwd_microstep: 1438.47 | bwd_inner_microstep: 1438.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2203 [2024-06-11 00:52:57,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.25 | bwd_microstep: 955.75 | bwd_inner_microstep: 955.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2092 [2024-06-11 00:52:58,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.00 | bwd_microstep: 792.01 | bwd_inner_microstep: 791.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 00:53:00,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1396.14 | bwd_inner_microstep: 1396.11 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3424 [2024-06-11 00:53:02,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.67 | bwd_microstep: 1276.87 | bwd_inner_microstep: 1276.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719 [2024-06-11 00:53:04,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.33 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-11 00:53:06,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1535.11 | bwd_inner_microstep: 1535.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914 [2024-06-11 00:53:07,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.93 | bwd_microstep: 692.57 | bwd_inner_microstep: 692.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448 [2024-06-11 00:53:09,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.77 | bwd_microstep: 1159.89 | bwd_inner_microstep: 1159.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-11 00:53:10,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1404.59 | bwd_inner_microstep: 1404.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3609 [2024-06-11 00:53:13,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.86 | bwd_microstep: 1582.54 | bwd_inner_microstep: 
1582.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-11 00:53:14,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1346.09 | bwd_inner_microstep: 1345.99 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.23 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3780 [2024-06-11 00:53:17,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.88 | bwd_microstep: 1693.21 | bwd_inner_microstep: 1693.09 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-11 00:53:19,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.68 | bwd_microstep: 1477.30 | bwd_inner_microstep: 1477.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-11 00:53:21,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.92 | bwd_microstep: 1408.42 | bwd_inner_microstep: 1408.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3568 [2024-06-11 00:53:33,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.31 | optimizer_step: 6.62 [2024-06-11 00:53:33,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.66 | bwd_microstep: 11112.92 | bwd_inner_microstep: 1765.14 | bwd_allreduce_microstep: 9347.70 | step_microstep: 40.61 [2024-06-11 00:53:33,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16136.22 | bwd: 52333.53 | bwd_inner: 42984.62 | bwd_allreduce: 9348.10 | step: 42.65 {'loss': 1.1706, 'learning_rate': 3.6062020302243196e-06, 'epoch': 0.81} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3485 [2024-06-11 00:53:35,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.87 | bwd_microstep: 1469.77 | bwd_inner_microstep: 1469.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3936 [2024-06-11 00:53:37,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1420.89 | bwd_inner_microstep: 1420.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-11 00:53:39,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 1448.38 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 00:53:40,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1277.73 | bwd_inner_microstep: 1277.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3742 [2024-06-11 00:53:42,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.49 | bwd_microstep: 1528.90 | bwd_inner_microstep: 1528.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-11 00:53:45,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.15 | bwd_microstep: 1636.04 | bwd_inner_microstep: 1636.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3738 [2024-06-11 00:53:47,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.95 | bwd_microstep: 1434.39 | bwd_inner_microstep: 1434.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3415 [2024-06-11 00:53:48,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.60 | bwd_microstep: 1242.89 | bwd_inner_microstep: 1242.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952 [2024-06-11 00:53:49,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.92 | bwd_microstep: 730.80 | bwd_inner_microstep: 730.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886 [2024-06-11 00:53:50,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.30 | bwd_microstep: 685.33 | bwd_inner_microstep: 685.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2177 [2024-06-11 00:53:52,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.92 | bwd_microstep: 917.67 | bwd_inner_microstep: 917.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970 [2024-06-11 00:53:53,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.95 | bwd_microstep: 888.86 | bwd_inner_microstep: 888.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3418 [2024-06-11 00:53:55,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 1311.84 | bwd_inner_microstep: 1311.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-11 00:53:57,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1433.85 | bwd_inner_microstep: 1433.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 00:53:59,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.80 | bwd_microstep: 1384.92 | bwd_inner_microstep: 1384.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-11 00:54:01,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.65 | bwd_microstep: 1503.37 | bwd_inner_microstep: 1503.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 00:54:03,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1451.90 | bwd_inner_microstep: 1451.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3606 [2024-06-11 00:54:05,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.98 | bwd_microstep: 1457.71 | bwd_inner_microstep: 1457.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3825 [2024-06-11 00:54:07,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.76 | bwd_microstep: 1490.88 | bwd_inner_microstep: 1490.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3575 [2024-06-11 00:54:09,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.85 | bwd_microstep: 1630.20 | bwd_inner_microstep: 1630.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3643 [2024-06-11 00:54:11,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.88 | bwd_microstep: 1318.16 | bwd_inner_microstep: 1318.13 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3624 [2024-06-11 00:54:13,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.07 | bwd_microstep: 1442.77 | bwd_inner_microstep: 1442.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3593 [2024-06-11 00:54:15,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.78 | bwd_microstep: 1466.41 | bwd_inner_microstep: 1466.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997 [2024-06-11 00:54:16,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.01 | bwd_microstep: 736.41 | bwd_inner_microstep: 736.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2066 [2024-06-11 00:54:17,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.89 | bwd_microstep: 815.79 | bwd_inner_microstep: 815.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 00:54:19,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1256.32 | bwd_inner_microstep: 1256.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3566 [2024-06-11 00:54:21,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.25 | bwd_microstep: 1562.70 | bwd_inner_microstep: 1562.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814 [2024-06-11 00:54:23,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.42 | bwd_microstep: 1357.03 
| bwd_inner_microstep: 1357.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3752 [2024-06-11 00:54:25,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1341.72 | bwd_inner_microstep: 1341.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719 [2024-06-11 00:54:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.31 | bwd_microstep: 1438.49 | bwd_inner_microstep: 1438.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-11 00:54:28,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3591 [2024-06-11 00:54:39,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-11 00:54:39,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.19 | bwd_microstep: 10357.43 | bwd_inner_microstep: 1766.42 | bwd_allreduce_microstep: 8590.95 | step_microstep: 39.11 [2024-06-11 00:54:39,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15738.70 | bwd: 50832.70 | bwd_inner: 42240.80 | bwd_allreduce: 8591.18 | step: 40.95 {'loss': 1.1724, 'learning_rate': 3.584731175854479e-06, 'epoch': 0.81} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-11 00:54:41,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1301.77 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3466 [2024-06-11 00:54:43,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1372.96 | bwd_inner_microstep: 1372.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3946 [2024-06-11 00:54:46,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.47 | bwd_microstep: 1687.51 | bwd_inner_microstep: 1687.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3912 [2024-06-11 00:54:47,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477 [2024-06-11 00:54:49,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.13 | bwd_microstep: 1215.56 | bwd_inner_microstep: 1215.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3743 [2024-06-11 00:54:51,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1297.99 | bwd_inner_microstep: 1297.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743 [2024-06-11 00:54:53,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.28 | bwd_microstep: 1430.56 | bwd_inner_microstep: 1430.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-11 00:54:55,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.47 | bwd_microstep: 1403.19 | bwd_inner_microstep: 1403.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 00:54:57,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.59 | bwd_microstep: 1285.88 | bwd_inner_microstep: 1285.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1886 [2024-06-11 00:54:58,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.78 | bwd_microstep: 711.40 | bwd_inner_microstep: 711.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 00:54:59,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1285.79 | bwd_inner_microstep: 1285.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3670 [2024-06-11 00:55:01,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1455.59 | bwd_inner_microstep: 1455.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435 [2024-06-11 00:55:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.70 | bwd_microstep: 1282.37 | bwd_inner_microstep: 1282.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2664 [2024-06-11 00:55:05,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.81 | bwd_microstep: 1216.10 | bwd_inner_microstep: 1216.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-11 00:55:07,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.64 | bwd_microstep: 1529.41 | bwd_inner_microstep: 1529.38 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 00:55:09,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.83 | bwd_microstep: 1561.78 | bwd_inner_microstep: 1561.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2017 [2024-06-11 00:55:10,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.20 | bwd_microstep: 743.44 | bwd_inner_microstep: 743.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 00:55:12,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1382.67 | bwd_inner_microstep: 1382.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604 [2024-06-11 00:55:14,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1441.11 | bwd_inner_microstep: 1441.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531 [2024-06-11 00:55:16,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.71 | bwd_microstep: 1327.54 | bwd_inner_microstep: 1327.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-11 00:55:18,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1391.45 | bwd_inner_microstep: 1391.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2104 [2024-06-11 00:55:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.60 | bwd_microstep: 921.13 | bwd_inner_microstep: 
921.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543 [2024-06-11 00:55:21,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.89 | bwd_microstep: 1328.01 | bwd_inner_microstep: 1327.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-11 00:55:23,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.06 | bwd_microstep: 1288.04 | bwd_inner_microstep: 1288.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480 [2024-06-11 00:55:25,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.39 | bwd_microstep: 1442.72 | bwd_inner_microstep: 1442.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3638 [2024-06-11 00:55:27,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.31 | bwd_microstep: 1657.47 | bwd_inner_microstep: 1657.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811 [2024-06-11 00:55:29,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1355.76 | bwd_inner_microstep: 1355.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561 [2024-06-11 00:55:31,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.60 | bwd_microstep: 1586.24 | bwd_inner_microstep: 1586.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-11 00:55:33,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | 
bwd_microstep: 1559.06 | bwd_inner_microstep: 1559.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2028 [2024-06-11 00:55:35,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.92 | bwd_microstep: 999.08 | bwd_inner_microstep: 999.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-11 00:55:37,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.13 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1542.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774 [2024-06-11 00:55:41,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.14 | optimizer_step: 6.63 [2024-06-11 00:55:41,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.55 | bwd_microstep: 3401.79 | bwd_inner_microstep: 1753.90 | bwd_allreduce_microstep: 1647.83 | step_microstep: 38.96 [2024-06-11 00:55:41,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16079.26 | bwd: 44796.47 | bwd_inner: 43147.73 | bwd_allreduce: 1648.06 | step: 40.43 {'loss': 1.1564, 'learning_rate': 3.5633181359760925e-06, 'epoch': 0.81} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-11 00:55:42,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.03 | bwd_microstep: 787.02 | bwd_inner_microstep: 786.89 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848 [2024-06-11 00:55:44,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.93 | bwd_microstep: 1559.49 | bwd_inner_microstep: 1559.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3859 [2024-06-11 00:55:46,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.73 | bwd_microstep: 1364.75 | bwd_inner_microstep: 1364.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 00:55:48,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1247.36 | bwd_inner_microstep: 1247.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 00:55:49,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1353.03 | bwd_inner_microstep: 1353.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-11 00:55:52,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.39 | bwd_microstep: 1648.31 | bwd_inner_microstep: 1648.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3749 [2024-06-11 00:55:54,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.71 | bwd_microstep: 1437.01 | bwd_inner_microstep: 1436.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408 [2024-06-11 00:55:56,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1442.11 | bwd_inner_microstep: 1442.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 00:55:57,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1288.67 | bwd_inner_microstep: 1288.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3993 [2024-06-11 00:56:00,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.42 | bwd_microstep: 1574.35 | bwd_inner_microstep: 1574.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 00:56:01,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.16 | bwd_microstep: 1285.85 | bwd_inner_microstep: 1285.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3415 [2024-06-11 00:56:03,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.60 | bwd_microstep: 1213.52 | bwd_inner_microstep: 1213.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-11 00:56:05,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.56 | bwd_microstep: 1537.07 | bwd_inner_microstep: 1537.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3517 [2024-06-11 00:56:07,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1517.16 | bwd_inner_microstep: 1517.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 00:56:09,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1474.45 | bwd_inner_microstep: 1474.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3647 [2024-06-11 00:56:12,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.11 | bwd_microstep: 1711.06 | bwd_inner_microstep: 1711.03 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-11 00:56:14,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.37 | bwd_microstep: 1599.07 | bwd_inner_microstep: 1599.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477 [2024-06-11 00:56:16,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.97 | bwd_microstep: 1214.05 | bwd_inner_microstep: 1214.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-11 00:56:17,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.25 | bwd_microstep: 799.03 | bwd_inner_microstep: 799.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-11 00:56:18,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.92 | bwd_microstep: 974.22 | bwd_inner_microstep: 974.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654 [2024-06-11 00:56:20,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1519.55 | bwd_inner_microstep: 1519.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-11 00:56:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.50 | bwd_microstep: 1554.43 | bwd_inner_microstep: 1554.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 00:56:24,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 
1380.70 | bwd_inner_microstep: 1380.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-11 00:56:26,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.07 | bwd_microstep: 1256.67 | bwd_inner_microstep: 1256.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 00:56:28,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.66 | bwd_microstep: 1412.20 | bwd_inner_microstep: 1412.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3588 [2024-06-11 00:56:30,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1336.99 | bwd_inner_microstep: 1336.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-11 00:56:32,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1401.19 | bwd_inner_microstep: 1401.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-11 00:56:34,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1391.78 | bwd_inner_microstep: 1391.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416 [2024-06-11 00:56:36,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1441.69 | bwd_inner_microstep: 1441.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-11 00:56:38,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 534.64 | bwd_microstep: 1442.32 | bwd_inner_microstep: 1442.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3771 [2024-06-11 00:56:40,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.59 | bwd_microstep: 1447.25 | bwd_inner_microstep: 1447.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-11 00:56:42,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.26 | optimizer_step: 6.61 [2024-06-11 00:56:42,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.17 | bwd_microstep: 2112.15 | bwd_inner_microstep: 1721.29 | bwd_allreduce_microstep: 390.80 | step_microstep: 38.65 [2024-06-11 00:56:42,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16518.43 | bwd: 44724.51 | bwd_inner: 44332.71 | bwd_allreduce: 391.08 | step: 40.31 {'loss': 1.2194, 'learning_rate': 3.5419629860057915e-06, 'epoch': 0.81} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 00:56:44,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.18 | bwd_microstep: 1273.98 | bwd_inner_microstep: 1273.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4012 [2024-06-11 00:56:46,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 1513.69 | bwd_inner_microstep: 1513.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-11 00:56:48,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.23 | bwd_microstep: 1449.07 | bwd_inner_microstep: 1449.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-11 00:56:50,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1479.18 | bwd_inner_microstep: 1479.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1899 [2024-06-11 00:56:51,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.50 | bwd_microstep: 777.19 | bwd_inner_microstep: 777.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3495 [2024-06-11 00:56:53,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.67 | bwd_microstep: 1221.33 | bwd_inner_microstep: 1221.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3695 [2024-06-11 00:56:55,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.72 | bwd_microstep: 1590.33 | bwd_inner_microstep: 1590.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-11 00:56:57,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1250.35 | bwd_inner_microstep: 1250.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867 [2024-06-11 00:56:58,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.90 | bwd_microstep: 678.27 | bwd_inner_microstep: 678.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030 [2024-06-11 00:56:59,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.12 | bwd_microstep: 812.42 | bwd_inner_microstep: 812.39 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505 [2024-06-11 00:57:01,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.76 | bwd_microstep: 1315.89 | bwd_inner_microstep: 1315.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3479 [2024-06-11 00:57:03,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1411.99 | bwd_inner_microstep: 1411.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 00:57:05,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.52 | bwd_microstep: 1393.09 | bwd_inner_microstep: 1393.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2185 [2024-06-11 00:57:06,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.58 | bwd_microstep: 858.41 | bwd_inner_microstep: 858.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2153 [2024-06-11 00:57:07,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.40 | bwd_microstep: 1045.79 | bwd_inner_microstep: 1045.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-11 00:57:09,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.15 | bwd_microstep: 1279.49 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109 [2024-06-11 00:57:10,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.67 | bwd_microstep: 918.72 | bwd_inner_microstep: 
918.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-11 00:57:12,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1493.05 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3150 [2024-06-11 00:57:14,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1350.44 | bwd_inner_microstep: 1350.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459 [2024-06-11 00:57:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.75 | bwd_microstep: 1314.93 | bwd_inner_microstep: 1314.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3618 [2024-06-11 00:57:18,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.50 | bwd_microstep: 1216.35 | bwd_inner_microstep: 1216.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 00:57:20,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.75 | bwd_microstep: 1659.66 | bwd_inner_microstep: 1659.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 00:57:22,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.51 | bwd_microstep: 1399.53 | bwd_inner_microstep: 1399.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 00:57:24,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | 
bwd_microstep: 1351.67 | bwd_inner_microstep: 1351.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3867 [2024-06-11 00:57:26,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.39 | bwd_microstep: 1679.29 | bwd_inner_microstep: 1679.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753 [2024-06-11 00:57:28,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.46 | bwd_microstep: 1475.09 | bwd_inner_microstep: 1475.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3598 [2024-06-11 00:57:30,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.19 | bwd_microstep: 1674.54 | bwd_inner_microstep: 1674.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-11 00:57:32,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.46 | bwd_microstep: 1398.80 | bwd_inner_microstep: 1398.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599 [2024-06-11 00:57:35,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.20 | bwd_microstep: 1708.58 | bwd_inner_microstep: 1708.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2958 [2024-06-11 00:57:36,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.57 | bwd_microstep: 1101.90 | bwd_inner_microstep: 1101.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-11 00:57:39,016] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 589.20 | bwd_microstep: 1606.70 | bwd_inner_microstep: 1606.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760 [2024-06-11 00:57:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.30 | optimizer_step: 6.61 [2024-06-11 00:57:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.91 | bwd_microstep: 5307.76 | bwd_inner_microstep: 1857.21 | bwd_allreduce_microstep: 3450.49 | step_microstep: 39.95 [2024-06-11 00:57:44,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15822.06 | bwd: 46007.52 | bwd_inner: 42556.11 | bwd_allreduce: 3450.73 | step: 41.58 {'loss': 1.1881, 'learning_rate': 3.520665801156289e-06, 'epoch': 0.81} 81%|████████ | 1401/1726 [24:16:09<7:23:07, 81.81s/it] 81%|████████ | 1402/1726 [24:17:16<6:57:38, 77.34s/it] 81%|████████▏ | 1403/1726 [24:18:17<6:30:18, 72.50s/it] 81%|████████▏ | 1404/1726 [24:19:19<6:11:32, 69.23s/it] 81%|████████▏ | 1405/1726 [24:20:21<5:59:04, 67.12s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-11 00:57:46,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.67 | bwd_microstep: 1334.11 | bwd_inner_microstep: 1333.71 | bwd_allreduce_microstep: 0.26 | step_microstep: 0.35 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4384 [2024-06-11 00:57:49,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.40 | bwd_microstep: 1705.87 | bwd_inner_microstep: 1705.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch
size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 00:57:51,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1480.16 | bwd_inner_microstep: 1480.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3849 [2024-06-11 00:57:53,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.64 | bwd_microstep: 1555.19 | bwd_inner_microstep: 1555.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3795 [2024-06-11 00:57:55,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.54 | bwd_microstep: 1450.60 | bwd_inner_microstep: 1450.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-11 00:57:56,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.84 | bwd_microstep: 788.22 | bwd_inner_microstep: 788.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4029 [2024-06-11 00:57:58,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1451.29 | bwd_inner_microstep: 1451.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-11 00:57:59,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.61 | bwd_microstep: 790.40 | bwd_inner_microstep: 790.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1895 [2024-06-11 00:58:00,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.88 | bwd_microstep: 716.49 | bwd_inner_microstep: 716.46 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3405 [2024-06-11 00:58:02,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.88 | bwd_microstep: 1440.89 | bwd_inner_microstep: 1440.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1940 [2024-06-11 00:58:03,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.51 | bwd_microstep: 823.74 | bwd_inner_microstep: 823.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3410 [2024-06-11 00:58:05,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.10 | bwd_microstep: 1404.37 | bwd_inner_microstep: 1404.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3488 [2024-06-11 00:58:07,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1500.08 | bwd_inner_microstep: 1500.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505 [2024-06-11 00:58:09,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1578.33 | bwd_inner_microstep: 1578.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3606 [2024-06-11 00:58:11,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.52 | bwd_microstep: 1509.86 | bwd_inner_microstep: 1509.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2298 [2024-06-11 00:58:13,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.26 | bwd_microstep: 939.48 | bwd_inner_microstep: 939.46 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3578 [2024-06-11 00:58:15,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.08 | bwd_microstep: 1532.42 | bwd_inner_microstep: 1532.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-11 00:58:17,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.75 | bwd_microstep: 1292.43 | bwd_inner_microstep: 1292.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2117 [2024-06-11 00:58:18,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.30 | bwd_microstep: 830.01 | bwd_inner_microstep: 829.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2055 [2024-06-11 00:58:19,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.46 | bwd_microstep: 911.64 | bwd_inner_microstep: 911.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-11 00:58:21,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.68 | bwd_microstep: 1560.33 | bwd_inner_microstep: 1560.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 00:58:23,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.56 | bwd_microstep: 1562.16 | bwd_inner_microstep: 1562.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-11 00:58:25,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.78 | bwd_microstep: 
1296.75 | bwd_inner_microstep: 1296.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3486 [2024-06-11 00:58:27,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.52 | bwd_microstep: 1192.07 | bwd_inner_microstep: 1192.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2163 [2024-06-11 00:58:28,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.93 | bwd_microstep: 793.36 | bwd_inner_microstep: 793.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2585 [2024-06-11 00:58:29,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.43 | bwd_microstep: 1073.09 | bwd_inner_microstep: 1073.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2285 [2024-06-11 00:58:31,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.76 | bwd_microstep: 940.53 | bwd_inner_microstep: 940.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3731 [2024-06-11 00:58:33,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.45 | bwd_microstep: 1630.60 | bwd_inner_microstep: 1630.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-11 00:58:35,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.30 | bwd_microstep: 1451.53 | bwd_inner_microstep: 1451.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3665 [2024-06-11 00:58:37,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 483.70 | bwd_microstep: 1263.74 | bwd_inner_microstep: 1263.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3769 [2024-06-11 00:58:39,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.96 | bwd_microstep: 1313.13 | bwd_inner_microstep: 1313.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 00:58:44,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.32 | optimizer_step: 6.59 [2024-06-11 00:58:44,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.71 | bwd_microstep: 4532.42 | bwd_inner_microstep: 1552.23 | bwd_allreduce_microstep: 2980.11 | step_microstep: 39.85 [2024-06-11 00:58:44,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15171.49 | bwd: 43645.32 | bwd_inner: 40663.92 | bwd_allreduce: 2980.64 | step: 41.93 {'loss': 1.1458, 'learning_rate': 3.4994266564361733e-06, 'epoch': 0.81} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1931 [2024-06-11 00:58:45,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.97 | bwd_microstep: 885.00 | bwd_inner_microstep: 884.87 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 00:58:47,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.14 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 00:58:48,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.61 | bwd_microstep: 1271.46 | bwd_inner_microstep: 1271.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 32, images per sample: 8.0, dynamic token length: 3858 [2024-06-11 00:58:51,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.72 | bwd_microstep: 1524.28 | bwd_inner_microstep: 1524.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472 [2024-06-11 00:58:52,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.22 | bwd_microstep: 1246.37 | bwd_inner_microstep: 1246.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 00:58:54,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.80 | bwd_microstep: 1383.17 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-11 00:58:56,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.71 | bwd_microstep: 1302.96 | bwd_inner_microstep: 1302.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751 [2024-06-11 00:58:58,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.67 | bwd_microstep: 1538.83 | bwd_inner_microstep: 1538.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-11 00:59:00,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.22 | bwd_microstep: 1284.13 | bwd_inner_microstep: 1284.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2129 [2024-06-11 00:59:01,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.86 | bwd_microstep: 922.45 | bwd_inner_microstep: 922.42 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1873 [2024-06-11 00:59:02,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.33 | bwd_microstep: 678.98 | bwd_inner_microstep: 678.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 00:59:04,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.78 | bwd_microstep: 1389.45 | bwd_inner_microstep: 1389.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3940 [2024-06-11 00:59:06,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.85 | bwd_microstep: 1495.15 | bwd_inner_microstep: 1495.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3650 [2024-06-11 00:59:08,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.46 | bwd_microstep: 1665.00 | bwd_inner_microstep: 1664.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446 [2024-06-11 00:59:10,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.90 | bwd_microstep: 1451.33 | bwd_inner_microstep: 1451.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3020 [2024-06-11 00:59:12,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.16 | bwd_microstep: 1230.27 | bwd_inner_microstep: 1230.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2235 [2024-06-11 00:59:13,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.71 | bwd_microstep: 868.60 | 
bwd_inner_microstep: 868.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-11 00:59:15,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.16 | bwd_microstep: 1398.21 | bwd_inner_microstep: 1398.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3606 [2024-06-11 00:59:17,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.15 | bwd_microstep: 1431.83 | bwd_inner_microstep: 1431.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-11 00:59:18,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.09 | bwd_microstep: 797.22 | bwd_inner_microstep: 797.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 614 [2024-06-11 00:59:19,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.45 | bwd_microstep: 261.72 | bwd_inner_microstep: 261.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489 [2024-06-11 00:59:20,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.58 | bwd_microstep: 1189.25 | bwd_inner_microstep: 1189.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1987 [2024-06-11 00:59:22,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.52 | bwd_microstep: 865.16 | bwd_inner_microstep: 865.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1972 [2024-06-11 00:59:23,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
331.45 | bwd_microstep: 892.29 | bwd_inner_microstep: 892.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1993 [2024-06-11 00:59:24,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.34 | bwd_microstep: 898.99 | bwd_inner_microstep: 898.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3418 [2024-06-11 00:59:26,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.98 | bwd_microstep: 1439.84 | bwd_inner_microstep: 1439.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3430 [2024-06-11 00:59:28,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.67 | bwd_microstep: 1281.25 | bwd_inner_microstep: 1281.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-11 00:59:30,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.03 | bwd_microstep: 1647.48 | bwd_inner_microstep: 1647.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3741 [2024-06-11 00:59:32,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1368.58 | bwd_inner_microstep: 1368.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-11 00:59:34,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1289.80 | bwd_inner_microstep: 1289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-11 00:59:36,264] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 560.98 | bwd_microstep: 1497.99 | bwd_inner_microstep: 1497.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-11 00:59:45,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.62 | optimizer_gradients: 4.15 | optimizer_step: 6.62 [2024-06-11 00:59:45,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.60 | bwd_microstep: 8815.55 | bwd_inner_microstep: 1702.31 | bwd_allreduce_microstep: 7113.19 | step_microstep: 40.82 [2024-06-11 00:59:45,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14705.85 | bwd: 46489.09 | bwd_inner: 39374.90 | bwd_allreduce: 7113.48 | step: 42.39 {'loss': 1.2046, 'learning_rate': 3.478245626649597e-06, 'epoch': 0.82} dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3469 [2024-06-11 00:59:47,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.47 | bwd_microstep: 1530.47 | bwd_inner_microstep: 1530.27 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.17 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-11 00:59:49,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.75 | bwd_microstep: 1338.04 | bwd_inner_microstep: 1338.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3902 [2024-06-11 00:59:51,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.68 | bwd_microstep: 1679.58 | bwd_inner_microstep: 1679.50 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.16 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581 [2024-06-11 00:59:53,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.72 | bwd_microstep: 1296.45 | bwd_inner_microstep: 1296.42 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-11 00:59:55,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.08 | bwd_microstep: 1532.88 | bwd_inner_microstep: 1532.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948 [2024-06-11 00:59:56,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.86 | bwd_microstep: 793.04 | bwd_inner_microstep: 793.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-11 00:59:58,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.76 | bwd_microstep: 1338.90 | bwd_inner_microstep: 1338.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3609 [2024-06-11 01:00:00,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.25 | bwd_microstep: 1312.34 | bwd_inner_microstep: 1312.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-11 01:00:01,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.35 | bwd_microstep: 797.76 | bwd_inner_microstep: 797.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3420 [2024-06-11 01:00:03,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.12 | bwd_microstep: 1279.04 | bwd_inner_microstep: 1279.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681 [2024-06-11 01:00:05,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1420.87 | bwd_inner_microstep: 1420.84 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1976 [2024-06-11 01:00:06,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.15 | bwd_microstep: 829.03 | bwd_inner_microstep: 829.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446 [2024-06-11 01:00:08,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.54 | bwd_microstep: 1158.36 | bwd_inner_microstep: 1158.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3689 [2024-06-11 01:00:10,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.58 | bwd_microstep: 1549.30 | bwd_inner_microstep: 1549.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755 [2024-06-11 01:00:12,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.84 | bwd_microstep: 1628.23 | bwd_inner_microstep: 1628.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3484 [2024-06-11 01:00:14,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.33 | bwd_microstep: 1549.28 | bwd_inner_microstep: 1549.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-11 01:00:16,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.55 | bwd_microstep: 1615.98 | bwd_inner_microstep: 1615.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647 [2024-06-11 01:00:19,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.63 | bwd_microstep: 
1605.39 | bwd_inner_microstep: 1605.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 01:00:20,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.23 | bwd_microstep: 1255.40 | bwd_inner_microstep: 1255.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-11 01:00:22,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.25 | bwd_microstep: 1404.85 | bwd_inner_microstep: 1404.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2084 [2024-06-11 01:00:23,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.79 | bwd_microstep: 821.27 | bwd_inner_microstep: 821.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3604 [2024-06-11 01:00:25,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.74 | bwd_microstep: 1309.26 | bwd_inner_microstep: 1309.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3821 [2024-06-11 01:00:27,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.44 | bwd_microstep: 1355.45 | bwd_inner_microstep: 1355.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-11 01:00:29,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 1610.02 | bwd_inner_microstep: 1609.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2697 [2024-06-11 01:00:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 421.09 | bwd_microstep: 1129.07 | bwd_inner_microstep: 1129.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-11 01:00:33,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.46 | bwd_microstep: 1404.81 | bwd_inner_microstep: 1404.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3601 [2024-06-11 01:00:35,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.14 | bwd_microstep: 1508.63 | bwd_inner_microstep: 1508.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-11 01:00:37,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604 [2024-06-11 01:00:39,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.04 | bwd_microstep: 1438.69 | bwd_inner_microstep: 1438.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3816 [2024-06-11 01:00:41,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1402.96 | bwd_inner_microstep: 1402.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-11 01:00:43,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3760 [2024-06-11 01:00:46,537] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.07 | optimizer_step: 6.59 [2024-06-11 01:00:46,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.02 | bwd_microstep: 2651.72 | bwd_inner_microstep: 1750.37 | bwd_allreduce_microstep: 901.31 | step_microstep: 37.74 [2024-06-11 01:00:46,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16141.43 | bwd: 44357.88 | bwd_inner: 43455.45 | bwd_allreduce: 901.65 | step: 39.50 {'loss': 1.2037, 'learning_rate': 3.457122786396032e-06, 'epoch': 0.82} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1923 [2024-06-11 01:00:47,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.28 | bwd_microstep: 880.09 | bwd_inner_microstep: 879.96 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3953 [2024-06-11 01:00:49,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.95 | bwd_microstep: 1592.25 | bwd_inner_microstep: 1592.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4370 [2024-06-11 01:00:52,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.46 | bwd_microstep: 1541.82 | bwd_inner_microstep: 1541.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 01:00:53,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.95 | bwd_microstep: 1279.31 | bwd_inner_microstep: 1279.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-11 01:00:55,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.99 | bwd_microstep: 1474.78 | bwd_inner_microstep: 1474.75 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2444 [2024-06-11 01:00:57,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 949.13 | bwd_inner_microstep: 949.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-11 01:00:59,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.95 | bwd_microstep: 1383.15 | bwd_inner_microstep: 1383.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 01:01:00,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.44 | bwd_microstep: 1252.89 | bwd_inner_microstep: 1252.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3487 [2024-06-11 01:01:02,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.81 | bwd_microstep: 1437.14 | bwd_inner_microstep: 1437.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1941 [2024-06-11 01:01:04,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.67 | bwd_microstep: 889.43 | bwd_inner_microstep: 889.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-11 01:01:05,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.01 | bwd_microstep: 1346.01 | bwd_inner_microstep: 1345.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 01:01:07,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 
1382.18 | bwd_inner_microstep: 1381.98 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3661 [2024-06-11 01:01:10,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.83 | bwd_microstep: 1715.10 | bwd_inner_microstep: 1715.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-11 01:01:12,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1345.96 | bwd_inner_microstep: 1345.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1943 [2024-06-11 01:01:13,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.76 | bwd_microstep: 729.87 | bwd_inner_microstep: 729.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-11 01:01:14,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.80 | bwd_microstep: 1163.00 | bwd_inner_microstep: 1162.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 619 [2024-06-11 01:01:15,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.22 | bwd_microstep: 261.02 | bwd_inner_microstep: 260.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 01:01:17,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.76 | bwd_microstep: 1557.89 | bwd_inner_microstep: 1557.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-11 01:01:19,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 578.28 | bwd_microstep: 1560.57 | bwd_inner_microstep: 1560.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2118 [2024-06-11 01:01:20,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.98 | bwd_microstep: 926.47 | bwd_inner_microstep: 926.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-11 01:01:22,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1294.86 | bwd_inner_microstep: 1294.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999 [2024-06-11 01:01:23,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.71 | bwd_microstep: 801.85 | bwd_inner_microstep: 801.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 01:01:25,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.31 | bwd_microstep: 1557.69 | bwd_inner_microstep: 1557.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-11 01:01:27,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.00 | bwd_microstep: 1459.74 | bwd_inner_microstep: 1459.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3551 [2024-06-11 01:01:29,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1587.18 | bwd_inner_microstep: 1587.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 11, images per sample: 2.75, dynamic token length: 1535 [2024-06-11 01:01:30,772] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 232.69 | bwd_microstep: 612.84 | bwd_inner_microstep: 612.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-11 01:01:33,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.64 | bwd_microstep: 1620.19 | bwd_inner_microstep: 1620.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3552 [2024-06-11 01:01:34,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.12 | bwd_microstep: 1426.91 | bwd_inner_microstep: 1426.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3592 [2024-06-11 01:01:37,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.05 | bwd_microstep: 1650.80 | bwd_inner_microstep: 1650.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-11 01:01:39,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.34 | bwd_microstep: 1499.94 | bwd_inner_microstep: 1499.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-11 01:01:41,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.18 | bwd_microstep: 1400.46 | bwd_inner_microstep: 1400.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-11 01:01:46,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.16 | optimizer_step: 6.59 [2024-06-11 01:01:46,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.48 | bwd_microstep: 4826.51 | 
bwd_inner_microstep: 1534.74 | bwd_allreduce_microstep: 3291.71 | step_microstep: 39.26 [2024-06-11 01:01:46,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15330.18 | bwd: 44407.06 | bwd_inner: 41114.19 | bwd_allreduce: 3292.07 | step: 40.96 {'loss': 1.2128, 'learning_rate': 3.436058210070012e-06, 'epoch': 0.82} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-11 01:01:47,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.64 | bwd_microstep: 788.64 | bwd_inner_microstep: 788.50 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3830 [2024-06-11 01:01:49,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1481.29 | bwd_inner_microstep: 1481.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1979 [2024-06-11 01:01:50,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.73 | bwd_microstep: 858.17 | bwd_inner_microstep: 858.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-11 01:01:52,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.96 | bwd_microstep: 1294.18 | bwd_inner_microstep: 1294.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3793 [2024-06-11 01:01:54,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.18 | bwd_microstep: 1352.04 | bwd_inner_microstep: 1352.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-11 01:01:56,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.49 | bwd_microstep: 1295.87 | 
bwd_inner_microstep: 1295.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-11 01:01:58,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.00 | bwd_microstep: 1428.03 | bwd_inner_microstep: 1428.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3503 [2024-06-11 01:02:00,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.11 | bwd_microstep: 1222.75 | bwd_inner_microstep: 1222.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 01:02:01,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.95 | bwd_microstep: 1288.32 | bwd_inner_microstep: 1288.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 01:02:03,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.32 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 01:02:05,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1354.53 | bwd_inner_microstep: 1354.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-11 01:02:06,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.06 | bwd_microstep: 793.21 | bwd_inner_microstep: 793.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-11 01:02:08,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 548.47 | bwd_microstep: 1484.37 | bwd_inner_microstep: 1484.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3908 [2024-06-11 01:02:11,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.33 | bwd_microstep: 1662.48 | bwd_inner_microstep: 1662.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-11 01:02:13,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.89 | bwd_microstep: 1286.83 | bwd_inner_microstep: 1286.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3823 [2024-06-11 01:02:15,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.10 | bwd_microstep: 1751.74 | bwd_inner_microstep: 1751.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-11 01:02:17,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.75 | bwd_microstep: 1494.57 | bwd_inner_microstep: 1494.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-11 01:02:19,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1288.01 | bwd_inner_microstep: 1287.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1925 [2024-06-11 01:02:20,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.87 | bwd_microstep: 697.96 | bwd_inner_microstep: 697.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3853 [2024-06-11 01:02:22,530] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.01 | bwd_microstep: 1668.38 | bwd_inner_microstep: 1668.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3620 [2024-06-11 01:02:24,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.85 | bwd_microstep: 1535.49 | bwd_inner_microstep: 1535.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3486 [2024-06-11 01:02:26,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.20 | bwd_microstep: 1315.06 | bwd_inner_microstep: 1315.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3439 [2024-06-11 01:02:28,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1410.33 | bwd_inner_microstep: 1410.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3817 [2024-06-11 01:02:30,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.84 | bwd_microstep: 1507.08 | bwd_inner_microstep: 1507.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-11 01:02:32,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.96 | bwd_microstep: 1504.57 | bwd_inner_microstep: 1504.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-11 01:02:34,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.94 | bwd_microstep: 1655.52 | bwd_inner_microstep: 1655.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3599 [2024-06-11 01:02:36,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.36 | bwd_microstep: 1310.19 | bwd_inner_microstep: 1310.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-11 01:02:38,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.36 | bwd_microstep: 1300.53 | bwd_inner_microstep: 1300.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 01:02:40,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.86 | bwd_microstep: 1397.91 | bwd_inner_microstep: 1397.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-11 01:02:42,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.88 | bwd_microstep: 1406.84 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809 [2024-06-11 01:02:44,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.09 | bwd_microstep: 1752.85 | bwd_inner_microstep: 1752.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-11 01:02:49,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.13 | optimizer_step: 6.61 [2024-06-11 01:02:49,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 3833.66 | bwd_inner_microstep: 1764.67 | bwd_allreduce_microstep: 2068.94 | step_microstep: 39.24 [2024-06-11 01:02:49,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16334.94 | bwd: 45902.42 | bwd_inner: 43832.47 | bwd_allreduce: 2069.22 | step: 
40.84 {'loss': 1.1339, 'learning_rate': 3.4150519718608744e-06, 'epoch': 0.82} 81%|████████▏ | 1406/1726 [24:21:20<5:45:15, 64.74s/it] 82%|████████▏ | 1407/1726 [24:22:22<5:39:04, 63.78s/it] 82%|████████▏ | 1408/1726 [24:23:23<5:33:21, 62.90s/it] 82%|████████▏ | 1409/1726 [24:24:23<5:27:51, 62.05s/it] 82%|████████▏ | 1410/1726 [24:25:25<5:27:39, 62.21s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030 [2024-06-11 01:02:50,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.63 | bwd_microstep: 905.53 | bwd_inner_microstep: 905.46 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-11 01:02:51,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 799.69 | bwd_inner_microstep: 799.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-11 01:02:53,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1482.61 | bwd_inner_microstep: 1482.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782 [2024-06-11 01:02:55,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.31 | bwd_microstep: 1645.98 | bwd_inner_microstep: 1645.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1885 [2024-06-11 01:02:56,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.88 | bwd_microstep: 715.93 | 
bwd_inner_microstep: 715.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2467 [2024-06-11 01:02:58,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.28 | bwd_microstep: 858.38 | bwd_inner_microstep: 858.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-11 01:02:59,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.02 | bwd_microstep: 1303.38 | bwd_inner_microstep: 1303.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-11 01:03:01,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1389.38 | bwd_inner_microstep: 1389.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3491 [2024-06-11 01:03:03,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-11 01:03:05,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.28 | bwd_microstep: 1499.39 | bwd_inner_microstep: 1499.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-11 01:03:07,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.86 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-11 01:03:09,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
472.98 | bwd_microstep: 1252.23 | bwd_inner_microstep: 1252.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 01:03:11,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.38 | bwd_microstep: 1379.63 | bwd_inner_microstep: 1379.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-11 01:03:13,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 1483.95 | bwd_inner_microstep: 1483.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002 [2024-06-11 01:03:14,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.09 | bwd_microstep: 901.16 | bwd_inner_microstep: 901.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527 [2024-06-11 01:03:16,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.21 | bwd_microstep: 1414.30 | bwd_inner_microstep: 1414.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644 [2024-06-11 01:03:18,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.22 | bwd_microstep: 1447.34 | bwd_inner_microstep: 1447.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3439 [2024-06-11 01:03:20,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.36 | bwd_microstep: 1216.13 | bwd_inner_microstep: 1216.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 01:03:22,114] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1279.59 | bwd_inner_microstep: 1279.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 01:03:24,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1384.68 | bwd_inner_microstep: 1384.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2090 [2024-06-11 01:03:25,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.80 | bwd_microstep: 731.46 | bwd_inner_microstep: 731.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2185 [2024-06-11 01:03:26,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.83 | bwd_microstep: 922.93 | bwd_inner_microstep: 922.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473 [2024-06-11 01:03:28,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.70 | bwd_microstep: 1220.04 | bwd_inner_microstep: 1220.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-11 01:03:30,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1562.17 | bwd_inner_microstep: 1562.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 01:03:32,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1403.05 | bwd_inner_microstep: 1403.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2078 [2024-06-11 
01:03:33,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.23 | bwd_microstep: 727.65 | bwd_inner_microstep: 727.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-11 01:03:35,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.04 | bwd_microstep: 1415.28 | bwd_inner_microstep: 1415.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574 [2024-06-11 01:03:37,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.70 | bwd_microstep: 1425.33 | bwd_inner_microstep: 1425.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3461 [2024-06-11 01:03:38,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.64 | bwd_microstep: 1313.18 | bwd_inner_microstep: 1313.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3584 [2024-06-11 01:03:40,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.69 | bwd_microstep: 1537.95 | bwd_inner_microstep: 1537.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-11 01:03:43,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.67 | bwd_microstep: 1470.92 | bwd_inner_microstep: 1470.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-11 01:03:52,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.28 | optimizer_step: 6.62 [2024-06-11 01:03:52,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | 
bwd_microstep: 8464.59 | bwd_inner_microstep: 1516.64 | bwd_allreduce_microstep: 6947.89 | step_microstep: 39.31 [2024-06-11 01:03:52,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15106.72 | bwd: 47383.53 | bwd_inner: 40434.68 | bwd_allreduce: 6948.15 | step: 40.92 {'loss': 1.1595, 'learning_rate': 3.3941041457524748e-06, 'epoch': 0.82} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515 [2024-06-11 01:03:53,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.42 | bwd_microstep: 1340.70 | bwd_inner_microstep: 1340.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2406 [2024-06-11 01:03:55,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.77 | bwd_microstep: 998.44 | bwd_inner_microstep: 998.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3924 [2024-06-11 01:03:57,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.30 | bwd_microstep: 1686.15 | bwd_inner_microstep: 1686.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-11 01:03:59,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.06 | bwd_microstep: 1646.25 | bwd_inner_microstep: 1646.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-11 01:04:02,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.59 | bwd_microstep: 1651.76 | bwd_inner_microstep: 1651.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2551 [2024-06-11 01:04:03,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.61 | 
bwd_microstep: 1031.29 | bwd_inner_microstep: 1031.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-11 01:04:05,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.52 | bwd_microstep: 1291.42 | bwd_inner_microstep: 1291.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-11 01:04:07,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.86 | bwd_microstep: 1530.49 | bwd_inner_microstep: 1530.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 01:04:09,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.33 | bwd_microstep: 1382.28 | bwd_inner_microstep: 1382.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 01:04:11,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.22 | bwd_microstep: 1246.90 | bwd_inner_microstep: 1246.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 01:04:12,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.53 | bwd_microstep: 1287.37 | bwd_inner_microstep: 1287.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2128 [2024-06-11 01:04:14,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.49 | bwd_microstep: 892.31 | bwd_inner_microstep: 892.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3725 [2024-06-11 01:04:16,294] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1591.94 | bwd_inner_microstep: 1591.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493 [2024-06-11 01:04:18,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.80 | bwd_microstep: 1409.93 | bwd_inner_microstep: 1409.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965 [2024-06-11 01:04:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.21 | bwd_microstep: 892.22 | bwd_inner_microstep: 892.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-11 01:04:21,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1445.59 | bwd_inner_microstep: 1445.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-11 01:04:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.80 | bwd_microstep: 1451.30 | bwd_inner_microstep: 1451.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638 [2024-06-11 01:04:25,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1613.33 | bwd_inner_microstep: 1613.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-11 01:04:27,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.00 | bwd_microstep: 1185.70 | bwd_inner_microstep: 1185.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 01:04:29,106] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1280.71 | bwd_inner_microstep: 1280.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 01:04:30,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.41 | bwd_microstep: 1282.83 | bwd_inner_microstep: 1282.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983 [2024-06-11 01:04:31,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.82 | bwd_microstep: 705.37 | bwd_inner_microstep: 705.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472 [2024-06-11 01:04:33,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.62 | bwd_microstep: 1313.38 | bwd_inner_microstep: 1313.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-11 01:04:35,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.08 | bwd_microstep: 1556.48 | bwd_inner_microstep: 1556.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014 [2024-06-11 01:04:36,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.14 | bwd_microstep: 806.45 | bwd_inner_microstep: 806.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3554 [2024-06-11 01:04:38,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1421.95 | bwd_inner_microstep: 1421.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token 
length: 3562 [2024-06-11 01:04:41,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.35 | bwd_microstep: 1698.81 | bwd_inner_microstep: 1698.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-11 01:04:43,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1654.34 | bwd_inner_microstep: 1654.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539 [2024-06-11 01:04:45,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.63 | bwd_microstep: 1589.94 | bwd_inner_microstep: 1589.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2034 [2024-06-11 01:04:46,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.22 | bwd_microstep: 716.21 | bwd_inner_microstep: 716.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806 [2024-06-11 01:04:48,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.60 | bwd_microstep: 1475.86 | bwd_inner_microstep: 1475.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3484 [2024-06-11 01:04:53,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-11 01:04:53,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.72 | bwd_microstep: 3971.47 | bwd_inner_microstep: 1786.97 | bwd_allreduce_microstep: 2184.44 | step_microstep: 39.41 [2024-06-11 01:04:53,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15930.28 | bwd: 45049.21 | bwd_inner: 42863.84 | bwd_allreduce: 2184.67 | 
step: 40.92 {'loss': 1.1928, 'learning_rate': 3.3732148055229463e-06, 'epoch': 0.82} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-11 01:04:55,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.07 | bwd_microstep: 1339.06 | bwd_inner_microstep: 1338.99 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-11 01:04:56,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1247.29 | bwd_inner_microstep: 1247.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3909 [2024-06-11 01:04:59,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1487.29 | bwd_inner_microstep: 1487.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 01:05:00,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1383.46 | bwd_inner_microstep: 1383.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 01:05:02,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.63 | bwd_microstep: 1387.72 | bwd_inner_microstep: 1387.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3807 [2024-06-11 01:05:04,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.91 | bwd_microstep: 1480.81 | bwd_inner_microstep: 1480.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477 [2024-06-11 01:05:06,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 458.56 | bwd_microstep: 1184.26 | bwd_inner_microstep: 1184.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-11 01:05:08,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-11 01:05:10,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1398.90 | bwd_inner_microstep: 1398.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3409 [2024-06-11 01:05:11,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.45 | bwd_microstep: 1211.46 | bwd_inner_microstep: 1211.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3446 [2024-06-11 01:05:13,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.41 | bwd_microstep: 1240.40 | bwd_inner_microstep: 1240.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505 [2024-06-11 01:05:15,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.21 | bwd_microstep: 1582.35 | bwd_inner_microstep: 1582.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3675 [2024-06-11 01:05:18,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.57 | bwd_microstep: 1685.67 | bwd_inner_microstep: 1685.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2415 [2024-06-11 01:05:19,280] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.37 | bwd_microstep: 823.66 | bwd_inner_microstep: 823.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3640 [2024-06-11 01:05:21,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.47 | bwd_microstep: 1761.47 | bwd_inner_microstep: 1761.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 01:05:23,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1338.04 | bwd_inner_microstep: 1338.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3675 [2024-06-11 01:05:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.02 | bwd_microstep: 1404.32 | bwd_inner_microstep: 1404.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1917 [2024-06-11 01:05:26,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.97 | bwd_microstep: 688.41 | bwd_inner_microstep: 688.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3438 [2024-06-11 01:05:28,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.36 | bwd_microstep: 1283.20 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-11 01:05:30,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1399.53 | bwd_inner_microstep: 1399.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 
3832 [2024-06-11 01:05:32,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.88 | bwd_microstep: 1689.12 | bwd_inner_microstep: 1689.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 01:05:34,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.39 | bwd_microstep: 1493.38 | bwd_inner_microstep: 1493.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-11 01:05:36,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1257.98 | bwd_inner_microstep: 1257.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 01:05:38,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.27 | bwd_microstep: 1389.20 | bwd_inner_microstep: 1389.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-11 01:05:40,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.66 | bwd_microstep: 1460.17 | bwd_inner_microstep: 1460.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-11 01:05:42,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.10 | bwd_microstep: 1298.97 | bwd_inner_microstep: 1298.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 01:05:43,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.88 | bwd_microstep: 1352.39 | bwd_inner_microstep: 1352.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3466 [2024-06-11 01:05:45,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.35 | bwd_microstep: 1375.48 | bwd_inner_microstep: 1375.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3585 [2024-06-11 01:05:47,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.19 | bwd_microstep: 1339.48 | bwd_inner_microstep: 1339.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-11 01:05:49,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1406.12 | bwd_inner_microstep: 1406.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046 [2024-06-11 01:05:50,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.64 | bwd_microstep: 906.38 | bwd_inner_microstep: 906.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3691 [2024-06-11 01:05:55,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.13 | optimizer_step: 6.59 [2024-06-11 01:05:55,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.28 | bwd_microstep: 3646.50 | bwd_inner_microstep: 2067.06 | bwd_allreduce_microstep: 1579.39 | step_microstep: 39.00 [2024-06-11 01:05:55,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16243.45 | bwd: 45227.00 | bwd_inner: 43646.64 | bwd_allreduce: 1579.64 | step: 40.63 {'loss': 1.1653, 'learning_rate': 3.3523840247444394e-06, 'epoch': 0.82} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2630 [2024-06-11 01:05:56,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
398.39 | bwd_microstep: 1058.18 | bwd_inner_microstep: 1058.04 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3921 [2024-06-11 01:05:58,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.56 | bwd_microstep: 1551.98 | bwd_inner_microstep: 1551.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 01:06:00,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1340.27 | bwd_inner_microstep: 1340.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-11 01:06:02,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.11 | bwd_microstep: 1149.44 | bwd_inner_microstep: 1149.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-11 01:06:03,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1246.66 | bwd_inner_microstep: 1246.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-11 01:06:06,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.05 | bwd_microstep: 1532.02 | bwd_inner_microstep: 1532.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436 [2024-06-11 01:06:07,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1253.20 | bwd_inner_microstep: 1253.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-11 01:06:08,940] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.92 | bwd_microstep: 800.44 | bwd_inner_microstep: 800.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 01:06:10,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.93 | bwd_microstep: 1387.24 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-11 01:06:12,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1416.23 | bwd_inner_microstep: 1416.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709 [2024-06-11 01:06:14,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.30 | bwd_microstep: 1531.96 | bwd_inner_microstep: 1531.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3517 [2024-06-11 01:06:16,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1349.47 | bwd_inner_microstep: 1349.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3459 [2024-06-11 01:06:18,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.04 | bwd_microstep: 1404.35 | bwd_inner_microstep: 1404.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968 [2024-06-11 01:06:19,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.10 | bwd_microstep: 798.28 | bwd_inner_microstep: 798.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659 
[2024-06-11 01:06:21,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1519.33 | bwd_inner_microstep: 1519.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2074 [2024-06-11 01:06:23,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.43 | bwd_microstep: 915.28 | bwd_inner_microstep: 915.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3634 [2024-06-11 01:06:25,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.76 | bwd_microstep: 1707.45 | bwd_inner_microstep: 1707.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3637 [2024-06-11 01:06:27,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.91 | bwd_microstep: 1709.38 | bwd_inner_microstep: 1709.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-11 01:06:29,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.46 | bwd_microstep: 1162.35 | bwd_inner_microstep: 1162.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-11 01:06:31,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1417.67 | bwd_inner_microstep: 1417.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-11 01:06:33,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.94 | bwd_microstep: 1406.41 | bwd_inner_microstep: 1406.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per 
sample: 8.0, dynamic token length: 3822 [2024-06-11 01:06:35,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.04 | bwd_microstep: 1522.57 | bwd_inner_microstep: 1522.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490 [2024-06-11 01:06:37,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1349.56 | bwd_inner_microstep: 1349.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-11 01:06:39,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1248.62 | bwd_inner_microstep: 1248.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3715 [2024-06-11 01:06:40,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1367.78 | bwd_inner_microstep: 1367.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-11 01:06:43,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.47 | bwd_microstep: 1498.38 | bwd_inner_microstep: 1498.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-11 01:06:45,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.22 | bwd_microstep: 1558.71 | bwd_inner_microstep: 1558.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-11 01:06:47,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.09 | bwd_microstep: 1508.23 | bwd_inner_microstep: 1508.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3813 [2024-06-11 01:06:49,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.10 | bwd_microstep: 1754.88 | bwd_inner_microstep: 1754.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575 [2024-06-11 01:06:51,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.28 | bwd_microstep: 1594.57 | bwd_inner_microstep: 1594.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-11 01:06:54,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.74 | bwd_microstep: 1650.37 | bwd_inner_microstep: 1650.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3813 [2024-06-11 01:07:00,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.13 | optimizer_step: 6.61 [2024-06-11 01:07:00,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.99 | bwd_microstep: 5370.73 | bwd_inner_microstep: 1915.27 | bwd_allreduce_microstep: 3455.40 | step_microstep: 38.93 [2024-06-11 01:07:00,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16564.40 | bwd: 48082.03 | bwd_inner: 44625.60 | bwd_allreduce: 3455.68 | step: 40.72 {'loss': 1.163, 'learning_rate': 3.3316118767828498e-06, 'epoch': 0.82} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498 [2024-06-11 01:07:02,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.19 | bwd_microstep: 1406.37 | bwd_inner_microstep: 1406.28 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-11 01:07:04,182] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 551.34 | bwd_microstep: 1475.19 | bwd_inner_microstep: 1475.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3803 [2024-06-11 01:07:06,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.14 | bwd_microstep: 1617.69 | bwd_inner_microstep: 1617.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2427 [2024-06-11 01:07:07,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.54 | bwd_microstep: 908.80 | bwd_inner_microstep: 908.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-11 01:07:09,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1897 [2024-06-11 01:07:10,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.12 | bwd_microstep: 685.01 | bwd_inner_microstep: 684.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1891 [2024-06-11 01:07:11,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.64 | bwd_microstep: 713.90 | bwd_inner_microstep: 713.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 01:07:13,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.12 | bwd_microstep: 1386.92 | bwd_inner_microstep: 1386.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-11 01:07:15,086] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1315.65 | bwd_inner_microstep: 1315.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3410 [2024-06-11 01:07:16,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.77 | bwd_microstep: 1307.24 | bwd_inner_microstep: 1307.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730 [2024-06-11 01:07:19,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.35 | bwd_microstep: 1628.71 | bwd_inner_microstep: 1628.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-11 01:07:21,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.21 | bwd_microstep: 1486.18 | bwd_inner_microstep: 1486.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497 [2024-06-11 01:07:23,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1587.54 | bwd_inner_microstep: 1587.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 01:07:25,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.75 | bwd_microstep: 1281.80 | bwd_inner_microstep: 1281.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3074 [2024-06-11 01:07:26,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1242.35 | bwd_inner_microstep: 1242.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 2282 [2024-06-11 01:07:28,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.78 | bwd_microstep: 974.11 | bwd_inner_microstep: 974.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-11 01:07:30,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1494.42 | bwd_inner_microstep: 1494.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-11 01:07:32,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.09 | bwd_microstep: 1634.90 | bwd_inner_microstep: 1634.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876 [2024-06-11 01:07:33,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.10 | bwd_microstep: 710.00 | bwd_inner_microstep: 709.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522 [2024-06-11 01:07:35,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.39 | bwd_microstep: 1199.21 | bwd_inner_microstep: 1199.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-11 01:07:37,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.58 | bwd_microstep: 1600.53 | bwd_inner_microstep: 1600.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-11 01:07:39,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1553.87 | bwd_inner_microstep: 1553.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3812 [2024-06-11 01:07:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1656.18 | bwd_inner_microstep: 1656.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-11 01:07:43,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | bwd_microstep: 1451.95 | bwd_inner_microstep: 1451.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-11 01:07:45,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.77 | bwd_microstep: 1355.03 | bwd_inner_microstep: 1355.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-11 01:07:47,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1509.98 | bwd_inner_microstep: 1509.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-11 01:07:49,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.43 | bwd_microstep: 1455.12 | bwd_inner_microstep: 1455.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3678 [2024-06-11 01:07:52,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.69 | bwd_microstep: 1693.20 | bwd_inner_microstep: 1693.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834 [2024-06-11 01:07:54,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.30 | bwd_microstep: 1754.78 | bwd_inner_microstep: 1754.75 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-11 01:07:56,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.22 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933 [2024-06-11 01:07:57,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.31 | bwd_microstep: 822.47 | bwd_inner_microstep: 822.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3584 [2024-06-11 01:08:01,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.13 | optimizer_step: 6.58 [2024-06-11 01:08:01,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 3682.57 | bwd_inner_microstep: 1643.87 | bwd_allreduce_microstep: 2038.64 | step_microstep: 38.90 [2024-06-11 01:08:01,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16015.59 | bwd: 45143.84 | bwd_inner: 43104.21 | bwd_allreduce: 2038.92 | step: 40.43 {'loss': 1.1865, 'learning_rate': 3.310898434797585e-06, 'epoch': 0.82} 82%|████████▏ | 1411/1726 [24:26:28<5:27:35, 62.40s/it] 82%|████████▏ | 1412/1726 [24:27:30<5:24:51, 62.08s/it] 82%|████████▏ | 1413/1726 [24:28:31<5:23:26, 62.00s/it] 82%|████████▏ | 1414/1726 [24:29:36<5:27:05, 62.90s/it] 82%|████████▏ | 1415/1726 [24:30:38<5:23:52, 62.48s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 01:08:03,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 519.46 | bwd_microstep: 1379.31 | bwd_inner_microstep: 1379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-11 01:08:04,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 791.64 | bwd_inner_microstep: 791.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-11 01:08:06,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1320.02 | bwd_inner_microstep: 1319.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 01:08:08,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1383.92 | bwd_inner_microstep: 1383.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 01:08:10,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 01:08:12,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.46 | bwd_microstep: 1249.03 | bwd_inner_microstep: 1249.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1972 [2024-06-11 01:08:13,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.17 | bwd_microstep: 797.77 | bwd_inner_microstep: 797.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 01:08:15,128] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.49 | bwd_microstep: 1387.15 | bwd_inner_microstep: 1387.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3435 [2024-06-11 01:08:16,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.04 | bwd_microstep: 1313.57 | bwd_inner_microstep: 1313.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3660 [2024-06-11 01:08:19,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.92 | bwd_microstep: 1615.38 | bwd_inner_microstep: 1615.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3414 [2024-06-11 01:08:21,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.41 | bwd_microstep: 1374.27 | bwd_inner_microstep: 1374.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.24 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3409 [2024-06-11 01:08:22,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.08 | bwd_microstep: 1308.90 | bwd_inner_microstep: 1308.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-11 01:08:24,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.00 | bwd_microstep: 887.93 | bwd_inner_microstep: 887.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-11 01:08:26,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.78 | bwd_microstep: 1601.48 | bwd_inner_microstep: 1601.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
2137 [2024-06-11 01:08:27,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.41 | bwd_microstep: 832.67 | bwd_inner_microstep: 832.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 01:08:29,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.41 | bwd_microstep: 1258.12 | bwd_inner_microstep: 1258.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-11 01:08:31,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.81 | bwd_microstep: 1391.60 | bwd_inner_microstep: 1391.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821 [2024-06-11 01:08:33,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.63 | bwd_microstep: 1584.10 | bwd_inner_microstep: 1584.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3824 [2024-06-11 01:08:35,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.91 | bwd_microstep: 1482.93 | bwd_inner_microstep: 1482.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 01:08:37,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.42 | bwd_microstep: 1380.36 | bwd_inner_microstep: 1380.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276 [2024-06-11 01:08:38,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.29 | bwd_microstep: 879.36 | bwd_inner_microstep: 879.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3604 [2024-06-11 01:08:40,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1404.47 | bwd_inner_microstep: 1404.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2282 [2024-06-11 01:08:41,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.89 | bwd_microstep: 784.57 | bwd_inner_microstep: 784.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3833 [2024-06-11 01:08:44,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.13 | bwd_microstep: 1859.10 | bwd_inner_microstep: 1859.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685 [2024-06-11 01:08:46,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.02 | bwd_microstep: 1625.31 | bwd_inner_microstep: 1625.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-11 01:08:48,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1510.13 | bwd_inner_microstep: 1510.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802 [2024-06-11 01:08:51,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 2511.73 | bwd_inner_microstep: 2511.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686 [2024-06-11 01:08:53,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.98 | bwd_microstep: 1527.44 | bwd_inner_microstep: 1527.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-11 01:08:55,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1604.23 | bwd_inner_microstep: 1604.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 01:08:57,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.46 | bwd_microstep: 1277.34 | bwd_inner_microstep: 1277.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777 [2024-06-11 01:08:59,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.95 | bwd_microstep: 1652.37 | bwd_inner_microstep: 1652.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 01:09:02,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.05 | optimizer_step: 6.60 [2024-06-11 01:09:02,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1797.45 | bwd_inner_microstep: 1480.95 | bwd_allreduce_microstep: 316.45 | step_microstep: 37.84 [2024-06-11 01:09:02,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15949.54 | bwd: 44159.44 | bwd_inner: 43842.08 | bwd_allreduce: 316.68 | step: 39.59 {'loss': 1.1516, 'learning_rate': 3.290243771741275e-06, 'epoch': 0.82} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3403 [2024-06-11 01:09:03,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1301.52 | bwd_inner_microstep: 1301.34 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.18 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3931 [2024-06-11 01:09:06,039] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1496.69 | bwd_inner_microstep: 1496.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2981 [2024-06-11 01:09:07,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.80 | bwd_microstep: 1050.34 | bwd_inner_microstep: 1050.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 01:09:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.21 | bwd_microstep: 1346.47 | bwd_inner_microstep: 1346.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-11 01:09:11,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.10 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3626 [2024-06-11 01:09:12,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.88 | bwd_microstep: 1311.95 | bwd_inner_microstep: 1311.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724 [2024-06-11 01:09:15,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.58 | bwd_microstep: 1633.26 | bwd_inner_microstep: 1633.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-11 01:09:16,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.22 | bwd_microstep: 1280.62 | bwd_inner_microstep: 1280.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-11 
01:09:18,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.66 | bwd_microstep: 1281.57 | bwd_inner_microstep: 1281.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500 [2024-06-11 01:09:20,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1416.48 | bwd_inner_microstep: 1416.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1950 [2024-06-11 01:09:21,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.79 | bwd_microstep: 850.41 | bwd_inner_microstep: 850.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2152 [2024-06-11 01:09:23,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.75 | bwd_microstep: 909.48 | bwd_inner_microstep: 909.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3413 [2024-06-11 01:09:24,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.94 | bwd_microstep: 1211.62 | bwd_inner_microstep: 1211.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 01:09:26,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1477.52 | bwd_inner_microstep: 1477.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3530 [2024-06-11 01:09:28,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.68 | bwd_microstep: 1451.63 | bwd_inner_microstep: 1451.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, 
dynamic token length: 3666 [2024-06-11 01:09:31,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.83 | bwd_microstep: 1717.30 | bwd_inner_microstep: 1717.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1931 [2024-06-11 01:09:32,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.58 | bwd_microstep: 697.48 | bwd_inner_microstep: 697.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 01:09:34,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.09 | bwd_microstep: 1392.88 | bwd_inner_microstep: 1392.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-11 01:09:35,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.97 | bwd_microstep: 808.62 | bwd_inner_microstep: 808.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3701 [2024-06-11 01:09:37,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1530.06 | bwd_inner_microstep: 1530.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624 [2024-06-11 01:09:39,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.75 | bwd_microstep: 1598.86 | bwd_inner_microstep: 1598.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1996 [2024-06-11 01:09:40,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.40 | bwd_microstep: 859.24 | bwd_inner_microstep: 859.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 40, images per sample: 10.0, dynamic token length: 3546 [2024-06-11 01:09:42,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.11 | bwd_microstep: 1592.13 | bwd_inner_microstep: 1592.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 01:09:45,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.06 | bwd_microstep: 1545.76 | bwd_inner_microstep: 1545.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-11 01:09:47,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.53 | bwd_microstep: 1507.52 | bwd_inner_microstep: 1507.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3684 [2024-06-11 01:09:48,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.28 | bwd_microstep: 1328.60 | bwd_inner_microstep: 1328.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-11 01:09:50,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.04 | bwd_microstep: 1253.40 | bwd_inner_microstep: 1253.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-11 01:09:52,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.60 | bwd_microstep: 1280.39 | bwd_inner_microstep: 1280.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3769 [2024-06-11 01:09:54,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.02 | bwd_microstep: 1569.53 | bwd_inner_microstep: 1569.50 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 01:09:56,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.04 | bwd_microstep: 1350.05 | bwd_inner_microstep: 1350.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-11 01:09:58,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.48 | bwd_microstep: 1534.82 | bwd_inner_microstep: 1534.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3591 [2024-06-11 01:10:03,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-11 01:10:03,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.42 | bwd_microstep: 4083.39 | bwd_inner_microstep: 1445.98 | bwd_allreduce_microstep: 2637.35 | step_microstep: 38.27 [2024-06-11 01:10:03,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15770.56 | bwd: 44956.11 | bwd_inner: 42317.72 | bwd_allreduce: 2637.66 | step: 39.92 {'loss': 1.1757, 'learning_rate': 3.269647960359532e-06, 'epoch': 0.82} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 01:10:05,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1363.09 | bwd_inner_microstep: 1363.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2440 [2024-06-11 01:10:06,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.51 | bwd_microstep: 1044.44 | bwd_inner_microstep: 1044.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-11 01:10:08,843] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.39 | bwd_microstep: 1648.77 | bwd_inner_microstep: 1648.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505 [2024-06-11 01:10:10,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1316.15 | bwd_inner_microstep: 1316.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 01:10:12,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.85 | bwd_microstep: 1385.51 | bwd_inner_microstep: 1385.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788 [2024-06-11 01:10:14,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.64 | bwd_microstep: 1450.63 | bwd_inner_microstep: 1450.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961 [2024-06-11 01:10:15,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.84 | bwd_microstep: 701.83 | bwd_inner_microstep: 701.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407 [2024-06-11 01:10:17,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.55 | bwd_microstep: 1278.15 | bwd_inner_microstep: 1278.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-11 01:10:18,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.59 | bwd_microstep: 790.85 | bwd_inner_microstep: 790.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2074 
[2024-06-11 01:10:19,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.12 | bwd_microstep: 818.92 | bwd_inner_microstep: 818.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3759 [2024-06-11 01:10:21,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.83 | bwd_microstep: 1469.67 | bwd_inner_microstep: 1469.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1923 [2024-06-11 01:10:22,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.20 | bwd_microstep: 771.87 | bwd_inner_microstep: 771.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 01:10:24,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1253.28 | bwd_inner_microstep: 1253.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457 [2024-06-11 01:10:26,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.89 | bwd_microstep: 1343.05 | bwd_inner_microstep: 1343.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-11 01:10:28,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1390.72 | bwd_inner_microstep: 1390.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-11 01:10:30,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.00 | bwd_microstep: 1478.77 | bwd_inner_microstep: 1478.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per 
sample: 9.0, dynamic token length: 3444 [2024-06-11 01:10:32,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.76 | bwd_microstep: 1478.72 | bwd_inner_microstep: 1478.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1914 [2024-06-11 01:10:33,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.02 | bwd_microstep: 687.31 | bwd_inner_microstep: 687.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3547 [2024-06-11 01:10:34,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.84 | bwd_microstep: 1229.34 | bwd_inner_microstep: 1229.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-11 01:10:37,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.26 | bwd_microstep: 1547.47 | bwd_inner_microstep: 1547.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 01:10:39,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.19 | bwd_microstep: 1648.17 | bwd_inner_microstep: 1648.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-11 01:10:41,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1489.61 | bwd_inner_microstep: 1489.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2084 [2024-06-11 01:10:42,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.54 | bwd_microstep: 918.08 | bwd_inner_microstep: 918.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3556 [2024-06-11 01:10:44,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.11 | bwd_microstep: 1328.04 | bwd_inner_microstep: 1328.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 [2024-06-11 01:10:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.52 | bwd_microstep: 1436.04 | bwd_inner_microstep: 1436.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 01:10:48,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.00 | bwd_microstep: 1283.97 | bwd_inner_microstep: 1283.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3571 [2024-06-11 01:10:50,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1499.51 | bwd_inner_microstep: 1499.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 01:10:52,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1555.91 | bwd_inner_microstep: 1555.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-11 01:10:54,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.18 | bwd_microstep: 1423.54 | bwd_inner_microstep: 1423.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 01:10:56,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.40 | bwd_microstep: 1401.49 | bwd_inner_microstep: 1401.46 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-11 01:10:58,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1504.08 | bwd_inner_microstep: 1504.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-11 01:11:03,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.08 | optimizer_step: 6.60 [2024-06-11 01:11:03,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.29 | bwd_microstep: 4904.47 | bwd_inner_microstep: 1695.04 | bwd_allreduce_microstep: 3209.38 | step_microstep: 37.81 [2024-06-11 01:11:03,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15554.74 | bwd: 44841.48 | bwd_inner: 41631.20 | bwd_allreduce: 3209.61 | step: 39.26 {'loss': 1.1747, 'learning_rate': 3.2491110731906982e-06, 'epoch': 0.82} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387 [2024-06-11 01:11:05,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.84 | bwd_microstep: 1326.65 | bwd_inner_microstep: 1326.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 01:11:07,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.88 | bwd_microstep: 1471.71 | bwd_inner_microstep: 1471.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2327 [2024-06-11 01:11:09,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.51 | bwd_microstep: 981.03 | bwd_inner_microstep: 981.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 
[2024-06-11 01:11:10,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755 [2024-06-11 01:11:12,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1339.00 | bwd_inner_microstep: 1338.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-11 01:11:14,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.19 | bwd_microstep: 1279.51 | bwd_inner_microstep: 1279.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2646 [2024-06-11 01:11:15,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.53 | bwd_microstep: 1019.42 | bwd_inner_microstep: 1019.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052 [2024-06-11 01:11:17,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.61 | bwd_microstep: 814.20 | bwd_inner_microstep: 814.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 01:11:19,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1981 [2024-06-11 01:11:20,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.53 | bwd_microstep: 855.57 | bwd_inner_microstep: 855.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 2147 [2024-06-11 01:11:21,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.19 | bwd_microstep: 1039.67 | bwd_inner_microstep: 1039.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-11 01:11:23,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.72 | bwd_microstep: 1348.38 | bwd_inner_microstep: 1348.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 01:11:25,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.37 | bwd_microstep: 1245.01 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 01:11:27,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1376.06 | bwd_inner_microstep: 1376.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-11 01:11:29,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.85 | bwd_microstep: 1556.81 | bwd_inner_microstep: 1556.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3606 [2024-06-11 01:11:31,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.02 | bwd_microstep: 1537.98 | bwd_inner_microstep: 1537.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3634 [2024-06-11 01:11:33,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1343.03 | bwd_inner_microstep: 1343.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3664 [2024-06-11 01:11:35,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.63 | bwd_microstep: 1323.98 | bwd_inner_microstep: 1323.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619 [2024-06-11 01:11:36,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.33 | bwd_microstep: 1310.07 | bwd_inner_microstep: 1310.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 01:11:38,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1385.72 | bwd_inner_microstep: 1385.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3506 [2024-06-11 01:11:40,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1316.39 | bwd_inner_microstep: 1316.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3576 [2024-06-11 01:11:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.88 | bwd_microstep: 1251.82 | bwd_inner_microstep: 1251.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2225 [2024-06-11 01:11:43,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.27 | bwd_microstep: 923.83 | bwd_inner_microstep: 923.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 01:11:45,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1282.54 | bwd_inner_microstep: 1282.51 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2190 [2024-06-11 01:11:46,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.15 | bwd_microstep: 858.37 | bwd_inner_microstep: 858.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606 [2024-06-11 01:11:48,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.87 | bwd_microstep: 1308.23 | bwd_inner_microstep: 1308.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2286 [2024-06-11 01:11:49,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.88 | bwd_microstep: 937.26 | bwd_inner_microstep: 937.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-11 01:11:51,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.26 | bwd_microstep: 1642.80 | bwd_inner_microstep: 1642.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-11 01:11:53,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.90 | bwd_microstep: 976.66 | bwd_inner_microstep: 976.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 01:11:55,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1556.42 | bwd_inner_microstep: 1556.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046 [2024-06-11 01:11:56,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.37 | bwd_microstep: 902.51 | 
bwd_inner_microstep: 902.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3379 [2024-06-11 01:12:02,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-11 01:12:02,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.00 | bwd_microstep: 5488.49 | bwd_inner_microstep: 1626.60 | bwd_allreduce_microstep: 3861.84 | step_microstep: 38.90 [2024-06-11 01:12:02,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14855.09 | bwd: 43640.48 | bwd_inner: 39777.73 | bwd_allreduce: 3862.07 | step: 40.37 {'loss': 1.2013, 'learning_rate': 3.2286331825655882e-06, 'epoch': 0.82} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 01:12:04,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1365.12 | bwd_inner_microstep: 1365.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 4044 [2024-06-11 01:12:06,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.92 | bwd_microstep: 1649.93 | bwd_inner_microstep: 1649.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 01:12:08,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1242.65 | bwd_inner_microstep: 1242.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 01:12:10,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1377.85 | bwd_inner_microstep: 1377.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3467 [2024-06-11 01:12:12,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1382.04 | bwd_inner_microstep: 1382.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3756 [2024-06-11 01:12:14,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 1442.46 | bwd_inner_microstep: 1442.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3398 [2024-06-11 01:12:16,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.07 | bwd_microstep: 1178.70 | bwd_inner_microstep: 1178.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-11 01:12:18,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1389.49 | bwd_inner_microstep: 1389.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3713 [2024-06-11 01:12:20,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.44 | bwd_microstep: 1459.35 | bwd_inner_microstep: 1459.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 01:12:22,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.54 | bwd_microstep: 1474.41 | bwd_inner_microstep: 1474.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 01:12:23,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1378.08 | bwd_inner_microstep: 1378.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-11 01:12:26,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1480.88 | bwd_inner_microstep: 1480.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 01:12:27,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1375.22 | bwd_inner_microstep: 1375.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641 [2024-06-11 01:12:29,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.01 | bwd_microstep: 1435.72 | bwd_inner_microstep: 1435.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-11 01:12:31,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.97 | bwd_microstep: 1450.93 | bwd_inner_microstep: 1450.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 01:12:33,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.64 | bwd_microstep: 1398.22 | bwd_inner_microstep: 1398.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2069 [2024-06-11 01:12:34,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.39 | bwd_microstep: 753.80 | bwd_inner_microstep: 753.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100 [2024-06-11 01:12:36,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.98 | bwd_microstep: 921.17 | bwd_inner_microstep: 921.14 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-11 01:12:38,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.65 | bwd_microstep: 1405.56 | bwd_inner_microstep: 1405.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-11 01:12:40,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.36 | bwd_microstep: 1414.87 | bwd_inner_microstep: 1414.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-11 01:12:42,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1415.74 | bwd_inner_microstep: 1415.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-11 01:12:44,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.44 | bwd_microstep: 1451.44 | bwd_inner_microstep: 1451.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3541 [2024-06-11 01:12:46,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.00 | bwd_microstep: 1520.77 | bwd_inner_microstep: 1520.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590 [2024-06-11 01:12:47,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.51 | bwd_microstep: 1307.42 | bwd_inner_microstep: 1307.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1997 [2024-06-11 01:12:49,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.25 | bwd_microstep: 833.91 | bwd_inner_microstep: 
833.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-11 01:12:51,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.43 | bwd_microstep: 1657.85 | bwd_inner_microstep: 1657.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820 [2024-06-11 01:12:53,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.92 | bwd_microstep: 1518.41 | bwd_inner_microstep: 1518.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1911 [2024-06-11 01:12:54,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.65 | bwd_microstep: 718.28 | bwd_inner_microstep: 718.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3674 [2024-06-11 01:12:56,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.33 | bwd_microstep: 1585.58 | bwd_inner_microstep: 1585.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2533 [2024-06-11 01:12:58,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.70 | bwd_microstep: 1091.58 | bwd_inner_microstep: 1091.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3563 [2024-06-11 01:13:00,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.32 | bwd_microstep: 1462.62 | bwd_inner_microstep: 1462.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2237 [2024-06-11 01:13:04,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | 
optimizer_gradients: 4.02 | optimizer_step: 6.58 [2024-06-11 01:13:04,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.46 | bwd_microstep: 3766.96 | bwd_inner_microstep: 983.79 | bwd_allreduce_microstep: 2783.12 | step_microstep: 37.34 [2024-06-11 01:13:04,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15882.59 | bwd: 45307.01 | bwd_inner: 42523.00 | bwd_allreduce: 2783.35 | step: 38.87 {'loss': 1.2181, 'learning_rate': 3.20821436060722e-06, 'epoch': 0.82} 1415/1726 [24:30:38<5:23:52, 62.48s/it] 82%|████████▏ | 1416/1726 [24:31:38<5:19:41, 61.87s/it] 82%|████████▏ | 1416/1726 [24:31:38<5:19:41, 61.87s/it] 82%|████████▏ | 1417/1726 [24:32:39<5:17:25, 61.63s/it] 82%|████████▏ | 1417/1726 [24:32:39<5:17:25, 61.63s/it] 82%|████████▏ | 1418/1726 [24:33:40<5:14:59, 61.36s/it] 82%|████████▏ | 1418/1726 [24:33:40<5:14:59, 61.36s/it] 82%|████████▏ | 1419/1726 [24:34:39<5:10:03, 60.60s/it] 82%|████████▏ | 1419/1726 [24:34:39<5:10:03, 60.60s/it] 82%|████████▏ | 1420/1726 [24:35:41<5:10:28, 60.88s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 01:13:06,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.85 | bwd_microstep: 1328.96 | bwd_inner_microstep: 1328.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407 [2024-06-11 01:13:07,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.78 | bwd_microstep: 1282.52 | bwd_inner_microstep: 1282.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2351 [2024-06-11 01:13:09,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.20 | bwd_microstep: 985.44 | bwd_inner_microstep: 985.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token 
length: 3821 [2024-06-11 01:13:11,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.45 | bwd_microstep: 1509.93 | bwd_inner_microstep: 1509.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-11 01:13:13,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.50 | bwd_microstep: 1283.77 | bwd_inner_microstep: 1283.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1961 [2024-06-11 01:13:14,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.03 | bwd_microstep: 703.60 | bwd_inner_microstep: 703.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269 [2024-06-11 01:13:15,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.51 | bwd_microstep: 969.19 | bwd_inner_microstep: 969.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3719 [2024-06-11 01:13:17,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1363.76 | bwd_inner_microstep: 1363.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-11 01:13:19,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.05 | bwd_microstep: 1251.96 | bwd_inner_microstep: 1251.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3702 [2024-06-11 01:13:21,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.69 | bwd_microstep: 1579.03 | bwd_inner_microstep: 1579.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, 
images per sample: 6.75, dynamic token length: 3416 [2024-06-11 01:13:23,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.99 | bwd_microstep: 1327.81 | bwd_inner_microstep: 1327.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3672 [2024-06-11 01:13:25,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.30 | bwd_microstep: 1449.39 | bwd_inner_microstep: 1449.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-11 01:13:27,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.70 | bwd_microstep: 1487.39 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3523 [2024-06-11 01:13:29,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1590.56 | bwd_inner_microstep: 1590.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 01:13:31,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1490.81 | bwd_inner_microstep: 1490.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1910 [2024-06-11 01:13:32,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.08 | bwd_microstep: 778.30 | bwd_inner_microstep: 778.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-11 01:13:34,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.29 | bwd_microstep: 1294.34 | bwd_inner_microstep: 1294.31 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 807 [2024-06-11 01:13:34,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 122.57 | bwd_microstep: 311.99 | bwd_inner_microstep: 311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3476 [2024-06-11 01:13:36,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1407.46 | bwd_inner_microstep: 1407.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2105 [2024-06-11 01:13:37,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.76 | bwd_microstep: 822.84 | bwd_inner_microstep: 822.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-11 01:13:39,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.37 | bwd_microstep: 1497.31 | bwd_inner_microstep: 1497.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3443 [2024-06-11 01:13:41,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.68 | bwd_microstep: 1332.90 | bwd_inner_microstep: 1332.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3441 [2024-06-11 01:13:43,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.28 | bwd_microstep: 1300.48 | bwd_inner_microstep: 1300.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-11 01:13:45,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1557.76 | bwd_inner_microstep: 1557.73 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-11 01:13:47,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1497.76 | bwd_inner_microstep: 1497.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-11 01:13:49,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.04 | bwd_microstep: 1405.41 | bwd_inner_microstep: 1405.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3577 [2024-06-11 01:13:51,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.27 | bwd_microstep: 1557.46 | bwd_inner_microstep: 1557.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601 [2024-06-11 01:13:54,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.62 | bwd_microstep: 1706.20 | bwd_inner_microstep: 1706.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-11 01:13:55,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.16 | bwd_microstep: 1346.50 | bwd_inner_microstep: 1346.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-11 01:13:57,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.43 | bwd_microstep: 1458.76 | bwd_inner_microstep: 1458.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3586 [2024-06-11 01:14:00,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.11 | bwd_microstep: 
1805.76 | bwd_inner_microstep: 1805.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-11 01:14:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.05 | optimizer_step: 6.59 [2024-06-11 01:14:05,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.31 | bwd_microstep: 4313.18 | bwd_inner_microstep: 1985.21 | bwd_allreduce_microstep: 2327.92 | step_microstep: 37.56 [2024-06-11 01:14:05,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15836.07 | bwd: 44998.53 | bwd_inner: 42669.70 | bwd_allreduce: 2328.14 | step: 38.96 {'loss': 1.156, 'learning_rate': 3.1878546792305908e-06, 'epoch': 0.82} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-11 01:14:06,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.45 | bwd_microstep: 779.97 | bwd_inner_microstep: 779.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3945 [2024-06-11 01:14:08,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.17 | bwd_microstep: 1524.75 | bwd_inner_microstep: 1524.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-11 01:14:10,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.13 | bwd_microstep: 1350.87 | bwd_inner_microstep: 1350.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407 [2024-06-11 01:14:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.94 | bwd_microstep: 1208.47 | bwd_inner_microstep: 1208.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 
4.0, dynamic token length: 1929 [2024-06-11 01:14:13,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.21 | bwd_microstep: 787.47 | bwd_inner_microstep: 787.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3415 [2024-06-11 01:14:14,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.79 | bwd_microstep: 1149.02 | bwd_inner_microstep: 1148.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 [2024-06-11 01:14:16,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.65 | bwd_microstep: 1427.53 | bwd_inner_microstep: 1427.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2158 [2024-06-11 01:14:18,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 878.42 | bwd_inner_microstep: 878.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2630 [2024-06-11 01:14:19,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.19 | bwd_microstep: 1015.90 | bwd_inner_microstep: 1015.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-11 01:14:21,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.89 | bwd_microstep: 1388.72 | bwd_inner_microstep: 1388.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3474 [2024-06-11 01:14:23,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.24 | bwd_microstep: 1214.94 | bwd_inner_microstep: 1214.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 10, images per sample: 2.5, dynamic token length: 1927 [2024-06-11 01:14:24,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.91 | bwd_microstep: 691.55 | bwd_inner_microstep: 691.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-11 01:14:26,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1499.95 | bwd_inner_microstep: 1499.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3494 [2024-06-11 01:14:27,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.46 | bwd_microstep: 1343.25 | bwd_inner_microstep: 1343.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489 [2024-06-11 01:14:29,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1412.78 | bwd_inner_microstep: 1412.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3635 [2024-06-11 01:14:31,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1418.05 | bwd_inner_microstep: 1418.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 899 [2024-06-11 01:14:32,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 143.75 | bwd_microstep: 371.74 | bwd_inner_microstep: 371.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1988 [2024-06-11 01:14:33,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.14 | bwd_microstep: 800.77 | bwd_inner_microstep: 800.75 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-11 01:14:35,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1253.30 | bwd_inner_microstep: 1253.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-11 01:14:37,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.08 | bwd_microstep: 1489.78 | bwd_inner_microstep: 1489.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-11 01:14:39,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1397.65 | bwd_inner_microstep: 1397.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3604 [2024-06-11 01:14:41,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.69 | bwd_microstep: 1440.24 | bwd_inner_microstep: 1440.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461 [2024-06-11 01:14:42,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.65 | bwd_microstep: 1183.96 | bwd_inner_microstep: 1183.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542 [2024-06-11 01:14:44,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.14 | bwd_microstep: 1326.71 | bwd_inner_microstep: 1326.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 01:14:46,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.15 | bwd_microstep: 1660.81 | bwd_inner_microstep: 
1660.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3785 [2024-06-11 01:14:49,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1575.09 | bwd_inner_microstep: 1575.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-11 01:14:51,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1341.24 | bwd_inner_microstep: 1341.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580 [2024-06-11 01:14:53,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.82 | bwd_microstep: 1596.57 | bwd_inner_microstep: 1596.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1922 [2024-06-11 01:14:54,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.18 | bwd_microstep: 756.88 | bwd_inner_microstep: 756.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3780 [2024-06-11 01:14:56,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.23 | bwd_microstep: 1742.28 | bwd_inner_microstep: 1742.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-11 01:14:58,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.66 | bwd_microstep: 1502.44 | bwd_inner_microstep: 1502.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 01:15:06,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | 
optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-11 01:15:06,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 6736.31 | bwd_inner_microstep: 1561.80 | bwd_allreduce_microstep: 5174.46 | step_microstep: 37.76 [2024-06-11 01:15:06,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14985.46 | bwd: 45267.45 | bwd_inner: 40092.06 | bwd_allreduce: 5174.69 | step: 39.24 {'loss': 1.154, 'learning_rate': 3.167554210142374e-06, 'epoch': 0.82} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-11 01:15:07,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.06 | bwd_microstep: 1327.50 | bwd_inner_microstep: 1327.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 01:15:09,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.01 | bwd_microstep: 1240.55 | bwd_inner_microstep: 1240.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779 [2024-06-11 01:15:11,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.83 | bwd_microstep: 1641.85 | bwd_inner_microstep: 1641.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1898 [2024-06-11 01:15:12,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.41 | bwd_microstep: 771.49 | bwd_inner_microstep: 771.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 01:15:14,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.43 | bwd_microstep: 1242.08 | bwd_inner_microstep: 1242.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3730 [2024-06-11 01:15:16,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.45 | bwd_microstep: 1625.37 | bwd_inner_microstep: 1625.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938 [2024-06-11 01:15:17,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.83 | bwd_microstep: 725.91 | bwd_inner_microstep: 725.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-11 01:15:19,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1387.53 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-11 01:15:22,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.30 | bwd_microstep: 1617.30 | bwd_inner_microstep: 1617.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-11 01:15:23,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1257.78 | bwd_inner_microstep: 1257.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 01:15:25,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.45 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3569 [2024-06-11 01:15:27,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.29 | bwd_microstep: 1590.29 | bwd_inner_microstep: 1590.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-11 01:15:29,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1344.44 | bwd_inner_microstep: 1344.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1948 [2024-06-11 01:15:30,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.44 | bwd_microstep: 885.53 | bwd_inner_microstep: 885.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-11 01:15:32,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.53 | bwd_microstep: 1242.83 | bwd_inner_microstep: 1242.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 01:15:34,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.13 | bwd_microstep: 1385.01 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392 [2024-06-11 01:15:36,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.77 | bwd_microstep: 1243.18 | bwd_inner_microstep: 1243.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-11 01:15:38,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.24 | bwd_microstep: 1517.48 | bwd_inner_microstep: 1517.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-11 01:15:39,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1253.06 | bwd_inner_microstep: 1253.03 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-11 01:15:41,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1459.22 | bwd_inner_microstep: 1459.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-11 01:15:43,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1356.25 | bwd_inner_microstep: 1356.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2183 [2024-06-11 01:15:45,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.94 | bwd_microstep: 952.84 | bwd_inner_microstep: 952.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605 [2024-06-11 01:15:47,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.17 | bwd_microstep: 1405.47 | bwd_inner_microstep: 1405.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-11 01:15:48,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.15 | bwd_microstep: 1159.17 | bwd_inner_microstep: 1159.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-11 01:15:50,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.57 | bwd_microstep: 1451.00 | bwd_inner_microstep: 1450.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2042 [2024-06-11 01:15:51,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.92 | bwd_microstep: 
908.38 | bwd_inner_microstep: 908.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-11 01:15:53,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.79 | bwd_microstep: 1355.17 | bwd_inner_microstep: 1355.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3845 [2024-06-11 01:15:56,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.38 | bwd_microstep: 1591.19 | bwd_inner_microstep: 1591.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740 [2024-06-11 01:15:57,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.70 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3815 [2024-06-11 01:15:59,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.68 | bwd_microstep: 1448.76 | bwd_inner_microstep: 1448.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769 [2024-06-11 01:16:01,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1444.74 | bwd_inner_microstep: 1444.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-11 01:16:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.22 | optimizer_step: 6.61 [2024-06-11 01:16:50,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.12 | bwd_microstep: 48047.10 | bwd_inner_microstep: 1682.02 | bwd_allreduce_microstep: 46365.01 | step_microstep: 
38.85 [2024-06-11 01:16:50,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15724.17 | bwd: 88565.35 | bwd_inner: 42199.42 | bwd_allreduce: 46365.25 | step: 40.30 {'loss': 1.1723, 'learning_rate': 3.1473130248407278e-06, 'epoch': 0.82} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 01:16:52,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.03 | bwd_microstep: 1271.55 | bwd_inner_microstep: 1271.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3872 [2024-06-11 01:16:54,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.14 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-11 01:16:56,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.92 | bwd_microstep: 1640.38 | bwd_inner_microstep: 1640.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242 [2024-06-11 01:16:58,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.81 | bwd_microstep: 958.32 | bwd_inner_microstep: 958.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 01:16:59,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1273.63 | bwd_inner_microstep: 1273.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3628 [2024-06-11 01:17:01,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1308.67 | bwd_inner_microstep: 1308.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-11 01:17:03,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.08 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 01:17:05,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.53 | bwd_microstep: 1387.52 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3509 [2024-06-11 01:18:21,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.33 | bwd_microstep: 1497.96 | bwd_inner_microstep: 1497.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 01:18:23,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1374.61 | bwd_inner_microstep: 1374.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-11 01:18:25,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.68 | bwd_microstep: 1336.40 | bwd_inner_microstep: 1336.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-11 01:18:27,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.10 | bwd_microstep: 1378.81 | bwd_inner_microstep: 1378.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3637 [2024-06-11 01:18:29,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.29 | bwd_microstep: 1690.36 | bwd_inner_microstep: 1690.34 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-11 01:18:31,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.32 | bwd_microstep: 1374.37 | bwd_inner_microstep: 1374.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2089 [2024-06-11 01:18:32,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.49 | bwd_microstep: 847.49 | bwd_inner_microstep: 847.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 01:18:34,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.67 | bwd_microstep: 1275.53 | bwd_inner_microstep: 1275.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-11 01:18:36,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.86 | bwd_microstep: 1507.38 | bwd_inner_microstep: 1507.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3551 [2024-06-11 01:18:38,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.65 | bwd_microstep: 1227.69 | bwd_inner_microstep: 1227.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2227 [2024-06-11 01:18:39,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.62 | bwd_microstep: 860.64 | bwd_inner_microstep: 860.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-11 01:18:41,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.64 | bwd_microstep: 1284.13 
| bwd_inner_microstep: 1284.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 01:18:43,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.03 | bwd_microstep: 1277.86 | bwd_inner_microstep: 1277.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 01:18:44,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 01:18:47,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.64 | bwd_microstep: 1552.67 | bwd_inner_microstep: 1552.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-11 01:18:48,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.19 | bwd_microstep: 1298.11 | bwd_inner_microstep: 1298.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055 [2024-06-11 01:18:49,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.05 | bwd_microstep: 814.44 | bwd_inner_microstep: 814.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387 [2024-06-11 01:18:51,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.26 | bwd_microstep: 1335.15 | bwd_inner_microstep: 1335.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-11 01:18:53,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 502.82 | bwd_microstep: 1343.26 | bwd_inner_microstep: 1343.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 [2024-06-11 01:18:55,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.75 | bwd_microstep: 1444.71 | bwd_inner_microstep: 1444.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3436 [2024-06-11 01:18:57,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.73 | bwd_microstep: 1296.40 | bwd_inner_microstep: 1296.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602 [2024-06-11 01:18:59,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.58 | bwd_microstep: 1598.05 | bwd_inner_microstep: 1598.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3811 [2024-06-11 01:19:01,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.92 | bwd_microstep: 1385.52 | bwd_inner_microstep: 1385.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3599 [2024-06-11 01:19:06,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-11 01:19:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 3878.13 | bwd_inner_microstep: 1707.08 | bwd_allreduce_microstep: 2170.99 | step_microstep: 38.18 [2024-06-11 01:19:06,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15907.46 | bwd: 44737.67 | bwd_inner: 42565.78 | bwd_allreduce: 2171.22 | step: 39.66 {'loss': 1.1842, 'learning_rate': 3.127131194615003e-06, 'epoch': 0.82} 
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3468 [2024-06-11 01:19:08,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.68 | bwd_microstep: 1516.29 | bwd_inner_microstep: 1516.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474 [2024-06-11 01:19:09,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.54 | bwd_microstep: 1242.87 | bwd_inner_microstep: 1242.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 01:19:11,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.17 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796 [2024-06-11 01:19:13,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.47 | bwd_microstep: 1548.19 | bwd_inner_microstep: 1548.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 01:19:15,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.40 | bwd_microstep: 1247.79 | bwd_inner_microstep: 1247.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737 [2024-06-11 01:19:17,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.97 | bwd_microstep: 1532.29 | bwd_inner_microstep: 1532.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3742 [2024-06-11 01:19:19,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1496.60 | bwd_inner_microstep: 1496.58 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-11 01:19:20,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 793.34 | bwd_inner_microstep: 793.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-11 01:19:22,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.61 | bwd_microstep: 797.00 | bwd_inner_microstep: 796.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-11 01:19:23,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1249.08 | bwd_inner_microstep: 1249.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3522 [2024-06-11 01:19:25,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1226.88 | bwd_inner_microstep: 1226.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1894 [2024-06-11 01:19:26,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.90 | bwd_microstep: 746.06 | bwd_inner_microstep: 746.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 01:19:28,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1387.96 | bwd_inner_microstep: 1387.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-11 01:19:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.64 | bwd_microstep: 790.35 | 
bwd_inner_microstep: 790.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3442 [2024-06-11 01:19:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.98 | bwd_microstep: 1409.21 | bwd_inner_microstep: 1409.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-11 01:19:33,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.39 | bwd_microstep: 1348.17 | bwd_inner_microstep: 1348.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391 [2024-06-11 01:19:35,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.87 | bwd_microstep: 1240.16 | bwd_inner_microstep: 1240.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2129 [2024-06-11 01:19:36,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.37 | bwd_microstep: 931.32 | bwd_inner_microstep: 931.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3431 [2024-06-11 01:19:38,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.00 | bwd_microstep: 1539.91 | bwd_inner_microstep: 1539.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3115 [2024-06-11 01:19:40,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.84 | bwd_microstep: 1343.66 | bwd_inner_microstep: 1343.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2909 [2024-06-11 01:19:41,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 416.31 | bwd_microstep: 1092.35 | bwd_inner_microstep: 1092.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3703 [2024-06-11 01:19:43,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.48 | bwd_microstep: 1531.34 | bwd_inner_microstep: 1531.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 01:19:46,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.30 | bwd_microstep: 1557.90 | bwd_inner_microstep: 1557.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150 [2024-06-11 01:19:47,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.86 | bwd_microstep: 849.68 | bwd_inner_microstep: 849.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1930 [2024-06-11 01:19:48,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.01 | bwd_microstep: 694.86 | bwd_inner_microstep: 694.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-11 01:19:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.54 | bwd_microstep: 1160.62 | bwd_inner_microstep: 1160.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-11 01:19:51,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 1501.00 | bwd_inner_microstep: 1500.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473 [2024-06-11 01:19:53,838] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1404.89 | bwd_inner_microstep: 1404.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271 [2024-06-11 01:19:55,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.54 | bwd_microstep: 971.03 | bwd_inner_microstep: 971.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2235 [2024-06-11 01:19:56,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.68 | bwd_microstep: 901.59 | bwd_inner_microstep: 901.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-11 01:19:58,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.93 | bwd_microstep: 1201.99 | bwd_inner_microstep: 1201.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2044 [2024-06-11 01:20:20,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.10 | optimizer_step: 6.61 [2024-06-11 01:20:20,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.20 | bwd_microstep: 22509.65 | bwd_inner_microstep: 931.19 | bwd_allreduce_microstep: 21578.41 | step_microstep: 37.96 [2024-06-11 01:20:20,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14454.56 | bwd: 60138.89 | bwd_inner: 38559.58 | bwd_allreduce: 21578.63 | step: 39.45 {'loss': 1.2155, 'learning_rate': 3.107008790545494e-06, 'epoch': 0.83} 82%|████████▏ | 1420/1726 [24:35:41<5:10:28, 60.88s/it] 82%|████████▏ | 1421/1726 [24:36:42<5:09:53, 60.96s/it] 82%|████████▏ | 1422/1726 [24:37:42<5:08:16, 60.84s/it] 82%|████████▏ | 1423/1726 [24:39:27<6:13:34, 73.98s/it] 83%|████████▎ | 1424/1726 [24:41:42<7:45:05, 92.40s/it] 83%|████████▎ | 1425/1726 [24:42:57<7:17:14, 87.16s/it]
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 01:20:22,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.17 | bwd_microstep: 1322.32 | bwd_inner_microstep: 1322.26 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-11 01:20:24,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1278.79 | bwd_inner_microstep: 1278.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2368 [2024-06-11 01:20:25,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.54 | bwd_microstep: 887.94 | bwd_inner_microstep: 887.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3917 [2024-06-11 01:20:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.42 | bwd_microstep: 1681.22 | bwd_inner_microstep: 1681.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-11 01:20:30,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1476.17 | bwd_inner_microstep: 1476.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-11 01:20:31,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.40 | bwd_microstep: 1246.15 | bwd_inner_microstep: 
1246.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 01:20:33,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.06 | bwd_microstep: 1239.99 | bwd_inner_microstep: 1239.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750 [2024-06-11 01:20:35,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1429.99 | bwd_inner_microstep: 1429.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2225 [2024-06-11 01:20:36,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.92 | bwd_microstep: 952.87 | bwd_inner_microstep: 952.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-11 01:20:38,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.42 | bwd_microstep: 1350.84 | bwd_inner_microstep: 1350.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3406 [2024-06-11 01:20:40,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.36 | bwd_microstep: 1320.91 | bwd_inner_microstep: 1320.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625 [2024-06-11 01:20:42,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.37 | bwd_microstep: 1308.48 | bwd_inner_microstep: 1308.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379 [2024-06-11 01:20:44,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | 
bwd_microstep: 1332.90 | bwd_inner_microstep: 1332.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-11 01:20:45,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1249.59 | bwd_inner_microstep: 1249.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 01:20:47,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1382.58 | bwd_inner_microstep: 1382.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3489 [2024-06-11 01:20:50,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.27 | bwd_microstep: 1572.01 | bwd_inner_microstep: 1571.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647 [2024-06-11 01:20:52,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1512.28 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1960 [2024-06-11 01:20:53,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.60 | bwd_microstep: 703.37 | bwd_inner_microstep: 703.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-11 01:20:54,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1253.34 | bwd_inner_microstep: 1253.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 01:20:56,953] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 576.70 | bwd_microstep: 1554.62 | bwd_inner_microstep: 1554.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-11 01:20:58,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1479.90 | bwd_inner_microstep: 1479.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3546 [2024-06-11 01:21:00,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.64 | bwd_microstep: 1201.15 | bwd_inner_microstep: 1201.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-11 01:21:02,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.32 | bwd_microstep: 1295.05 | bwd_inner_microstep: 1295.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826 [2024-06-11 01:21:04,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.56 | bwd_microstep: 1487.67 | bwd_inner_microstep: 1487.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802 [2024-06-11 01:21:06,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.77 | bwd_microstep: 1553.47 | bwd_inner_microstep: 1553.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2271 [2024-06-11 01:21:07,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.62 | bwd_microstep: 875.43 | bwd_inner_microstep: 875.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2281 [2024-06-11 01:21:09,093] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.90 | bwd_microstep: 876.63 | bwd_inner_microstep: 876.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3821 [2024-06-11 01:21:11,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.07 | bwd_microstep: 1691.62 | bwd_inner_microstep: 1691.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-11 01:21:13,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.30 | bwd_microstep: 1555.45 | bwd_inner_microstep: 1555.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2278 [2024-06-11 01:21:14,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.30 | bwd_microstep: 907.51 | bwd_inner_microstep: 907.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-11 01:21:16,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.85 | bwd_microstep: 1474.01 | bwd_inner_microstep: 1473.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420 [2024-06-11 01:21:22,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.08 | optimizer_step: 6.60 [2024-06-11 01:21:22,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 4755.12 | bwd_inner_microstep: 1637.48 | bwd_allreduce_microstep: 3117.59 | step_microstep: 37.87 [2024-06-11 01:21:22,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15684.18 | bwd: 45209.39 | bwd_inner: 42090.84 | bwd_allreduce: 3117.84 | step: 39.42 {'loss': 1.2222, 
'learning_rate': 3.0869458835032097e-06, 'epoch': 0.83} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2380 [2024-06-11 01:21:23,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.85 | bwd_microstep: 955.02 | bwd_inner_microstep: 954.95 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 01:21:25,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.47 | bwd_microstep: 1340.17 | bwd_inner_microstep: 1340.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475 [2024-06-11 01:21:27,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.18 | bwd_microstep: 1214.09 | bwd_inner_microstep: 1214.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-11 01:21:29,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.03 | bwd_microstep: 1474.81 | bwd_inner_microstep: 1474.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2228 [2024-06-11 01:21:30,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.51 | bwd_microstep: 862.67 | bwd_inner_microstep: 862.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3772 [2024-06-11 01:21:32,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.33 | bwd_microstep: 1742.24 | bwd_inner_microstep: 1742.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3772 [2024-06-11 01:21:34,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 
1541.30 | bwd_inner_microstep: 1541.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-11 01:21:35,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.76 | bwd_microstep: 799.33 | bwd_inner_microstep: 799.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 01:21:37,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1387.13 | bwd_inner_microstep: 1387.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571 [2024-06-11 01:21:39,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.88 | bwd_microstep: 1421.12 | bwd_inner_microstep: 1421.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 01:21:41,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.66 | bwd_microstep: 1380.04 | bwd_inner_microstep: 1380.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3548 [2024-06-11 01:21:43,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.53 | bwd_microstep: 1303.80 | bwd_inner_microstep: 1303.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511 [2024-06-11 01:21:45,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1581.74 | bwd_inner_microstep: 1581.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3474 [2024-06-11 01:21:47,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 573.63 | bwd_microstep: 1544.39 | bwd_inner_microstep: 1544.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844 [2024-06-11 01:21:50,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1653.69 | bwd_inner_microstep: 1653.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-11 01:21:52,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.14 | bwd_microstep: 1452.40 | bwd_inner_microstep: 1452.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460 [2024-06-11 01:21:53,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.37 | bwd_microstep: 1310.99 | bwd_inner_microstep: 1310.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482 [2024-06-11 01:21:55,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.24 | bwd_microstep: 1186.28 | bwd_inner_microstep: 1186.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-11 01:21:57,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.21 | bwd_microstep: 1295.72 | bwd_inner_microstep: 1295.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2286 [2024-06-11 01:21:58,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.06 | bwd_microstep: 1073.42 | bwd_inner_microstep: 1073.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-11 01:22:00,767] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1404.13 | bwd_inner_microstep: 1404.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-11 01:22:02,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.81 | bwd_microstep: 1498.47 | bwd_inner_microstep: 1498.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2279 [2024-06-11 01:22:04,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.26 | bwd_microstep: 876.90 | bwd_inner_microstep: 876.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-11 01:22:05,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.36 | bwd_microstep: 975.17 | bwd_inner_microstep: 975.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-11 01:22:07,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.26 | bwd_microstep: 1554.35 | bwd_inner_microstep: 1554.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3546 [2024-06-11 01:22:09,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.86 | bwd_microstep: 1561.49 | bwd_inner_microstep: 1561.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-11 01:22:11,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1392.11 | bwd_inner_microstep: 1392.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818 
[2024-06-11 01:22:13,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.06 | bwd_microstep: 1387.17 | bwd_inner_microstep: 1387.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3783 [2024-06-11 01:22:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.85 | bwd_microstep: 1689.41 | bwd_inner_microstep: 1689.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2196 [2024-06-11 01:22:17,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.84 | bwd_microstep: 1053.29 | bwd_inner_microstep: 1053.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-11 01:22:18,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.47 | bwd_microstep: 790.07 | bwd_inner_microstep: 790.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764 [2024-06-11 01:22:21,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.03 | optimizer_step: 6.61 [2024-06-11 01:22:21,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.34 | bwd_microstep: 2531.02 | bwd_inner_microstep: 1722.94 | bwd_allreduce_microstep: 808.04 | step_microstep: 37.90 [2024-06-11 01:22:21,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15798.31 | bwd: 43233.95 | bwd_inner: 42424.97 | bwd_allreduce: 808.29 | step: 39.39 {'loss': 1.164, 'learning_rate': 3.0669425441495936e-06, 'epoch': 0.83} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-11 01:22:23,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.87 | bwd_microstep: 1238.91 | 
bwd_inner_microstep: 1238.85 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1351 [2024-06-11 01:22:24,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 200.66 | bwd_microstep: 516.84 | bwd_inner_microstep: 516.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-11 01:22:26,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.00 | bwd_microstep: 1486.49 | bwd_inner_microstep: 1486.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 01:22:27,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1251.03 | bwd_inner_microstep: 1251.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 01:22:29,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1247.62 | bwd_inner_microstep: 1247.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 01:22:31,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1385.07 | bwd_inner_microstep: 1385.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4079 [2024-06-11 01:22:33,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.57 | bwd_microstep: 1522.69 | bwd_inner_microstep: 1522.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-11 01:22:35,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
567.19 | bwd_microstep: 1527.30 | bwd_inner_microstep: 1527.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-11 01:22:37,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.98 | bwd_microstep: 1150.06 | bwd_inner_microstep: 1150.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 01:22:39,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1387.48 | bwd_inner_microstep: 1387.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-11 01:22:40,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1251.34 | bwd_inner_microstep: 1251.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3438 [2024-06-11 01:22:42,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.21 | bwd_microstep: 1219.91 | bwd_inner_microstep: 1219.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408 [2024-06-11 01:22:44,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1281.53 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-11 01:22:46,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.15 | bwd_microstep: 1446.84 | bwd_inner_microstep: 1446.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3661 [2024-06-11 01:22:48,421] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.33 | bwd_microstep: 1512.07 | bwd_inner_microstep: 1512.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2113 [2024-06-11 01:22:49,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.49 | bwd_microstep: 888.51 | bwd_inner_microstep: 888.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3461 [2024-06-11 01:22:51,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1182.79 | bwd_inner_microstep: 1182.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3598 [2024-06-11 01:22:53,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1409.00 | bwd_inner_microstep: 1408.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-11 01:22:55,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.02 | bwd_microstep: 1486.71 | bwd_inner_microstep: 1486.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 01:22:57,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.34 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-11 01:22:59,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.79 | bwd_microstep: 1381.87 | bwd_inner_microstep: 1381.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3512 [2024-06-11 01:23:00,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.29 | bwd_microstep: 1291.30 | bwd_inner_microstep: 1291.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-11 01:23:02,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1253.71 | bwd_inner_microstep: 1253.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-11 01:23:04,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.67 | bwd_microstep: 1460.12 | bwd_inner_microstep: 1460.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3679 [2024-06-11 01:23:06,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1399.12 | bwd_inner_microstep: 1399.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-11 01:23:08,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1556.67 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-11 01:23:10,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.62 | bwd_microstep: 1357.56 | bwd_inner_microstep: 1357.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-11 01:23:12,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.68 | bwd_microstep: 1424.28 | bwd_inner_microstep: 1424.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images 
per sample: 7.5, dynamic token length: 3477 [2024-06-11 01:23:14,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1407.63 | bwd_inner_microstep: 1407.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417 [2024-06-11 01:23:16,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1446.51 | bwd_inner_microstep: 1446.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-11 01:23:18,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.59 | bwd_microstep: 1650.20 | bwd_inner_microstep: 1650.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2902 [2024-06-11 01:23:22,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.05 | optimizer_step: 6.59 [2024-06-11 01:23:22,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.75 | bwd_microstep: 3707.87 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 2325.43 | step_microstep: 37.90 [2024-06-11 01:23:22,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15989.91 | bwd: 45116.55 | bwd_inner: 42790.17 | bwd_allreduce: 2325.67 | step: 39.39 {'loss': 1.1931, 'learning_rate': 3.046998842936315e-06, 'epoch': 0.83} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2006 [2024-06-11 01:23:24,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.52 | bwd_microstep: 890.79 | bwd_inner_microstep: 890.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 01:23:26,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
549.94 | bwd_microstep: 1488.21 | bwd_inner_microstep: 1488.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3955 [2024-06-11 01:23:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.39 | bwd_microstep: 1692.47 | bwd_inner_microstep: 1692.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 01:23:30,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.28 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3847 [2024-06-11 01:23:32,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.95 | bwd_microstep: 1658.03 | bwd_inner_microstep: 1658.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-11 01:23:34,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.28 | bwd_microstep: 1528.75 | bwd_inner_microstep: 1528.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3537 [2024-06-11 01:23:36,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.08 | bwd_microstep: 1228.97 | bwd_inner_microstep: 1228.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 01:23:38,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1247.90 | bwd_inner_microstep: 1247.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-11 01:23:40,201] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1479.60 | bwd_inner_microstep: 1479.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440 [2024-06-11 01:23:41,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.55 | bwd_microstep: 1157.33 | bwd_inner_microstep: 1157.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-11 01:23:43,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.98 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3435 [2024-06-11 01:23:45,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.14 | bwd_microstep: 1282.25 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3697 [2024-06-11 01:23:47,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.41 | bwd_microstep: 1471.98 | bwd_inner_microstep: 1471.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1975 [2024-06-11 01:23:48,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.91 | bwd_microstep: 891.40 | bwd_inner_microstep: 891.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3519 [2024-06-11 01:23:50,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.48 | bwd_microstep: 1325.35 | bwd_inner_microstep: 1325.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3522 [2024-06-11 01:23:52,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.49 | bwd_microstep: 1426.54 | bwd_inner_microstep: 1426.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527 [2024-06-11 01:23:54,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.99 | bwd_microstep: 1415.54 | bwd_inner_microstep: 1415.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474 [2024-06-11 01:23:56,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.79 | bwd_microstep: 1312.37 | bwd_inner_microstep: 1312.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-11 01:23:58,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.94 | bwd_microstep: 1283.50 | bwd_inner_microstep: 1283.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3636 [2024-06-11 01:24:00,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1514.16 | bwd_inner_microstep: 1514.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933 [2024-06-11 01:24:02,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.45 | bwd_microstep: 1600.11 | bwd_inner_microstep: 1600.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-11 01:24:04,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.91 | bwd_microstep: 1280.59 | bwd_inner_microstep: 1280.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images 
per sample: 3.5, dynamic token length: 2000 [2024-06-11 01:24:05,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.15 | bwd_microstep: 769.88 | bwd_inner_microstep: 769.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3605 [2024-06-11 01:24:07,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1310.22 | bwd_inner_microstep: 1310.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-11 01:24:09,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1500.52 | bwd_inner_microstep: 1500.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3902 [2024-06-11 01:24:11,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.49 | bwd_microstep: 1637.12 | bwd_inner_microstep: 1637.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3763 [2024-06-11 01:24:13,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1376.96 | bwd_inner_microstep: 1376.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-11 01:24:15,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.42 | bwd_microstep: 1511.38 | bwd_inner_microstep: 1511.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3797 [2024-06-11 01:24:17,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.80 | bwd_microstep: 1513.26 | bwd_inner_microstep: 1513.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3599 [2024-06-11 01:24:19,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.43 | bwd_microstep: 1463.00 | bwd_inner_microstep: 1462.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2713 [2024-06-11 01:24:21,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.18 | bwd_microstep: 1129.13 | bwd_inner_microstep: 1129.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 01:24:27,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.07 | optimizer_step: 6.61 [2024-06-11 01:24:27,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 5686.44 | bwd_inner_microstep: 1755.16 | bwd_allreduce_microstep: 3931.22 | step_microstep: 37.76 [2024-06-11 01:24:27,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16353.58 | bwd: 47807.21 | bwd_inner: 43875.07 | bwd_allreduce: 3931.45 | step: 39.25 {'loss': 1.2133, 'learning_rate': 3.0271148501049796e-06, 'epoch': 0.83} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 01:24:29,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1476.75 | bwd_inner_microstep: 1476.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3998 [2024-06-11 01:24:31,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.82 | bwd_microstep: 1705.07 | bwd_inner_microstep: 1705.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-11 01:24:34,005] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.39 | bwd_microstep: 1543.80 | bwd_inner_microstep: 1543.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-11 01:24:35,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.27 | bwd_microstep: 1250.00 | bwd_inner_microstep: 1249.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-11 01:24:37,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.40 | bwd_microstep: 1183.31 | bwd_inner_microstep: 1183.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 01:24:39,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.79 | bwd_microstep: 1245.86 | bwd_inner_microstep: 1245.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-11 01:24:40,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.59 | bwd_microstep: 1295.32 | bwd_inner_microstep: 1295.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 01:24:42,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1384.18 | bwd_inner_microstep: 1384.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3696 [2024-06-11 01:24:44,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.63 | bwd_microstep: 1428.18 | bwd_inner_microstep: 1428.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3692 [2024-06-11 01:24:46,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.30 | bwd_microstep: 1522.28 | bwd_inner_microstep: 1522.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 01:24:48,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.12 | bwd_microstep: 1281.69 | bwd_inner_microstep: 1281.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439 [2024-06-11 01:24:50,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.89 | bwd_microstep: 1281.40 | bwd_inner_microstep: 1281.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3737 [2024-06-11 01:24:52,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.39 | bwd_microstep: 1666.34 | bwd_inner_microstep: 1666.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-11 01:24:54,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.72 | bwd_microstep: 1512.83 | bwd_inner_microstep: 1512.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2907 [2024-06-11 01:24:56,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.83 | bwd_microstep: 1220.79 | bwd_inner_microstep: 1220.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3685 [2024-06-11 01:24:58,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.70 | bwd_microstep: 1720.34 | bwd_inner_microstep: 1720.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3412 [2024-06-11 01:25:00,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1344.44 | bwd_inner_microstep: 1344.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3819 [2024-06-11 01:25:03,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.89 | bwd_microstep: 1752.19 | bwd_inner_microstep: 1752.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-11 01:25:05,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1387.87 | bwd_inner_microstep: 1387.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3527 [2024-06-11 01:25:07,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.45 | bwd_microstep: 1583.47 | bwd_inner_microstep: 1583.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 01:25:09,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1463.85 | bwd_inner_microstep: 1463.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3447 [2024-06-11 01:25:11,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.58 | bwd_microstep: 1512.27 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-11 01:25:13,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1404.24 | bwd_inner_microstep: 1404.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 01:25:15,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.94 | bwd_microstep: 1452.24 | bwd_inner_microstep: 1452.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-11 01:25:17,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.35 | bwd_microstep: 1493.81 | bwd_inner_microstep: 1493.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3779 [2024-06-11 01:25:19,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.76 | bwd_microstep: 1573.74 | bwd_inner_microstep: 1573.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-11 01:25:21,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.97 | bwd_microstep: 1656.64 | bwd_inner_microstep: 1656.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-11 01:25:23,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.44 | bwd_microstep: 1454.37 | bwd_inner_microstep: 1454.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3385 [2024-06-11 01:25:25,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.73 | bwd_microstep: 1241.82 | bwd_inner_microstep: 1241.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-11 01:25:27,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.70 | bwd_microstep: 1501.22 | bwd_inner_microstep: 1501.19 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-11 01:25:29,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.18 | bwd_microstep: 1502.40 | bwd_inner_microstep: 1502.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-11 01:25:31,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 01:25:31,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.26 | bwd_microstep: 1442.74 | bwd_inner_microstep: 1434.82 | bwd_allreduce_microstep: 7.87 | step_microstep: 37.69 [2024-06-11 01:25:31,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17358.83 | bwd: 46485.47 | bwd_inner: 46476.68 | bwd_allreduce: 8.10 | step: 39.16 83%|████████▎ | 1425/1726 [24:42:57<7:17:14, 87.16s/it] 83%|████████▎ | 1426/1726 [24:43:58<6:36:53, 79.38s/it] 83%|████████▎ | 1427/1726 [24:44:58<6:05:38, 73.37s/it] 83%|████████▎ | 1428/1726 [24:45:59<5:46:38, 69.79s/it] 83%|████████▎ | 1429/1726 [24:47:04<5:37:36, 68.20s/it] 83%|████████▎ | 1430/1726 [24:48:08<5:30:31, 67.00s/it] {'loss': 1.1703, 'learning_rate': 3.0072906356869145e-06, 'epoch': 0.83} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 01:25:33,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.39 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1277.84 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 01:25:35,180] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1244.20 | bwd_inner_microstep: 1244.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1936 [2024-06-11 01:25:36,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.15 | bwd_microstep: 696.72 | bwd_inner_microstep: 696.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-11 01:25:38,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1499.83 | bwd_inner_microstep: 1499.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 01:25:40,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.81 | bwd_microstep: 1381.35 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 01:25:41,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.71 | bwd_microstep: 1246.22 | bwd_inner_microstep: 1246.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 01:25:43,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.43 | bwd_microstep: 1396.63 | bwd_inner_microstep: 1396.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-11 01:25:44,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.73 | bwd_microstep: 790.73 | bwd_inner_microstep: 790.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958 
[2024-06-11 01:25:45,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.12 | bwd_microstep: 797.60 | bwd_inner_microstep: 797.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2185 [2024-06-11 01:25:47,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.36 | bwd_microstep: 856.81 | bwd_inner_microstep: 856.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-11 01:25:49,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1500.61 | bwd_inner_microstep: 1500.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3931 [2024-06-11 01:25:51,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.09 | bwd_microstep: 1619.41 | bwd_inner_microstep: 1619.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3662 [2024-06-11 01:25:53,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1385.98 | bwd_inner_microstep: 1385.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533 [2024-06-11 01:25:55,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.15 | bwd_microstep: 1585.92 | bwd_inner_microstep: 1585.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2384 [2024-06-11 01:25:56,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.35 | bwd_microstep: 933.50 | bwd_inner_microstep: 933.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3511 [2024-06-11 01:25:58,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.01 | bwd_microstep: 1485.96 | bwd_inner_microstep: 1485.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3688 [2024-06-11 01:26:01,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.86 | bwd_microstep: 1661.54 | bwd_inner_microstep: 1661.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828 [2024-06-11 01:26:03,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.90 | bwd_microstep: 1578.24 | bwd_inner_microstep: 1578.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673 [2024-06-11 01:26:05,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.23 | bwd_microstep: 1525.56 | bwd_inner_microstep: 1525.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-11 01:26:07,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1487.09 | bwd_inner_microstep: 1487.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2070 [2024-06-11 01:26:08,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.39 | bwd_microstep: 821.28 | bwd_inner_microstep: 821.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-11 01:26:10,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1258.76 | bwd_inner_microstep: 1258.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3795 [2024-06-11 01:26:12,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-11 01:26:14,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.06 | bwd_microstep: 1501.53 | bwd_inner_microstep: 1501.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3580 [2024-06-11 01:26:16,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.44 | bwd_microstep: 1422.35 | bwd_inner_microstep: 1422.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602 [2024-06-11 01:26:18,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.07 | bwd_microstep: 1414.08 | bwd_inner_microstep: 1414.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 01:26:20,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.50 | bwd_microstep: 1381.28 | bwd_inner_microstep: 1381.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-11 01:26:22,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1409.08 | bwd_inner_microstep: 1409.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-11 01:26:24,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.24 | bwd_microstep: 1508.37 | bwd_inner_microstep: 1508.34 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-11 01:26:26,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1560.25 | bwd_inner_microstep: 1560.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031 [2024-06-11 01:26:27,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.26 | bwd_microstep: 805.21 | bwd_inner_microstep: 805.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3770 [2024-06-11 01:26:33,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 01:26:33,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.10 | bwd_microstep: 4927.42 | bwd_inner_microstep: 2109.83 | bwd_allreduce_microstep: 2817.54 | step_microstep: 38.10 [2024-06-11 01:26:33,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15870.05 | bwd: 45517.40 | bwd_inner: 42698.82 | bwd_allreduce: 2817.85 | step: 39.64 {'loss': 1.13, 'learning_rate': 2.9875262695028874e-06, 'epoch': 0.83} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 01:26:35,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.59 | bwd_microstep: 1370.71 | bwd_inner_microstep: 1370.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-11 01:26:37,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-11 
01:26:39,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.70 | bwd_microstep: 1457.25 | bwd_inner_microstep: 1457.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3475 [2024-06-11 01:26:40,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.21 | bwd_microstep: 1307.39 | bwd_inner_microstep: 1307.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 01:26:42,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.49 | bwd_microstep: 1337.79 | bwd_inner_microstep: 1337.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 01:26:44,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1244.61 | bwd_inner_microstep: 1244.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-11 01:26:46,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.98 | bwd_microstep: 1249.63 | bwd_inner_microstep: 1249.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488 [2024-06-11 01:26:47,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.98 | bwd_microstep: 1185.83 | bwd_inner_microstep: 1185.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423 [2024-06-11 01:26:49,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.13 | bwd_microstep: 1305.86 | bwd_inner_microstep: 1305.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, 
dynamic token length: 2080 [2024-06-11 01:26:50,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.94 | bwd_microstep: 848.86 | bwd_inner_microstep: 848.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505 [2024-06-11 01:26:52,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1414.26 | bwd_inner_microstep: 1414.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3441 [2024-06-11 01:26:54,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.09 | bwd_microstep: 1395.80 | bwd_inner_microstep: 1395.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-11 01:26:56,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.54 | bwd_microstep: 1406.63 | bwd_inner_microstep: 1406.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 01:26:58,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.85 | bwd_microstep: 1282.54 | bwd_inner_microstep: 1282.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458 [2024-06-11 01:27:00,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.67 | bwd_microstep: 1208.17 | bwd_inner_microstep: 1208.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 01:27:01,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.07 | bwd_microstep: 1382.29 | bwd_inner_microstep: 1382.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 01:27:04,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.17 | bwd_microstep: 1646.09 | bwd_inner_microstep: 1646.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-11 01:27:06,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.96 | bwd_microstep: 1614.91 | bwd_inner_microstep: 1614.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3824 [2024-06-11 01:27:08,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.72 | bwd_microstep: 1813.08 | bwd_inner_microstep: 1813.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3599 [2024-06-11 01:27:11,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.32 | bwd_microstep: 1672.38 | bwd_inner_microstep: 1672.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434 [2024-06-11 01:27:13,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.96 | bwd_microstep: 1280.66 | bwd_inner_microstep: 1280.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-11 01:27:15,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.53 | bwd_microstep: 1644.74 | bwd_inner_microstep: 1644.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3033 [2024-06-11 01:27:16,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.88 | bwd_microstep: 1072.58 | bwd_inner_microstep: 1072.55 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3429 [2024-06-11 01:27:18,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.67 | bwd_microstep: 1310.18 | bwd_inner_microstep: 1310.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3465 [2024-06-11 01:27:20,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.88 | bwd_microstep: 1311.51 | bwd_inner_microstep: 1311.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-11 01:27:22,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.99 | bwd_microstep: 1400.47 | bwd_inner_microstep: 1400.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 01:27:24,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.60 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617 [2024-06-11 01:27:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.75 | bwd_microstep: 1508.76 | bwd_inner_microstep: 1508.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 01:27:28,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.17 | bwd_microstep: 1394.33 | bwd_inner_microstep: 1394.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458 [2024-06-11 01:27:29,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 
1309.80 | bwd_inner_microstep: 1309.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3511 [2024-06-11 01:27:31,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.80 | bwd_microstep: 1193.68 | bwd_inner_microstep: 1193.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753 [2024-06-11 01:27:35,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.08 | optimizer_step: 6.58 [2024-06-11 01:27:35,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.40 | bwd_microstep: 3538.05 | bwd_inner_microstep: 1628.34 | bwd_allreduce_microstep: 1909.67 | step_microstep: 37.61 [2024-06-11 01:27:35,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16342.65 | bwd: 45638.49 | bwd_inner: 43727.93 | bwd_allreduce: 1909.90 | step: 39.03 {'loss': 1.149, 'learning_rate': 2.967821821162904e-06, 'epoch': 0.83} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 01:27:37,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.43 | bwd_microstep: 1371.00 | bwd_inner_microstep: 1370.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2406 [2024-06-11 01:27:38,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.75 | bwd_microstep: 999.87 | bwd_inner_microstep: 999.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-11 01:27:41,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.60 | bwd_microstep: 1545.99 | bwd_inner_microstep: 1545.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 
3.5, dynamic token length: 1885 [2024-06-11 01:27:42,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.72 | bwd_microstep: 743.43 | bwd_inner_microstep: 743.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742 [2024-06-11 01:27:44,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.78 | bwd_microstep: 1634.73 | bwd_inner_microstep: 1634.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 01:27:46,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1389.96 | bwd_inner_microstep: 1389.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485 [2024-06-11 01:27:47,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.69 | bwd_microstep: 1187.41 | bwd_inner_microstep: 1187.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-11 01:27:49,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.63 | bwd_microstep: 1423.78 | bwd_inner_microstep: 1423.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 01:27:51,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.82 | bwd_microstep: 1376.20 | bwd_inner_microstep: 1376.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3694 [2024-06-11 01:27:54,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.76 | bwd_microstep: 1720.67 | bwd_inner_microstep: 1720.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-11 01:27:56,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.12 | bwd_microstep: 1481.55 | bwd_inner_microstep: 1481.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 01:27:58,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1483.58 | bwd_inner_microstep: 1483.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 01:28:00,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1373.17 | bwd_inner_microstep: 1373.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889 [2024-06-11 01:28:01,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.95 | bwd_microstep: 682.60 | bwd_inner_microstep: 682.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3647 [2024-06-11 01:28:03,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1467.38 | bwd_inner_microstep: 1467.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3601 [2024-06-11 01:28:05,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.78 | bwd_microstep: 1606.89 | bwd_inner_microstep: 1606.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-11 01:28:07,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.82 | bwd_microstep: 1289.49 | bwd_inner_microstep: 1289.46 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-11 01:28:09,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.95 | bwd_microstep: 1389.72 | bwd_inner_microstep: 1389.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677 [2024-06-11 01:28:11,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.12 | bwd_microstep: 1527.07 | bwd_inner_microstep: 1527.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1992 [2024-06-11 01:28:12,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.20 | bwd_microstep: 706.76 | bwd_inner_microstep: 706.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-11 01:28:13,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1280.69 | bwd_inner_microstep: 1280.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2009 [2024-06-11 01:28:15,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.15 | bwd_microstep: 805.31 | bwd_inner_microstep: 805.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3440 [2024-06-11 01:28:16,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1379.92 | bwd_inner_microstep: 1379.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-11 01:28:18,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.01 | bwd_microstep: 1458.24 | bwd_inner_microstep: 
1458.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3312 [2024-06-11 01:28:20,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.19 | bwd_microstep: 1192.06 | bwd_inner_microstep: 1192.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828 [2024-06-11 01:28:22,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.00 | bwd_microstep: 1390.60 | bwd_inner_microstep: 1390.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-11 01:28:24,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.58 | bwd_microstep: 1448.52 | bwd_inner_microstep: 1448.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3529 [2024-06-11 01:28:26,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.23 | bwd_microstep: 1450.72 | bwd_inner_microstep: 1450.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566 [2024-06-11 01:28:28,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.51 | bwd_microstep: 1594.06 | bwd_inner_microstep: 1594.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-11 01:28:30,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-11 01:28:32,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.95 | 
bwd_microstep: 1251.48 | bwd_inner_microstep: 1251.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3584 [2024-06-11 01:28:36,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.11 | optimizer_step: 6.57 [2024-06-11 01:28:36,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 3460.96 | bwd_inner_microstep: 1802.16 | bwd_allreduce_microstep: 1658.75 | step_microstep: 37.78 [2024-06-11 01:28:36,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15941.71 | bwd: 44467.17 | bwd_inner: 42807.52 | bwd_allreduce: 1658.98 | step: 39.26 {'loss': 1.1924, 'learning_rate': 2.948177360065918e-06, 'epoch': 0.83} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 01:28:38,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.39 | bwd_microstep: 1472.90 | bwd_inner_microstep: 1472.74 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 01:28:40,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.82 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3898 [2024-06-11 01:28:42,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.23 | bwd_microstep: 1481.44 | bwd_inner_microstep: 1481.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 01:28:44,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1377.73 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3754 [2024-06-11 01:28:46,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.96 | bwd_microstep: 1533.65 | bwd_inner_microstep: 1533.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3846 [2024-06-11 01:28:48,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.11 | bwd_microstep: 1661.45 | bwd_inner_microstep: 1661.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3733 [2024-06-11 01:28:50,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1494.75 | bwd_inner_microstep: 1494.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725 [2024-06-11 01:28:52,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 1436.33 | bwd_inner_microstep: 1436.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500 [2024-06-11 01:28:54,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.25 | bwd_microstep: 1317.35 | bwd_inner_microstep: 1317.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2192 [2024-06-11 01:28:55,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.14 | bwd_microstep: 921.38 | bwd_inner_microstep: 921.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968 [2024-06-11 01:28:57,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.55 | bwd_microstep: 798.35 | bwd_inner_microstep: 798.32 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-11 01:28:58,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.97 | bwd_microstep: 1344.50 | bwd_inner_microstep: 1344.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122 [2024-06-11 01:29:00,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.60 | bwd_microstep: 937.08 | bwd_inner_microstep: 937.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708 [2024-06-11 01:29:02,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.64 | bwd_microstep: 1533.41 | bwd_inner_microstep: 1533.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3668 [2024-06-11 01:29:04,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.59 | bwd_microstep: 1547.65 | bwd_inner_microstep: 1547.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654 [2024-06-11 01:29:06,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.59 | bwd_microstep: 1617.66 | bwd_inner_microstep: 1617.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-11 01:29:08,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1491.20 | bwd_inner_microstep: 1491.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-11 01:29:10,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.39 | bwd_microstep: 1351.97 | bwd_inner_microstep: 
1351.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3691 [2024-06-11 01:29:12,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1431.10 | bwd_inner_microstep: 1431.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-11 01:29:14,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1426.22 | bwd_inner_microstep: 1426.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302 [2024-06-11 01:29:15,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.36 | bwd_microstep: 977.22 | bwd_inner_microstep: 977.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584 [2024-06-11 01:29:17,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.62 | bwd_microstep: 1304.07 | bwd_inner_microstep: 1304.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3754 [2024-06-11 01:29:19,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.48 | bwd_microstep: 1307.92 | bwd_inner_microstep: 1307.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3562 [2024-06-11 01:29:21,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.04 | bwd_microstep: 1303.31 | bwd_inner_microstep: 1303.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3718 [2024-06-11 01:29:23,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.22 | 
bwd_microstep: 1537.23 | bwd_inner_microstep: 1537.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3481 [2024-06-11 01:29:25,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.64 | bwd_microstep: 1438.30 | bwd_inner_microstep: 1438.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032 [2024-06-11 01:29:26,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.31 | bwd_microstep: 812.30 | bwd_inner_microstep: 812.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3762 [2024-06-11 01:29:28,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.79 | bwd_microstep: 1349.04 | bwd_inner_microstep: 1349.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2275 [2024-06-11 01:29:29,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.47 | bwd_microstep: 814.22 | bwd_inner_microstep: 814.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3810 [2024-06-11 01:29:31,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.32 | bwd_microstep: 1362.77 | bwd_inner_microstep: 1362.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3385 [2024-06-11 01:29:33,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.63 | bwd_microstep: 1366.12 | bwd_inner_microstep: 1366.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3861 [2024-06-11 01:29:36,004] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.01 | optimizer_step: 6.62 [2024-06-11 01:29:36,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 681.15 | bwd_microstep: 1999.50 | bwd_inner_microstep: 1972.83 | bwd_allreduce_microstep: 26.62 | step_microstep: 37.38 [2024-06-11 01:29:36,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16095.97 | bwd: 43129.40 | bwd_inner: 43101.75 | bwd_allreduce: 26.91 | step: 38.92 {'loss': 1.1135, 'learning_rate': 2.9285929553996384e-06, 'epoch': 0.83} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 01:29:37,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1372.31 | bwd_inner_microstep: 1372.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882 [2024-06-11 01:29:38,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.28 | bwd_microstep: 684.01 | bwd_inner_microstep: 683.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 01:29:40,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1481.89 | bwd_inner_microstep: 1481.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 01:29:42,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.15 | bwd_microstep: 1395.32 | bwd_inner_microstep: 1395.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427 [2024-06-11 01:29:44,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.43 | bwd_microstep: 1447.74 | bwd_inner_microstep: 1447.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421 [2024-06-11 01:29:46,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.47 | bwd_microstep: 1154.01 | bwd_inner_microstep: 1153.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3746 [2024-06-11 01:29:48,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.15 | bwd_microstep: 1538.48 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1909 [2024-06-11 01:29:49,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.30 | bwd_microstep: 730.41 | bwd_inner_microstep: 730.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 01:29:51,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.22 | bwd_microstep: 1284.85 | bwd_inner_microstep: 1284.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 01:29:53,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1402.99 | bwd_inner_microstep: 1402.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694 [2024-06-11 01:29:55,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.59 | bwd_microstep: 1531.28 | bwd_inner_microstep: 1531.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3450 [2024-06-11 01:29:57,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.87 | bwd_microstep: 1409.39 | bwd_inner_microstep: 1409.36 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 01:29:59,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.85 | bwd_microstep: 1287.23 | bwd_inner_microstep: 1287.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-11 01:30:01,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.69 | bwd_microstep: 1381.95 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 01:30:03,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.03 | bwd_microstep: 1482.38 | bwd_inner_microstep: 1482.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-11 01:30:04,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.83 | bwd_microstep: 1342.63 | bwd_inner_microstep: 1342.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-11 01:30:07,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1491.05 | bwd_inner_microstep: 1491.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-11 01:30:08,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1344.97 | bwd_inner_microstep: 1344.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2097 [2024-06-11 01:30:10,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.15 | bwd_microstep: 
831.26 | bwd_inner_microstep: 831.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-11 01:30:11,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1416.63 | bwd_inner_microstep: 1416.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3545 [2024-06-11 01:30:14,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.20 | bwd_microstep: 1523.45 | bwd_inner_microstep: 1523.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-11 01:30:15,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.99 | bwd_microstep: 1297.05 | bwd_inner_microstep: 1297.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-11 01:30:16,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.09 | bwd_microstep: 710.16 | bwd_inner_microstep: 710.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3474 [2024-06-11 01:30:18,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.66 | bwd_microstep: 1235.05 | bwd_inner_microstep: 1235.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474 [2024-06-11 01:30:20,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.87 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2239 [2024-06-11 01:30:21,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 374.21 | bwd_microstep: 1015.21 | bwd_inner_microstep: 1015.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758 [2024-06-11 01:30:23,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.81 | bwd_microstep: 1442.70 | bwd_inner_microstep: 1442.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3809 [2024-06-11 01:30:25,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.94 | bwd_microstep: 1355.75 | bwd_inner_microstep: 1355.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3605 [2024-06-11 01:30:27,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1555.57 | bwd_inner_microstep: 1555.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2030 [2024-06-11 01:30:29,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.26 | bwd_microstep: 1002.41 | bwd_inner_microstep: 1002.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3734 [2024-06-11 01:30:31,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.33 | bwd_microstep: 1556.87 | bwd_inner_microstep: 1556.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280 [2024-06-11 01:30:38,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-11 01:30:38,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.78 | bwd_microstep: 6310.00 | bwd_inner_microstep: 1142.30 | 
bwd_allreduce_microstep: 5167.63 | step_microstep: 38.90 [2024-06-11 01:30:38,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15424.86 | bwd: 46421.04 | bwd_inner: 41252.47 | bwd_allreduce: 5167.87 | step: 40.38 83%|████████▎ | 1430/1726 [24:48:08<5:30:31, 67.00s/it] 83%|████████▎ | 1431/1726 [24:49:10<5:21:37, 65.42s/it] 83%|████████▎ | 1432/1726 [24:50:12<5:15:58, 64.49s/it] 83%|████████▎ | 1433/1726 [24:51:13<5:09:24, 63.36s/it] 83%|████████▎ | 1434/1726 [24:52:12<5:02:48, 62.22s/it] 83%|████████▎ | 1435/1726 [24:53:14<5:01:42, 62.21s/it] {'loss': 1.1851, 'learning_rate': 2.909068676140212e-06, 'epoch': 0.83} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520 [2024-06-11 01:30:40,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.76 | bwd_microstep: 1581.23 | bwd_inner_microstep: 1581.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3485 [2024-06-11 01:30:42,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.05 | bwd_microstep: 1242.51 | bwd_inner_microstep: 1242.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-11 01:30:44,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.81 | bwd_microstep: 1557.17 | bwd_inner_microstep: 1557.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788 [2024-06-11 01:30:46,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.90 | bwd_microstep: 1451.42 | bwd_inner_microstep: 1451.39 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3755 [2024-06-11 01:30:48,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.30 | bwd_microstep: 1338.43 | bwd_inner_microstep: 1338.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 01:30:49,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.57 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711 [2024-06-11 01:30:52,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.80 | bwd_microstep: 1625.85 | bwd_inner_microstep: 1625.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702 [2024-06-11 01:30:54,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1525.24 | bwd_inner_microstep: 1525.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4025 [2024-06-11 01:30:56,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.02 | bwd_microstep: 1616.01 | bwd_inner_microstep: 1615.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441 [2024-06-11 01:30:58,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.19 | bwd_microstep: 1279.40 | bwd_inner_microstep: 1279.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-11 01:31:00,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.53 | bwd_microstep: 1342.05 | 
bwd_inner_microstep: 1342.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661 [2024-06-11 01:31:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.07 | bwd_microstep: 1565.87 | bwd_inner_microstep: 1565.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495 [2024-06-11 01:31:04,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.18 | bwd_microstep: 1313.22 | bwd_inner_microstep: 1313.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2905 [2024-06-11 01:31:05,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.39 | bwd_microstep: 1220.01 | bwd_inner_microstep: 1219.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837 [2024-06-11 01:31:07,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.36 | bwd_microstep: 1556.89 | bwd_inner_microstep: 1556.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2171 [2024-06-11 01:31:09,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.85 | bwd_microstep: 952.14 | bwd_inner_microstep: 952.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379 [2024-06-11 01:31:11,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.25 | bwd_microstep: 1336.33 | bwd_inner_microstep: 1336.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3630 [2024-06-11 01:31:13,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 600.72 | bwd_microstep: 1646.19 | bwd_inner_microstep: 1646.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-11 01:31:15,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.71 | bwd_microstep: 1290.87 | bwd_inner_microstep: 1290.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3533 [2024-06-11 01:31:17,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1426.49 | bwd_inner_microstep: 1426.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-11 01:31:18,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.49 | bwd_microstep: 800.03 | bwd_inner_microstep: 800.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3834 [2024-06-11 01:31:20,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.78 | bwd_microstep: 1437.14 | bwd_inner_microstep: 1437.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-11 01:31:22,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.62 | bwd_microstep: 1493.26 | bwd_inner_microstep: 1493.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-11 01:31:24,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.36 | bwd_microstep: 1647.61 | bwd_inner_microstep: 1647.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-11 01:31:26,422] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.90 | bwd_microstep: 1340.04 | bwd_inner_microstep: 1340.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 01:31:28,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.07 | bwd_microstep: 1287.02 | bwd_inner_microstep: 1286.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-11 01:31:30,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1479.24 | bwd_inner_microstep: 1479.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 01:31:32,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1376.10 | bwd_inner_microstep: 1376.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-11 01:31:34,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 1372.87 | bwd_inner_microstep: 1372.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-11 01:31:36,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.45 | bwd_microstep: 1496.34 | bwd_inner_microstep: 1496.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 [2024-06-11 01:31:38,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.72 | bwd_microstep: 1431.72 | bwd_inner_microstep: 1431.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 
1207 [2024-06-11 01:31:40,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.05 | optimizer_step: 6.59 [2024-06-11 01:31:40,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 173.29 | bwd_microstep: 2201.44 | bwd_inner_microstep: 515.60 | bwd_allreduce_microstep: 1685.78 | step_microstep: 37.86 [2024-06-11 01:31:40,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16391.58 | bwd: 45607.16 | bwd_inner: 43920.47 | bwd_allreduce: 1686.01 | step: 39.34 {'loss': 1.1599, 'learning_rate': 2.8896045910520663e-06, 'epoch': 0.83} dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2657 [2024-06-11 01:31:42,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.51 | bwd_microstep: 1102.86 | bwd_inner_microstep: 1102.79 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3941 [2024-06-11 01:31:44,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.97 | bwd_microstep: 1543.98 | bwd_inner_microstep: 1543.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 01:31:46,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.51 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-11 01:31:48,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.94 | bwd_microstep: 1482.50 | bwd_inner_microstep: 1482.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 01:31:49,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.13 | bwd_microstep: 1283.65 | 
bwd_inner_microstep: 1283.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 01:31:51,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1387.30 | bwd_inner_microstep: 1387.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 01:31:53,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.94 | bwd_microstep: 1249.74 | bwd_inner_microstep: 1249.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 01:31:55,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1250.28 | bwd_inner_microstep: 1250.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499 [2024-06-11 01:31:56,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.18 | bwd_microstep: 1191.61 | bwd_inner_microstep: 1191.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 01:31:58,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1379.01 | bwd_inner_microstep: 1378.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3657 [2024-06-11 01:32:00,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.34 | bwd_microstep: 1544.74 | bwd_inner_microstep: 1544.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509 [2024-06-11 01:32:03,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 578.85 | bwd_microstep: 1577.50 | bwd_inner_microstep: 1577.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-11 01:32:04,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1248.15 | bwd_inner_microstep: 1248.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2164 [2024-06-11 01:32:06,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.23 | bwd_microstep: 884.45 | bwd_inner_microstep: 884.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 01:32:08,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.65 | bwd_microstep: 1380.83 | bwd_inner_microstep: 1380.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3829 [2024-06-11 01:32:10,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.90 | bwd_microstep: 1487.77 | bwd_inner_microstep: 1487.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-11 01:32:11,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1290.81 | bwd_inner_microstep: 1290.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3532 [2024-06-11 01:32:13,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.12 | bwd_microstep: 1197.86 | bwd_inner_microstep: 1197.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-11 01:32:15,314] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1297.27 | bwd_inner_microstep: 1297.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-11 01:32:17,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.81 | bwd_microstep: 1409.20 | bwd_inner_microstep: 1409.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446 [2024-06-11 01:32:18,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.78 | bwd_microstep: 1159.26 | bwd_inner_microstep: 1159.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-11 01:32:20,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.73 | bwd_microstep: 1397.31 | bwd_inner_microstep: 1397.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-11 01:32:21,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.99 | bwd_microstep: 702.12 | bwd_inner_microstep: 702.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589 [2024-06-11 01:32:23,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.57 | bwd_microstep: 1606.52 | bwd_inner_microstep: 1606.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3437 [2024-06-11 01:32:25,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.53 | bwd_microstep: 1215.33 | bwd_inner_microstep: 1215.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3562 [2024-06-11 01:32:27,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1360.16 | bwd_inner_microstep: 1360.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3609 [2024-06-11 01:32:29,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.91 | bwd_microstep: 1537.75 | bwd_inner_microstep: 1537.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3558 [2024-06-11 01:32:31,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.12 | bwd_microstep: 1358.64 | bwd_inner_microstep: 1358.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3379 [2024-06-11 01:32:33,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.36 | bwd_microstep: 1337.95 | bwd_inner_microstep: 1337.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2682 [2024-06-11 01:32:34,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.28 | bwd_microstep: 1108.04 | bwd_inner_microstep: 1108.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3811 [2024-06-11 01:32:37,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.53 | bwd_microstep: 1615.42 | bwd_inner_microstep: 1615.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3025 [2024-06-11 01:32:41,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-11 01:32:41,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 463.98 | bwd_microstep: 4266.36 | bwd_inner_microstep: 1389.58 | bwd_allreduce_microstep: 2876.73 | step_microstep: 37.97 [2024-06-11 01:32:41,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15860.67 | bwd: 45233.09 | bwd_inner: 42355.41 | bwd_allreduce: 2876.98 | step: 39.51 {'loss': 1.1832, 'learning_rate': 2.870200768687603e-06, 'epoch': 0.83} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448 [2024-06-11 01:32:43,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.75 | bwd_microstep: 1308.62 | bwd_inner_microstep: 1308.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-11 01:32:45,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.61 | bwd_microstep: 1152.73 | bwd_inner_microstep: 1152.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837 [2024-06-11 01:32:47,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1451.83 | bwd_inner_microstep: 1451.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-11 01:32:48,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.21 | bwd_microstep: 790.79 | bwd_inner_microstep: 790.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 01:32:50,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.12 | bwd_microstep: 1246.83 | bwd_inner_microstep: 1246.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 01:32:51,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 490.05 | bwd_microstep: 1284.05 | bwd_inner_microstep: 1284.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707 [2024-06-11 01:32:54,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.31 | bwd_microstep: 1627.67 | bwd_inner_microstep: 1627.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 01:32:56,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.06 | bwd_microstep: 1343.64 | bwd_inner_microstep: 1343.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2116 [2024-06-11 01:32:57,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.29 | bwd_microstep: 862.12 | bwd_inner_microstep: 862.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3491 [2024-06-11 01:32:59,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.08 | bwd_microstep: 1444.03 | bwd_inner_microstep: 1444.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3496 [2024-06-11 01:33:01,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1678.16 | bwd_inner_microstep: 1678.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 01:33:03,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1380.21 | bwd_inner_microstep: 1380.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3681 [2024-06-11 01:33:05,414] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.71 | bwd_microstep: 1421.25 | bwd_inner_microstep: 1421.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1953 [2024-06-11 01:33:06,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.66 | bwd_microstep: 893.70 | bwd_inner_microstep: 893.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667 [2024-06-11 01:33:08,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.41 | bwd_microstep: 1427.10 | bwd_inner_microstep: 1427.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3507 [2024-06-11 01:33:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.79 | bwd_microstep: 1551.28 | bwd_inner_microstep: 1551.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-11 01:33:12,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1511.43 | bwd_inner_microstep: 1511.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3826 [2024-06-11 01:33:14,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.07 | bwd_microstep: 1519.68 | bwd_inner_microstep: 1519.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-11 01:33:16,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1397.51 | bwd_inner_microstep: 1397.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3821 [2024-06-11 01:33:19,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.79 | bwd_inner_microstep: 1556.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3798 [2024-06-11 01:33:21,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.58 | bwd_microstep: 1749.18 | bwd_inner_microstep: 1749.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836 [2024-06-11 01:33:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.47 | bwd_microstep: 1768.86 | bwd_inner_microstep: 1768.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3833 [2024-06-11 01:33:26,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.27 | bwd_microstep: 1621.40 | bwd_inner_microstep: 1621.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452 [2024-06-11 01:33:27,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.39 | bwd_microstep: 1191.06 | bwd_inner_microstep: 1191.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2185 [2024-06-11 01:33:29,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.44 | bwd_microstep: 1051.77 | bwd_inner_microstep: 1051.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-11 01:33:31,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.67 | bwd_microstep: 1349.83 | bwd_inner_microstep: 1349.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3454 [2024-06-11 01:33:32,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.93 | bwd_microstep: 1256.11 | bwd_inner_microstep: 1256.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3549 [2024-06-11 01:33:34,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.41 | bwd_microstep: 1452.58 | bwd_inner_microstep: 1452.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 01:33:36,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1560.59 | bwd_inner_microstep: 1560.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4000 [2024-06-11 01:33:39,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.95 | bwd_microstep: 1710.41 | bwd_inner_microstep: 1710.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3583 [2024-06-11 01:33:41,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.62 | bwd_microstep: 1803.54 | bwd_inner_microstep: 1803.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2813 [2024-06-11 01:33:57,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.09 | optimizer_step: 6.61 [2024-06-11 01:33:57,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.09 | bwd_microstep: 14775.17 | bwd_inner_microstep: 1383.39 | bwd_allreduce_microstep: 13391.72 | step_microstep: 38.06 [2024-06-11 01:33:57,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16598.52 | bwd: 58139.92 | bwd_inner: 
44747.29 | bwd_allreduce: 13391.95 | step: 39.55 {'loss': 1.1188, 'learning_rate': 2.850857277386978e-06, 'epoch': 0.83} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3626 [2024-06-11 01:33:58,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.36 | bwd_microstep: 1264.63 | bwd_inner_microstep: 1264.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4565 [2024-06-11 01:34:01,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 682.61 | bwd_microstep: 1843.66 | bwd_inner_microstep: 1843.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3970 [2024-06-11 01:34:03,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.91 | bwd_microstep: 1693.09 | bwd_inner_microstep: 1693.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3876 [2024-06-11 01:34:05,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1475.82 | bwd_inner_microstep: 1475.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412 [2024-06-11 01:34:07,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.02 | bwd_microstep: 1147.97 | bwd_inner_microstep: 1147.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-11 01:34:09,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.78 | bwd_microstep: 1552.32 | bwd_inner_microstep: 1552.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 01:34:11,138] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.89 | bwd_microstep: 1247.72 | bwd_inner_microstep: 1247.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-11 01:34:13,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.11 | bwd_microstep: 1642.06 | bwd_inner_microstep: 1642.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2516 [2024-06-11 01:34:14,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.51 | bwd_microstep: 961.58 | bwd_inner_microstep: 961.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 01:34:16,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.65 | bwd_microstep: 1280.56 | bwd_inner_microstep: 1280.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-11 01:34:18,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.68 | bwd_inner_microstep: 1254.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673 [2024-06-11 01:34:20,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.10 | bwd_microstep: 1327.26 | bwd_inner_microstep: 1327.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-11 01:34:21,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1345.09 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3500 [2024-06-11 01:34:23,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.58 | bwd_microstep: 1387.24 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3631 [2024-06-11 01:34:25,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.03 | bwd_microstep: 1451.72 | bwd_inner_microstep: 1451.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-11 01:34:27,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.14 | bwd_microstep: 1144.80 | bwd_inner_microstep: 1144.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-11 01:34:29,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1515.21 | bwd_inner_microstep: 1515.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-11 01:34:31,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1348.69 | bwd_inner_microstep: 1348.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-11 01:34:33,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1397.82 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3473 [2024-06-11 01:34:35,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.99 | bwd_microstep: 1344.07 | bwd_inner_microstep: 1344.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3435 [2024-06-11 01:34:37,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1449.49 | bwd_inner_microstep: 1449.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3816 [2024-06-11 01:34:39,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.32 | bwd_microstep: 1754.61 | bwd_inner_microstep: 1754.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-11 01:34:41,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.17 | bwd_microstep: 1499.30 | bwd_inner_microstep: 1499.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510 [2024-06-11 01:34:43,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.17 | bwd_microstep: 1319.02 | bwd_inner_microstep: 1318.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818 [2024-06-11 01:34:45,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.63 | bwd_microstep: 1359.79 | bwd_inner_microstep: 1359.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-11 01:34:47,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.48 | bwd_microstep: 1187.85 | bwd_inner_microstep: 1187.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1997 [2024-06-11 01:34:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.38 | bwd_microstep: 801.08 | bwd_inner_microstep: 801.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2269 [2024-06-11 01:34:49,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.80 | bwd_microstep: 811.15 | bwd_inner_microstep: 811.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726 [2024-06-11 01:34:51,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.46 | bwd_microstep: 1637.43 | bwd_inner_microstep: 1637.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 01:34:53,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.46 | bwd_microstep: 1377.59 | bwd_inner_microstep: 1377.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3817 [2024-06-11 01:34:55,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.24 | bwd_microstep: 1721.18 | bwd_inner_microstep: 1721.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3799 [2024-06-11 01:35:01,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.07 | optimizer_step: 6.61 [2024-06-11 01:35:01,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 4795.44 | bwd_inner_microstep: 1734.56 | bwd_allreduce_microstep: 3060.82 | step_microstep: 37.94 [2024-06-11 01:35:01,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16440.91 | bwd: 47339.91 | bwd_inner: 44278.19 | bwd_allreduce: 3061.04 | step: 39.37 {'loss': 1.1472, 'learning_rate': 2.831574185277883e-06, 'epoch': 0.83} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2926 [2024-06-11 01:35:02,776] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 447.24 | bwd_microstep: 1181.75 | bwd_inner_microstep: 1181.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3448 [2024-06-11 01:35:04,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.37 | bwd_microstep: 1216.32 | bwd_inner_microstep: 1216.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3402 [2024-06-11 01:35:06,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.82 | bwd_microstep: 1306.48 | bwd_inner_microstep: 1306.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803 [2024-06-11 01:35:08,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.50 | bwd_microstep: 1646.92 | bwd_inner_microstep: 1646.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747 [2024-06-11 01:35:10,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.11 | bwd_microstep: 1536.11 | bwd_inner_microstep: 1536.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-11 01:35:12,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1250.58 | bwd_inner_microstep: 1250.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-11 01:35:14,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.78 | bwd_microstep: 1250.90 | bwd_inner_microstep: 1250.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3733 [2024-06-11 
01:35:16,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.95 | bwd_microstep: 1396.49 | bwd_inner_microstep: 1396.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908 [2024-06-11 01:35:16,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.74 | bwd_microstep: 685.27 | bwd_inner_microstep: 685.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651 [2024-06-11 01:35:18,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.27 | bwd_microstep: 1425.38 | bwd_inner_microstep: 1425.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 01:35:20,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.13 | bwd_microstep: 1389.81 | bwd_inner_microstep: 1389.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 01:35:22,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.30 | bwd_microstep: 1256.32 | bwd_inner_microstep: 1256.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-11 01:35:24,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.90 | bwd_microstep: 1291.01 | bwd_inner_microstep: 1290.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-11 01:35:26,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.37 | bwd_microstep: 1612.14 | bwd_inner_microstep: 1612.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, 
dynamic token length: 3631 [2024-06-11 01:35:28,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.65 | bwd_microstep: 1458.90 | bwd_inner_microstep: 1458.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-11 01:35:30,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.62 | bwd_microstep: 1301.42 | bwd_inner_microstep: 1301.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3824 [2024-06-11 01:35:32,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.14 | bwd_microstep: 1356.21 | bwd_inner_microstep: 1356.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948 [2024-06-11 01:35:33,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.52 | bwd_microstep: 728.63 | bwd_inner_microstep: 728.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 01:35:35,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1255.54 | bwd_inner_microstep: 1255.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-11 01:35:37,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.88 | bwd_microstep: 1549.78 | bwd_inner_microstep: 1549.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3641 [2024-06-11 01:35:38,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.77 | bwd_microstep: 1249.61 | bwd_inner_microstep: 1249.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 30, images per sample: 7.5, dynamic token length: 3540 [2024-06-11 01:35:40,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1424.09 | bwd_inner_microstep: 1424.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3806 [2024-06-11 01:35:42,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.93 | bwd_microstep: 1288.51 | bwd_inner_microstep: 1288.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-11 01:35:44,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1506.08 | bwd_inner_microstep: 1506.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3811 [2024-06-11 01:35:46,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.28 | bwd_microstep: 1581.35 | bwd_inner_microstep: 1581.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 01:35:48,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1390.36 | bwd_inner_microstep: 1390.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-11 01:35:50,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.75 | bwd_microstep: 1404.41 | bwd_inner_microstep: 1404.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3570 [2024-06-11 01:35:52,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.93 | bwd_microstep: 1489.67 | bwd_inner_microstep: 1489.64 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3583 [2024-06-11 01:35:54,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.91 | bwd_microstep: 1524.34 | bwd_inner_microstep: 1524.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2021 [2024-06-11 01:35:56,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.44 | bwd_microstep: 809.35 | bwd_inner_microstep: 809.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3481 [2024-06-11 01:35:58,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.48 | bwd_microstep: 1426.92 | bwd_inner_microstep: 1426.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3776 [2024-06-11 01:36:01,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.04 | optimizer_step: 6.61 [2024-06-11 01:36:01,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.70 | bwd_microstep: 3234.92 | bwd_inner_microstep: 1975.84 | bwd_allreduce_microstep: 1259.03 | step_microstep: 37.59 [2024-06-11 01:36:01,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16095.28 | bwd: 44425.62 | bwd_inner: 43165.70 | bwd_allreduce: 1259.25 | step: 39.13 83%|████████▎ | 1435/1726 [24:53:14<5:01:42, 62.21s/it] 83%|████████▎ | 1436/1726 [24:54:17<5:00:50, 62.24s/it] 83%|████████▎ | 1437/1726 [24:55:18<4:58:37, 62.00s/it] 83%|████████▎ | 1438/1726 [24:56:33<5:16:25, 65.92s/it] 83%|████████▎ | 1439/1726
[24:57:37<5:12:43, 65.38s/it] 83%|████████▎ | 1440/1726 [24:58:38<5:05:10, 64.02s/it] {'loss': 1.2137, 'learning_rate': 2.812351560275246e-06, 'epoch': 0.83} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-11 01:36:04,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.85 | bwd_microstep: 1491.99 | bwd_inner_microstep: 1491.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3991 [2024-06-11 01:36:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.65 | bwd_microstep: 1701.77 | bwd_inner_microstep: 1701.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3888 [2024-06-11 01:36:08,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.14 | bwd_microstep: 1681.26 | bwd_inner_microstep: 1681.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 01:36:10,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.96 | bwd_microstep: 1274.90 | bwd_inner_microstep: 1274.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486 [2024-06-11 01:36:12,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.42 | bwd_microstep: 1348.64 | bwd_inner_microstep: 1348.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-11 01:36:14,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.60 | bwd_microstep: 1482.37 | bwd_inner_microstep: 1482.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718 [2024-06-11
01:36:16,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1429.28 | bwd_inner_microstep: 1429.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-11 01:36:17,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.71 | bwd_microstep: 792.51 | bwd_inner_microstep: 792.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 01:36:19,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1380.77 | bwd_inner_microstep: 1380.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3527 [2024-06-11 01:36:21,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.48 | bwd_microstep: 1541.43 | bwd_inner_microstep: 1541.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3430 [2024-06-11 01:36:23,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.30 | bwd_microstep: 1451.77 | bwd_inner_microstep: 1451.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474 [2024-06-11 01:36:25,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.11 | bwd_microstep: 1577.66 | bwd_inner_microstep: 1577.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-11 01:36:26,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.74 | bwd_microstep: 679.88 | bwd_inner_microstep: 679.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, 
dynamic token length: 3583 [2024-06-11 01:36:28,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.07 | bwd_microstep: 1238.68 | bwd_inner_microstep: 1238.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969 [2024-06-11 01:36:29,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.00 | bwd_microstep: 796.86 | bwd_inner_microstep: 796.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984 [2024-06-11 01:36:30,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.99 | bwd_microstep: 736.45 | bwd_inner_microstep: 736.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676 [2024-06-11 01:36:32,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.73 | bwd_microstep: 1624.84 | bwd_inner_microstep: 1624.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 01:36:34,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.19 | bwd_microstep: 1283.33 | bwd_inner_microstep: 1283.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-11 01:36:36,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.34 | bwd_microstep: 1293.60 | bwd_inner_microstep: 1293.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2107 [2024-06-11 01:36:37,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.64 | bwd_microstep: 923.34 | bwd_inner_microstep: 923.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 10, images per sample: 2.5, dynamic token length: 1984 [2024-06-11 01:36:38,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.03 | bwd_microstep: 706.80 | bwd_inner_microstep: 706.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637 [2024-06-11 01:36:40,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1515.98 | bwd_inner_microstep: 1515.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735 [2024-06-11 01:36:42,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.40 | bwd_microstep: 1541.12 | bwd_inner_microstep: 1541.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1998 [2024-06-11 01:36:43,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.93 | bwd_microstep: 862.47 | bwd_inner_microstep: 862.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 01:36:45,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1393.07 | bwd_inner_microstep: 1393.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466 [2024-06-11 01:36:47,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.43 | bwd_microstep: 1439.61 | bwd_inner_microstep: 1439.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3383 [2024-06-11 01:36:49,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.08 | bwd_microstep: 1274.46 | bwd_inner_microstep: 1274.43 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-11 01:36:51,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.39 | bwd_microstep: 1476.84 | bwd_inner_microstep: 1476.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3601 [2024-06-11 01:36:53,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.39 | bwd_microstep: 1702.56 | bwd_inner_microstep: 1702.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3622 [2024-06-11 01:36:55,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.40 | bwd_microstep: 1376.22 | bwd_inner_microstep: 1376.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3669 [2024-06-11 01:36:57,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1479.76 | bwd_inner_microstep: 1479.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1194 [2024-06-11 01:37:02,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.06 | optimizer_step: 6.61 [2024-06-11 01:37:02,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 171.63 | bwd_microstep: 3947.93 | bwd_inner_microstep: 512.75 | bwd_allreduce_microstep: 3435.12 | step_microstep: 37.82 [2024-06-11 01:37:02,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.38 | bwd: 44448.17 | bwd_inner: 41012.14 | bwd_allreduce: 3435.35 | step: 39.27 {'loss': 1.19, 'learning_rate': 2.7931894700810703e-06, 'epoch': 0.83} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3466 [2024-06-11 01:37:03,961] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.65 | bwd_microstep: 1336.03 | bwd_inner_microstep: 1336.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3902 [2024-06-11 01:37:06,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.29 | bwd_microstep: 1515.66 | bwd_inner_microstep: 1515.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 01:37:07,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.20 | bwd_microstep: 1273.81 | bwd_inner_microstep: 1273.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3991 [2024-06-11 01:37:10,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.02 | bwd_microstep: 1702.47 | bwd_inner_microstep: 1702.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-11 01:37:11,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.06 | bwd_microstep: 794.34 | bwd_inner_microstep: 794.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3736 [2024-06-11 01:37:13,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.48 | bwd_microstep: 1429.78 | bwd_inner_microstep: 1429.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 01:37:15,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1282.04 | bwd_inner_microstep: 1282.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
3497 [2024-06-11 01:37:16,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.23 | bwd_microstep: 1219.49 | bwd_inner_microstep: 1219.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-11 01:37:17,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.16 | bwd_microstep: 794.00 | bwd_inner_microstep: 793.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 01:37:19,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1246.24 | bwd_inner_microstep: 1246.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667 [2024-06-11 01:37:21,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.23 | bwd_microstep: 1323.03 | bwd_inner_microstep: 1323.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 01:37:23,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.19 | bwd_microstep: 1482.90 | bwd_inner_microstep: 1482.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2659 [2024-06-11 01:37:24,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.35 | bwd_microstep: 1120.37 | bwd_inner_microstep: 1120.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-11 01:37:26,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1473.49 | bwd_inner_microstep: 1473.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images 
per sample: 11.5, dynamic token length: 3637 [2024-06-11 01:37:29,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.34 | bwd_microstep: 1707.06 | bwd_inner_microstep: 1707.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-11 01:37:31,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.88 | bwd_microstep: 1484.39 | bwd_inner_microstep: 1484.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667 [2024-06-11 01:37:33,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.27 | bwd_microstep: 1512.56 | bwd_inner_microstep: 1512.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2097 [2024-06-11 01:37:34,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.11 | bwd_microstep: 917.56 | bwd_inner_microstep: 917.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3430 [2024-06-11 01:37:36,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1392.60 | bwd_inner_microstep: 1392.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1928 [2024-06-11 01:37:37,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.86 | bwd_microstep: 760.16 | bwd_inner_microstep: 760.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 01:37:39,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1376.19 | bwd_inner_microstep: 1376.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-11 01:37:41,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1412.62 | bwd_inner_microstep: 1412.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-11 01:37:43,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.84 | bwd_microstep: 1186.79 | bwd_inner_microstep: 1186.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3671 [2024-06-11 01:37:45,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.29 | bwd_microstep: 1429.89 | bwd_inner_microstep: 1429.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3522 [2024-06-11 01:37:47,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 1555.62 | bwd_inner_microstep: 1555.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582 [2024-06-11 01:37:49,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.40 | bwd_microstep: 1305.56 | bwd_inner_microstep: 1305.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428 [2024-06-11 01:37:50,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.03 | bwd_microstep: 1282.47 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-11 01:37:52,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.01 | bwd_microstep: 1181.73 | bwd_inner_microstep: 1181.70 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 01:37:54,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.19 | bwd_microstep: 1457.25 | bwd_inner_microstep: 1457.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-11 01:37:56,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.57 | bwd_microstep: 1458.28 | bwd_inner_microstep: 1458.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3817 [2024-06-11 01:37:59,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.58 | bwd_microstep: 1860.13 | bwd_inner_microstep: 1860.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442 [2024-06-11 01:38:03,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.07 | optimizer_step: 6.63 [2024-06-11 01:38:03,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.93 | bwd_microstep: 4102.73 | bwd_inner_microstep: 1641.60 | bwd_allreduce_microstep: 2461.08 | step_microstep: 37.75 [2024-06-11 01:38:03,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16004.12 | bwd: 45377.29 | bwd_inner: 42915.31 | bwd_allreduce: 2461.31 | step: 39.18 {'loss': 1.2245, 'learning_rate': 2.774087982184124e-06, 'epoch': 0.84} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507 [2024-06-11 01:38:05,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1402.49 | bwd_inner_microstep: 1402.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 4040 
[2024-06-11 01:38:07,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.66 | bwd_microstep: 1355.07 | bwd_inner_microstep: 1355.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 01:38:09,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.42 | bwd_microstep: 1241.39 | bwd_inner_microstep: 1241.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 01:38:11,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1374.25 | bwd_inner_microstep: 1374.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798 [2024-06-11 01:38:13,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.49 | bwd_microstep: 1452.18 | bwd_inner_microstep: 1452.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3782 [2024-06-11 01:38:15,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.90 | bwd_microstep: 1477.99 | bwd_inner_microstep: 1477.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3763 [2024-06-11 01:38:17,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.70 | bwd_microstep: 1606.94 | bwd_inner_microstep: 1606.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-11 01:38:19,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.53 | bwd_microstep: 1149.63 | bwd_inner_microstep: 1149.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3490 [2024-06-11 01:38:20,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.67 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2402 [2024-06-11 01:38:22,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.07 | bwd_microstep: 839.86 | bwd_inner_microstep: 839.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 01:38:23,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.14 | bwd_microstep: 1282.42 | bwd_inner_microstep: 1282.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-11 01:38:25,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1281.13 | bwd_inner_microstep: 1281.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3592 [2024-06-11 01:38:27,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.39 | bwd_microstep: 1212.86 | bwd_inner_microstep: 1212.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3696 [2024-06-11 01:38:29,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.83 | bwd_microstep: 1358.73 | bwd_inner_microstep: 1358.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 01:38:31,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1383.55 | bwd_inner_microstep: 1383.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3501 [2024-06-11 01:38:33,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.80 | bwd_microstep: 1680.25 | bwd_inner_microstep: 1680.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-11 01:38:35,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.09 | bwd_microstep: 1486.16 | bwd_inner_microstep: 1486.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-11 01:38:37,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.54 | bwd_microstep: 1472.53 | bwd_inner_microstep: 1472.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-11 01:38:39,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.06 | bwd_microstep: 1352.25 | bwd_inner_microstep: 1352.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1955 [2024-06-11 01:38:40,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.95 | bwd_microstep: 823.85 | bwd_inner_microstep: 823.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-11 01:38:42,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1556.23 | bwd_inner_microstep: 1556.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-11 01:38:44,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.19 | bwd_microstep: 1392.55 | bwd_inner_microstep: 1392.52 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-11 01:38:46,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.98 | bwd_microstep: 1517.83 | bwd_inner_microstep: 1517.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 01:38:48,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1258.73 | bwd_inner_microstep: 1258.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 01:38:50,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1397.04 | bwd_inner_microstep: 1397.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3490 [2024-06-11 01:38:52,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.28 | bwd_microstep: 1316.63 | bwd_inner_microstep: 1316.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1947 [2024-06-11 01:38:53,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.39 | bwd_microstep: 729.26 | bwd_inner_microstep: 729.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-11 01:38:55,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.00 | bwd_microstep: 1410.51 | bwd_inner_microstep: 1410.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-11 01:38:57,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.32 | bwd_microstep: 
1543.88 | bwd_inner_microstep: 1543.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-11 01:38:59,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1550.70 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-11 01:39:01,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.56 | bwd_microstep: 1482.25 | bwd_inner_microstep: 1482.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3397 [2024-06-11 01:39:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.01 | optimizer_step: 6.60 [2024-06-11 01:39:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.81 | bwd_microstep: 2330.01 | bwd_inner_microstep: 1591.52 | bwd_allreduce_microstep: 738.44 | step_microstep: 37.53 [2024-06-11 01:39:04,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16142.13 | bwd: 44002.92 | bwd_inner: 43263.58 | bwd_allreduce: 738.67 | step: 38.99 {'loss': 1.1462, 'learning_rate': 2.755047163859763e-06, 'epoch': 0.84} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-11 01:39:06,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921 [2024-06-11 01:39:07,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.29 | bwd_microstep: 786.14 | bwd_inner_microstep: 786.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2369 [2024-06-11 01:39:08,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.41 | bwd_microstep: 995.76 | bwd_inner_microstep: 995.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-11 01:39:10,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.60 | bwd_microstep: 1531.16 | bwd_inner_microstep: 1531.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-11 01:39:11,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.54 | bwd_microstep: 789.99 | bwd_inner_microstep: 789.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-11 01:39:13,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.00 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 01:39:15,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1283.20 | bwd_inner_microstep: 1283.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2689 [2024-06-11 01:39:16,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 400.52 | bwd_microstep: 1061.99 | bwd_inner_microstep: 1061.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3486 [2024-06-11 01:39:19,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.88 | bwd_microstep: 1509.45 | bwd_inner_microstep: 1509.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 20, images per sample: 5.0, dynamic token length: 3399 [2024-06-11 01:39:20,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.12 | bwd_microstep: 1207.44 | bwd_inner_microstep: 1207.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-11 01:39:22,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.49 | bwd_microstep: 1624.25 | bwd_inner_microstep: 1624.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-11 01:39:25,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.09 | bwd_microstep: 1484.39 | bwd_inner_microstep: 1484.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 01:39:27,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1559.19 | bwd_inner_microstep: 1559.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3503 [2024-06-11 01:39:29,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1445.72 | bwd_inner_microstep: 1445.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 01:39:30,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.65 | bwd_microstep: 1284.98 | bwd_inner_microstep: 1284.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-11 01:39:32,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.60 | bwd_microstep: 1393.09 | bwd_inner_microstep: 1393.06 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-11 01:39:34,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1510.82 | bwd_inner_microstep: 1510.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494 [2024-06-11 01:39:36,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.57 | bwd_microstep: 1413.03 | bwd_inner_microstep: 1413.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2091 [2024-06-11 01:39:38,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.54 | bwd_microstep: 821.95 | bwd_inner_microstep: 821.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 01:39:40,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1556.64 | bwd_inner_microstep: 1556.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069 [2024-06-11 01:39:41,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.11 | bwd_microstep: 819.55 | bwd_inner_microstep: 819.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2081 [2024-06-11 01:39:42,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.12 | bwd_microstep: 881.48 | bwd_inner_microstep: 881.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-11 01:39:44,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.94 | bwd_microstep: 1391.39 | bwd_inner_microstep: 1391.37 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2115 [2024-06-11 01:39:45,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.10 | bwd_microstep: 925.46 | bwd_inner_microstep: 925.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3428 [2024-06-11 01:39:47,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.69 | bwd_microstep: 1284.88 | bwd_inner_microstep: 1284.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3817 [2024-06-11 01:39:49,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.30 | bwd_microstep: 1511.42 | bwd_inner_microstep: 1511.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3564 [2024-06-11 01:39:51,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.59 | bwd_microstep: 1524.68 | bwd_inner_microstep: 1524.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-11 01:39:53,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1495.20 | bwd_inner_microstep: 1495.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3547 [2024-06-11 01:39:55,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.19 | bwd_microstep: 1523.41 | bwd_inner_microstep: 1523.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-11 01:39:57,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.50 | bwd_microstep: 
1396.43 | bwd_inner_microstep: 1396.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3456 [2024-06-11 01:39:59,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1400.00 | bwd_inner_microstep: 1399.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3579 [2024-06-11 01:40:05,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.20 | optimizer_step: 6.61 [2024-06-11 01:40:05,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.31 | bwd_microstep: 4974.82 | bwd_inner_microstep: 1605.11 | bwd_allreduce_microstep: 3369.65 | step_microstep: 38.64 [2024-06-11 01:40:05,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15556.16 | bwd: 45123.57 | bwd_inner: 41753.00 | bwd_allreduce: 3369.89 | step: 40.17 {'loss': 1.1567, 'learning_rate': 2.7360670821696422e-06, 'epoch': 0.84} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-11 01:40:06,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.79 | bwd_microstep: 1141.86 | bwd_inner_microstep: 1141.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3855 [2024-06-11 01:40:09,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.76 | bwd_microstep: 1655.48 | bwd_inner_microstep: 1655.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4226 [2024-06-11 01:40:11,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.45 | bwd_microstep: 1564.80 | bwd_inner_microstep: 1564.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3404 [2024-06-11 01:40:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 01:40:15,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1374.72 | bwd_inner_microstep: 1374.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-11 01:40:17,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.40 | bwd_microstep: 1408.84 | bwd_inner_microstep: 1408.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-11 01:40:18,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1252.15 | bwd_inner_microstep: 1252.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-11 01:40:19,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 699.06 | bwd_inner_microstep: 699.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 01:40:21,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.25 | bwd_microstep: 1255.66 | bwd_inner_microstep: 1255.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-11 01:40:23,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1286.97 | bwd_inner_microstep: 1286.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-11 01:40:25,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.73 | bwd_microstep: 1278.46 | bwd_inner_microstep: 1278.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3650 [2024-06-11 01:40:27,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.71 | bwd_microstep: 1666.25 | bwd_inner_microstep: 1666.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665 [2024-06-11 01:40:29,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.20 | bwd_microstep: 1617.82 | bwd_inner_microstep: 1617.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497 [2024-06-11 01:40:31,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.19 | bwd_microstep: 1550.45 | bwd_inner_microstep: 1550.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-11 01:40:32,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 788.21 | bwd_inner_microstep: 788.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1967 [2024-06-11 01:40:33,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.37 | bwd_microstep: 824.45 | bwd_inner_microstep: 824.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834 [2024-06-11 01:40:36,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1558.26 | bwd_inner_microstep: 1558.23 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2437 [2024-06-11 01:40:37,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.23 | bwd_microstep: 995.11 | bwd_inner_microstep: 995.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3531 [2024-06-11 01:40:39,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.51 | bwd_microstep: 1453.70 | bwd_inner_microstep: 1453.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 01:40:41,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.30 | bwd_microstep: 1259.95 | bwd_inner_microstep: 1259.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-11 01:40:43,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.28 | bwd_microstep: 1554.55 | bwd_inner_microstep: 1554.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 01:40:45,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.85 | bwd_microstep: 1392.22 | bwd_inner_microstep: 1392.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-11 01:40:47,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.77 | bwd_microstep: 1375.05 | bwd_inner_microstep: 1375.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 01:40:49,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 
1558.70 | bwd_inner_microstep: 1558.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3526 [2024-06-11 01:40:51,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.41 | bwd_microstep: 1582.25 | bwd_inner_microstep: 1582.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2180 [2024-06-11 01:40:52,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.52 | bwd_microstep: 952.03 | bwd_inner_microstep: 952.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3571 [2024-06-11 01:40:54,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.24 | bwd_microstep: 1457.82 | bwd_inner_microstep: 1457.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3595 [2024-06-11 01:40:56,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.11 | bwd_microstep: 1373.14 | bwd_inner_microstep: 1373.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2722 [2024-06-11 01:40:58,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.39 | bwd_microstep: 1233.14 | bwd_inner_microstep: 1233.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3383 [2024-06-11 01:41:00,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.31 | bwd_microstep: 1242.34 | bwd_inner_microstep: 1242.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-11 01:41:02,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 556.80 | bwd_microstep: 1500.85 | bwd_inner_microstep: 1500.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 01:41:07,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.09 | optimizer_step: 6.59 [2024-06-11 01:41:07,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 5120.19 | bwd_inner_microstep: 1411.62 | bwd_allreduce_microstep: 3708.52 | step_microstep: 39.09 [2024-06-11 01:41:07,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15884.14 | bwd: 46315.53 | bwd_inner: 42606.11 | bwd_allreduce: 3708.75 | step: 40.54 83%|████████▎ | 1440/1726 [24:58:38<5:05:10, 64.02s/it] 83%|████████▎ | 1441/1726 [24:59:38<4:58:32, 62.85s/it] 84%|████████▎ | 1442/1726 [25:00:40<4:55:52, 62.51s/it] 84%|████████▎ | 1443/1726 [25:01:41<4:51:56, 61.90s/it] 84%|████████▎ | 1444/1726 [25:02:42<4:49:39, 61.63s/it] 84%|████████▎ | 1445/1726 [25:03:44<4:49:53, 61.90s/{'loss': 1.1724, 'learning_rate': 2.717147803961511e-06, 'epoch': 0.84} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-11 01:41:09,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.67 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1383.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 01:41:11,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.29 | bwd_microstep: 1339.83 | bwd_inner_microstep: 1339.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images
per sample: 10.0, dynamic token length: 3775 [2024-06-11 01:41:13,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.40 | bwd_microstep: 1637.24 | bwd_inner_microstep: 1637.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759 [2024-06-11 01:41:15,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1536.74 | bwd_inner_microstep: 1536.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-11 01:41:17,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1394.09 | bwd_inner_microstep: 1394.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3714 [2024-06-11 01:41:19,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1330.12 | bwd_inner_microstep: 1330.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3710 [2024-06-11 01:41:21,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.84 | bwd_microstep: 1456.88 | bwd_inner_microstep: 1456.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3475 [2024-06-11 01:41:23,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.14 | bwd_microstep: 1411.81 | bwd_inner_microstep: 1411.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1895 [2024-06-11 01:41:24,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.84 | bwd_microstep: 872.76 | bwd_inner_microstep: 872.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 01:41:26,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.64 | bwd_microstep: 1484.39 | bwd_inner_microstep: 1484.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3721 [2024-06-11 01:41:29,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.69 | bwd_microstep: 1728.65 | bwd_inner_microstep: 1728.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-11 01:41:31,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.97 | bwd_microstep: 1340.97 | bwd_inner_microstep: 1340.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3404 [2024-06-11 01:41:33,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.81 | bwd_microstep: 1536.34 | bwd_inner_microstep: 1536.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3908 [2024-06-11 01:41:35,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3444 [2024-06-11 01:41:37,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.95 | bwd_microstep: 1218.68 | bwd_inner_microstep: 1218.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644 [2024-06-11 01:41:39,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.04 | bwd_microstep: 1418.72 | bwd_inner_microstep: 1418.69 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-11 01:41:41,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.98 | bwd_microstep: 1656.98 | bwd_inner_microstep: 1656.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 955 [2024-06-11 01:41:41,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.37 | bwd_microstep: 379.56 | bwd_inner_microstep: 379.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3698 [2024-06-11 01:41:43,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.89 | bwd_microstep: 1234.29 | bwd_inner_microstep: 1234.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 [2024-06-11 01:41:45,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1394.90 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-11 01:41:47,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.08 | bwd_microstep: 1396.67 | bwd_inner_microstep: 1396.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-11 01:41:49,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1511.12 | bwd_inner_microstep: 1511.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3536 [2024-06-11 01:41:51,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.77 | bwd_microstep: 
1232.67 | bwd_inner_microstep: 1232.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-11 01:41:53,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.32 | bwd_microstep: 1295.27 | bwd_inner_microstep: 1295.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812 [2024-06-11 01:41:55,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.01 | bwd_microstep: 1581.85 | bwd_inner_microstep: 1581.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 01:41:57,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.93 | bwd_microstep: 1461.27 | bwd_inner_microstep: 1461.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1995 [2024-06-11 01:41:58,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.63 | bwd_microstep: 833.62 | bwd_inner_microstep: 833.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-11 01:42:00,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1514.67 | bwd_inner_microstep: 1514.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-11 01:42:02,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.89 | bwd_microstep: 1530.97 | bwd_inner_microstep: 1530.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3589 [2024-06-11 01:42:05,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 652.28 | bwd_microstep: 1805.04 | bwd_inner_microstep: 1805.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3746 [2024-06-11 01:42:07,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.83 | bwd_microstep: 1738.66 | bwd_inner_microstep: 1738.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3439 [2024-06-11 01:42:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.06 | optimizer_step: 6.60 [2024-06-11 01:42:09,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.34 | bwd_microstep: 1585.08 | bwd_inner_microstep: 1577.25 | bwd_allreduce_microstep: 7.78 | step_microstep: 37.68 [2024-06-11 01:42:09,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16681.29 | bwd: 44786.44 | bwd_inner: 44777.74 | bwd_allreduce: 8.00 | step: 39.16 {'loss': 1.2062, 'learning_rate': 2.698289395868965e-06, 'epoch': 0.84} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-11 01:42:11,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1472.42 | bwd_inner_microstep: 1472.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 01:42:13,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1384.31 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-11 01:42:15,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.07 | bwd_microstep: 1399.93 | bwd_inner_microstep: 1399.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3867 [2024-06-11 01:42:17,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.95 | bwd_microstep: 1564.40 | bwd_inner_microstep: 1564.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796 [2024-06-11 01:42:19,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.41 | bwd_microstep: 1650.94 | bwd_inner_microstep: 1650.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 01:42:21,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.05 | bwd_microstep: 1387.43 | bwd_inner_microstep: 1387.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-11 01:42:23,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1415.04 | bwd_inner_microstep: 1415.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4053 [2024-06-11 01:42:25,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.18 | bwd_microstep: 1526.32 | bwd_inner_microstep: 1526.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-11 01:42:27,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.29 | bwd_microstep: 1153.60 | bwd_inner_microstep: 1153.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 01:42:29,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.20 | bwd_microstep: 1383.74 | bwd_inner_microstep: 1383.71 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-11 01:42:31,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.44 | bwd_microstep: 1302.25 | bwd_inner_microstep: 1302.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 01:42:32,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1260.60 | bwd_inner_microstep: 1260.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3449 [2024-06-11 01:42:35,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.22 | bwd_microstep: 1479.91 | bwd_inner_microstep: 1479.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3730 [2024-06-11 01:42:37,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.47 | bwd_microstep: 1730.05 | bwd_inner_microstep: 1730.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 01:42:39,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.55 | bwd_microstep: 1381.15 | bwd_inner_microstep: 1381.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-11 01:42:41,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.84 | bwd_microstep: 1493.58 | bwd_inner_microstep: 1493.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674 [2024-06-11 01:42:43,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.82 | bwd_microstep: 1625.34 | 
bwd_inner_microstep: 1625.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-11 01:42:45,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.31 | bwd_microstep: 1289.36 | bwd_inner_microstep: 1289.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3653 [2024-06-11 01:42:47,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1325.71 | bwd_inner_microstep: 1325.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-11 01:42:49,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.80 | bwd_microstep: 1610.84 | bwd_inner_microstep: 1610.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3667 [2024-06-11 01:42:51,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.15 | bwd_microstep: 1325.56 | bwd_inner_microstep: 1325.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 01:42:53,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1409.82 | bwd_inner_microstep: 1409.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956 [2024-06-11 01:42:54,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.73 | bwd_microstep: 702.32 | bwd_inner_microstep: 702.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3732 [2024-06-11 01:42:56,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 506.32 | bwd_microstep: 1339.19 | bwd_inner_microstep: 1339.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2102 [2024-06-11 01:42:57,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.80 | bwd_microstep: 854.20 | bwd_inner_microstep: 854.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063 [2024-06-11 01:42:58,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.94 | bwd_microstep: 910.83 | bwd_inner_microstep: 910.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 01:43:00,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.15 | bwd_microstep: 1645.99 | bwd_inner_microstep: 1645.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3572 [2024-06-11 01:43:03,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.45 | bwd_microstep: 1664.08 | bwd_inner_microstep: 1664.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3450 [2024-06-11 01:43:04,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1401.14 | bwd_inner_microstep: 1401.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3440 [2024-06-11 01:43:06,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.72 | bwd_microstep: 1452.76 | bwd_inner_microstep: 1452.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-11 01:43:08,818] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.74 | bwd_microstep: 1344.32 | bwd_inner_microstep: 1344.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-11 01:43:14,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.14 | optimizer_gradients: 4.08 | optimizer_step: 6.59 [2024-06-11 01:43:14,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 4634.54 | bwd_inner_microstep: 1686.59 | bwd_allreduce_microstep: 2947.90 | step_microstep: 38.41 [2024-06-11 01:43:14,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16579.19 | bwd: 47521.71 | bwd_inner: 44572.90 | bwd_allreduce: 2948.13 | step: 39.89 {'loss': 1.2167, 'learning_rate': 2.679491924311226e-06, 'epoch': 0.84} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3414 [2024-06-11 01:43:15,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.27 | bwd_microstep: 1364.99 | bwd_inner_microstep: 1364.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 01:43:17,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.87 | bwd_microstep: 1243.01 | bwd_inner_microstep: 1242.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-11 01:43:19,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.29 | bwd_microstep: 1480.70 | bwd_inner_microstep: 1480.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3854 [2024-06-11 01:43:22,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.26 | bwd_microstep: 1657.19 | bwd_inner_microstep: 1657.16 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759 [2024-06-11 01:43:23,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1435.60 | bwd_inner_microstep: 1435.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3406 [2024-06-11 01:43:25,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.92 | bwd_microstep: 1181.91 | bwd_inner_microstep: 1181.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743 [2024-06-11 01:43:27,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.50 | bwd_microstep: 1635.23 | bwd_inner_microstep: 1635.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3491 [2024-06-11 01:43:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.51 | bwd_microstep: 1333.01 | bwd_inner_microstep: 1332.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-11 01:43:31,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.25 | bwd_microstep: 1394.87 | bwd_inner_microstep: 1394.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-11 01:43:33,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.78 | bwd_microstep: 1409.52 | bwd_inner_microstep: 1409.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3683 [2024-06-11 01:43:35,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.03 | bwd_microstep: 
1625.36 | bwd_inner_microstep: 1625.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869 [2024-06-11 01:43:36,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.07 | bwd_microstep: 707.98 | bwd_inner_microstep: 707.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3408 [2024-06-11 01:43:38,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.17 | bwd_microstep: 1291.57 | bwd_inner_microstep: 1291.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-11 01:43:40,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1490.01 | bwd_inner_microstep: 1489.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-11 01:43:42,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.97 | bwd_microstep: 1255.87 | bwd_inner_microstep: 1255.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 01:43:44,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.95 | bwd_microstep: 1349.29 | bwd_inner_microstep: 1349.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3612 [2024-06-11 01:43:46,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.39 | bwd_microstep: 1534.47 | bwd_inner_microstep: 1534.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921 [2024-06-11 01:43:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 269.55 | bwd_microstep: 695.83 | bwd_inner_microstep: 695.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530 [2024-06-11 01:43:49,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1324.48 | bwd_inner_microstep: 1324.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-11 01:43:50,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1293.51 | bwd_inner_microstep: 1293.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815 [2024-06-11 01:43:53,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.37 | bwd_microstep: 1580.32 | bwd_inner_microstep: 1580.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-11 01:43:55,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.52 | bwd_microstep: 1499.36 | bwd_inner_microstep: 1499.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3436 [2024-06-11 01:43:56,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.20 | bwd_microstep: 1286.68 | bwd_inner_microstep: 1286.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-11 01:43:59,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.35 | bwd_microstep: 1547.70 | bwd_inner_microstep: 1547.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3905 [2024-06-11 01:44:01,452] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.35 | bwd_microstep: 1696.79 | bwd_inner_microstep: 1696.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806 [2024-06-11 01:44:03,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.95 | bwd_microstep: 1457.36 | bwd_inner_microstep: 1457.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444 [2024-06-11 01:44:05,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.98 | bwd_microstep: 1157.32 | bwd_inner_microstep: 1157.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652 [2024-06-11 01:44:07,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.71 | bwd_microstep: 1621.95 | bwd_inner_microstep: 1621.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 01:44:09,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.93 | bwd_microstep: 1556.23 | bwd_inner_microstep: 1556.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-11 01:44:11,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1497.92 | bwd_inner_microstep: 1497.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3527 [2024-06-11 01:44:13,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1494.21 | bwd_inner_microstep: 1494.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3571 [2024-06-11 01:44:15,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.39 | optimizer_gradients: 4.06 | optimizer_step: 6.65 [2024-06-11 01:44:15,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.38 | bwd_microstep: 1541.40 | bwd_inner_microstep: 1533.73 | bwd_allreduce_microstep: 7.63 | step_microstep: 39.03 [2024-06-11 01:44:15,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16684.41 | bwd: 44641.69 | bwd_inner: 44633.17 | bwd_allreduce: 7.85 | step: 40.54 {'loss': 1.1905, 'learning_rate': 2.6607554554928917e-06, 'epoch': 0.84} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-11 01:44:17,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1478.95 | bwd_inner_microstep: 1478.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3908 [2024-06-11 01:44:20,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.46 | bwd_microstep: 1691.23 | bwd_inner_microstep: 1691.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3872 [2024-06-11 01:44:22,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.57 | bwd_microstep: 1398.97 | bwd_inner_microstep: 1398.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-11 01:44:24,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.28 | bwd_microstep: 1560.70 | bwd_inner_microstep: 1560.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 01:44:26,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.90 | bwd_microstep: 1388.18 | 
bwd_inner_microstep: 1388.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-11 01:44:28,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1549.31 | bwd_inner_microstep: 1549.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-11 01:44:30,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.18 | bwd_microstep: 1385.94 | bwd_inner_microstep: 1385.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-11 01:44:32,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.66 | bwd_microstep: 1388.53 | bwd_inner_microstep: 1388.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 01:44:33,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1251.33 | bwd_inner_microstep: 1251.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3509 [2024-06-11 01:44:35,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.98 | bwd_microstep: 1250.51 | bwd_inner_microstep: 1250.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3969 [2024-06-11 01:44:38,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.52 | bwd_microstep: 1809.62 | bwd_inner_microstep: 1809.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-11 01:44:39,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 526.75 | bwd_microstep: 1408.43 | bwd_inner_microstep: 1408.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 949 [2024-06-11 01:44:40,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 157.52 | bwd_microstep: 412.33 | bwd_inner_microstep: 412.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3513 [2024-06-11 01:44:42,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.19 | bwd_microstep: 1420.20 | bwd_inner_microstep: 1420.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-11 01:44:44,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.40 | bwd_microstep: 1522.78 | bwd_inner_microstep: 1522.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-11 01:44:45,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 790.63 | bwd_inner_microstep: 790.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3648 [2024-06-11 01:44:48,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.17 | bwd_microstep: 1815.13 | bwd_inner_microstep: 1815.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-11 01:44:50,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1489.28 | bwd_inner_microstep: 1489.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-11 01:44:52,382] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.70 | bwd_microstep: 1557.04 | bwd_inner_microstep: 1557.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-11 01:44:54,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.07 | bwd_microstep: 1495.67 | bwd_inner_microstep: 1495.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099 [2024-06-11 01:44:55,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.51 | bwd_microstep: 921.42 | bwd_inner_microstep: 921.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-11 01:44:57,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.86 | bwd_microstep: 1504.25 | bwd_inner_microstep: 1504.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999 [2024-06-11 01:44:58,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.00 | bwd_microstep: 709.61 | bwd_inner_microstep: 709.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2030 [2024-06-11 01:44:59,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.07 | bwd_microstep: 808.72 | bwd_inner_microstep: 808.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-11 01:45:01,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.90 | bwd_microstep: 1452.95 | bwd_inner_microstep: 1452.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3521 
[2024-06-11 01:45:03,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.41 | bwd_microstep: 1396.08 | bwd_inner_microstep: 1396.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 01:45:05,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.56 | bwd_microstep: 1382.32 | bwd_inner_microstep: 1382.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3477 [2024-06-11 01:45:07,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.09 | bwd_microstep: 1477.34 | bwd_inner_microstep: 1477.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 01:45:09,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.83 | bwd_microstep: 1549.37 | bwd_inner_microstep: 1549.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 01:45:11,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.00 | bwd_microstep: 1403.47 | bwd_inner_microstep: 1403.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3563 [2024-06-11 01:45:14,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.15 | bwd_microstep: 1629.65 | bwd_inner_microstep: 1629.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-11 01:45:16,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.03 | optimizer_step: 6.58 [2024-06-11 01:45:16,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
573.12 | bwd_microstep: 1847.78 | bwd_inner_microstep: 1605.13 | bwd_allreduce_microstep: 242.60 | step_microstep: 37.54 [2024-06-11 01:45:16,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16367.28 | bwd: 44147.74 | bwd_inner: 43904.11 | bwd_allreduce: 242.89 | step: 39.07 {'loss': 1.193, 'learning_rate': 2.642080055403704e-06, 'epoch': 0.84} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469 [2024-06-11 01:45:18,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.74 | bwd_microstep: 1577.62 | bwd_inner_microstep: 1577.53 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-11 01:45:20,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1398.68 | bwd_inner_microstep: 1398.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4251 [2024-06-11 01:45:23,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.75 | bwd_microstep: 1766.90 | bwd_inner_microstep: 1766.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1866 [2024-06-11 01:45:24,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.32 | bwd_microstep: 677.25 | bwd_inner_microstep: 677.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 01:45:25,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1281.52 | bwd_inner_microstep: 1281.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-11 01:45:27,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.04 | 
bwd_microstep: 1152.41 | bwd_inner_microstep: 1152.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3399 [2024-06-11 01:45:29,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1150.11 | bwd_inner_microstep: 1150.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 01:45:30,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.77 | bwd_microstep: 1285.70 | bwd_inner_microstep: 1285.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 01:45:32,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1247.42 | bwd_inner_microstep: 1247.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954 [2024-06-11 01:45:33,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.54 | bwd_microstep: 701.65 | bwd_inner_microstep: 701.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471 [2024-06-11 01:45:35,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.71 | bwd_microstep: 1412.54 | bwd_inner_microstep: 1412.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-11 01:45:37,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.96 | bwd_microstep: 1530.37 | bwd_inner_microstep: 1530.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659 [2024-06-11 01:45:39,961] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 626.98 | bwd_microstep: 1714.43 | bwd_inner_microstep: 1714.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3549 [2024-06-11 01:45:41,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.26 | bwd_microstep: 1230.89 | bwd_inner_microstep: 1230.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565 [2024-06-11 01:45:43,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.86 | bwd_microstep: 1299.80 | bwd_inner_microstep: 1299.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3839 [2024-06-11 01:45:45,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 1360.56 | bwd_inner_microstep: 1360.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3529 [2024-06-11 01:45:47,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.96 | bwd_microstep: 1259.15 | bwd_inner_microstep: 1259.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628 [2024-06-11 01:45:49,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.43 | bwd_microstep: 1440.81 | bwd_inner_microstep: 1440.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3646 [2024-06-11 01:45:51,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1511.11 | bwd_inner_microstep: 1511.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3486 [2024-06-11 01:45:52,920] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.40 | bwd_microstep: 1248.65 | bwd_inner_microstep: 1248.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-11 01:45:54,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.49 | bwd_microstep: 975.51 | bwd_inner_microstep: 975.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-11 01:45:56,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.97 | bwd_microstep: 1520.83 | bwd_inner_microstep: 1520.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3431 [2024-06-11 01:45:57,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.72 | bwd_microstep: 1155.22 | bwd_inner_microstep: 1155.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533 [2024-06-11 01:45:59,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.05 | bwd_microstep: 1230.42 | bwd_inner_microstep: 1230.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-11 01:46:01,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.33 | bwd_microstep: 1183.35 | bwd_inner_microstep: 1183.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3600 [2024-06-11 01:46:03,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.62 | bwd_microstep: 1539.67 | bwd_inner_microstep: 1539.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token 
length: 3651 [2024-06-11 01:46:05,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.79 | bwd_microstep: 1450.03 | bwd_inner_microstep: 1450.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-11 01:46:07,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.92 | bwd_microstep: 1402.49 | bwd_inner_microstep: 1402.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827 [2024-06-11 01:46:09,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1360.31 | bwd_inner_microstep: 1360.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3720 [2024-06-11 01:46:11,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1562.91 | bwd_inner_microstep: 1562.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558 [2024-06-11 01:46:13,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.03 | bwd_microstep: 1298.03 | bwd_inner_microstep: 1298.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3588 [2024-06-11 01:46:15,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.05 | optimizer_step: 6.64 [2024-06-11 01:46:15,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.04 | bwd_microstep: 1740.65 | bwd_inner_microstep: 1732.88 | bwd_allreduce_microstep: 7.72 | step_microstep: 37.70 [2024-06-11 01:46:15,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16044.55 | bwd: 42667.02 | bwd_inner: 42658.32 | bwd_allreduce: 8.00 | step: 
39.32 it] 84%|████████▎ | 1445/1726 [25:03:44<4:49:53, 61.90s/it] 84%|████████▍ | 1446/1726 [25:04:46<4:48:43, 61.87s/it] 84%|████████▍ | 1446/1726 [25:04:46<4:48:43, 61.87s/it] 84%|████████▍ | 1447/1726 [25:05:50<4:51:17, 62.64s/it] 84%|████████▍ | 1447/1726 [25:05:50<4:51:17, 62.64s/it] 84%|████████▍ | 1448/1726 [25:06:52<4:48:53, 62.35s/it] 84%|████████▍ | 1448/1726 [25:06:52<4:48:53, 62.35s/it] 84%|████████▍ | 1449/1726 [25:07:53<4:45:46, 61.90s/it] 84%|████████▍ | 1449/1726 [25:07:53<4:45:46, 61.90s/it] 84%|████████▍ | 1450/1726 [25:08:52<{'loss': 1.1686, 'learning_rate': 2.623465789818327e-06, 'epoch': 0.84} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-11 01:46:17,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.77 | bwd_microstep: 1242.79 | bwd_inner_microstep: 1242.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4434 [2024-06-11 01:46:19,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 675.11 | bwd_microstep: 1823.35 | bwd_inner_microstep: 1823.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 01:46:21,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.12 | bwd_microstep: 1480.64 | bwd_inner_microstep: 1480.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3798 [2024-06-11 01:46:23,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.60 | bwd_microstep: 1446.51 | bwd_inner_microstep: 1446.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 01:46:25,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.30 | bwd_microstep: 1282.96 | 
bwd_inner_microstep: 1282.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715 [2024-06-11 01:46:27,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.03 | bwd_microstep: 1529.88 | bwd_inner_microstep: 1529.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 845 [2024-06-11 01:46:28,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.31 | bwd_microstep: 347.56 | bwd_inner_microstep: 347.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411 [2024-06-11 01:46:30,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.13 | bwd_microstep: 1283.46 | bwd_inner_microstep: 1283.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3697 [2024-06-11 01:46:32,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.13 | bwd_microstep: 1525.55 | bwd_inner_microstep: 1525.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3485 [2024-06-11 01:46:33,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.06 | bwd_microstep: 1220.32 | bwd_inner_microstep: 1220.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1936 [2024-06-11 01:46:35,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.84 | bwd_microstep: 849.28 | bwd_inner_microstep: 849.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2135 [2024-06-11 01:46:36,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
365.63 | bwd_microstep: 987.54 | bwd_inner_microstep: 987.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3663 [2024-06-11 01:46:38,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.48 | bwd_microstep: 1550.41 | bwd_inner_microstep: 1550.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633 [2024-06-11 01:46:40,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.13 | bwd_microstep: 1706.51 | bwd_inner_microstep: 1706.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2011 [2024-06-11 01:46:42,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.57 | bwd_microstep: 897.75 | bwd_inner_microstep: 897.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643 [2024-06-11 01:46:44,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.80 | bwd_microstep: 1709.34 | bwd_inner_microstep: 1709.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-11 01:46:46,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.62 | bwd_microstep: 1481.08 | bwd_inner_microstep: 1481.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 01:46:48,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.14 | bwd_microstep: 1288.91 | bwd_inner_microstep: 1288.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 01:46:50,425] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1552.27 | bwd_inner_microstep: 1552.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-11 01:46:52,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1512.41 | bwd_inner_microstep: 1512.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3515 [2024-06-11 01:46:54,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1351.42 | bwd_inner_microstep: 1351.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 01:46:56,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.75 | bwd_microstep: 1457.37 | bwd_inner_microstep: 1457.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810 [2024-06-11 01:46:58,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.47 | bwd_microstep: 1761.65 | bwd_inner_microstep: 1761.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2181 [2024-06-11 01:47:00,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.90 | bwd_microstep: 889.00 | bwd_inner_microstep: 888.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-11 01:47:02,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.16 | bwd_microstep: 1497.13 | bwd_inner_microstep: 1497.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3263 [2024-06-11 
01:47:03,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.48 | bwd_microstep: 1364.77 | bwd_inner_microstep: 1364.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463 [2024-06-11 01:47:05,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.87 | bwd_microstep: 1183.91 | bwd_inner_microstep: 1183.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2275 [2024-06-11 01:47:06,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.72 | bwd_microstep: 877.35 | bwd_inner_microstep: 877.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-11 01:47:08,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.71 | bwd_microstep: 1402.57 | bwd_inner_microstep: 1402.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804 [2024-06-11 01:47:10,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.23 | bwd_microstep: 1456.16 | bwd_inner_microstep: 1456.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564 [2024-06-11 01:47:12,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.95 | bwd_microstep: 1298.61 | bwd_inner_microstep: 1298.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-11 01:47:16,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-11 01:47:16,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.21 | 
bwd_microstep: 3533.45 | bwd_inner_microstep: 2156.06 | bwd_allreduce_microstep: 1377.33 | step_microstep: 38.01 [2024-06-11 01:47:16,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16019.68 | bwd: 44791.95 | bwd_inner: 43413.71 | bwd_allreduce: 1377.57 | step: 39.46 {'loss': 1.2031, 'learning_rate': 2.6049127242961005e-06, 'epoch': 0.84} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 01:47:18,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.59 | bwd_microstep: 1469.72 | bwd_inner_microstep: 1469.56 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3988 [2024-06-11 01:47:20,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.04 | bwd_microstep: 1435.96 | bwd_inner_microstep: 1435.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 01:47:22,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1382.08 | bwd_inner_microstep: 1382.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 01:47:24,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.97 | bwd_microstep: 1274.13 | bwd_inner_microstep: 1274.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 01:47:26,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1375.63 | bwd_inner_microstep: 1375.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 01:47:28,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.82 | 
bwd_microstep: 1282.34 | bwd_inner_microstep: 1282.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-11 01:47:29,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1248.16 | bwd_inner_microstep: 1248.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3409 [2024-06-11 01:47:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.21 | bwd_microstep: 1179.31 | bwd_inner_microstep: 1179.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-11 01:47:32,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.80 | bwd_microstep: 792.49 | bwd_inner_microstep: 792.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 01:47:34,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1286.28 | bwd_inner_microstep: 1286.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758 [2024-06-11 01:47:36,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.01 | bwd_microstep: 1539.13 | bwd_inner_microstep: 1539.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504 [2024-06-11 01:47:38,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.20 | bwd_microstep: 1314.69 | bwd_inner_microstep: 1314.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-11 01:47:40,406] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 549.23 | bwd_microstep: 1489.79 | bwd_inner_microstep: 1489.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-11 01:47:42,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1348.48 | bwd_inner_microstep: 1348.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3424 [2024-06-11 01:47:44,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.26 | bwd_microstep: 1407.05 | bwd_inner_microstep: 1407.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2918 [2024-06-11 01:47:45,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.50 | bwd_microstep: 1189.81 | bwd_inner_microstep: 1189.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3663 [2024-06-11 01:47:48,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.21 | bwd_microstep: 1717.32 | bwd_inner_microstep: 1717.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3513 [2024-06-11 01:47:50,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1337.70 | bwd_inner_microstep: 1337.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940 [2024-06-11 01:47:52,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.74 | bwd_microstep: 1599.54 | bwd_inner_microstep: 1599.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3582 [2024-06-11 01:47:54,080] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1307.79 | bwd_inner_microstep: 1307.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2590 [2024-06-11 01:47:55,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.09 | bwd_microstep: 1070.35 | bwd_inner_microstep: 1070.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-11 01:47:57,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.78 | bwd_microstep: 1491.97 | bwd_inner_microstep: 1491.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3452 [2024-06-11 01:47:59,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.07 | bwd_microstep: 1158.48 | bwd_inner_microstep: 1158.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 01:48:01,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.80 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3709 [2024-06-11 01:48:02,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1236.21 | bwd_inner_microstep: 1236.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439 [2024-06-11 01:48:04,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.83 | bwd_microstep: 1282.32 | bwd_inner_microstep: 1282.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token 
length: 1905 [2024-06-11 01:48:05,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.07 | bwd_microstep: 684.81 | bwd_inner_microstep: 684.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3777 [2024-06-11 01:48:07,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.04 | bwd_microstep: 1259.65 | bwd_inner_microstep: 1259.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562 [2024-06-11 01:48:09,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.42 | bwd_microstep: 1597.19 | bwd_inner_microstep: 1597.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 01:48:11,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1375.13 | bwd_inner_microstep: 1375.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3732 [2024-06-11 01:48:13,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.50 | bwd_microstep: 1733.02 | bwd_inner_microstep: 1733.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650 [2024-06-11 01:48:19,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.09 | optimizer_step: 6.63 [2024-06-11 01:48:19,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.06 | bwd_microstep: 5284.43 | bwd_inner_microstep: 1712.54 | bwd_allreduce_microstep: 3571.84 | step_microstep: 38.02 [2024-06-11 01:48:19,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16034.21 | bwd: 46430.22 | bwd_inner: 42857.36 | bwd_allreduce: 3572.13 | 
step: 39.55 {'loss': 1.1829, 'learning_rate': 2.586420924180837e-06, 'epoch': 0.84} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 01:48:21,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.11 | bwd_microstep: 1369.61 | bwd_inner_microstep: 1369.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 01:48:23,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1373.71 | bwd_inner_microstep: 1373.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-11 01:48:25,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.75 | bwd_microstep: 1648.55 | bwd_inner_microstep: 1648.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3492 [2024-06-11 01:48:27,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1413.34 | bwd_inner_microstep: 1413.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 01:48:29,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.76 | bwd_microstep: 1278.68 | bwd_inner_microstep: 1278.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 01:48:31,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416 [2024-06-11 01:48:32,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 482.99 | bwd_microstep: 1278.93 | bwd_inner_microstep: 1278.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-11 01:48:33,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.65 | bwd_microstep: 791.10 | bwd_inner_microstep: 791.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 01:48:35,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.73 | bwd_microstep: 1247.51 | bwd_inner_microstep: 1247.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 01:48:37,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.61 | bwd_microstep: 1287.32 | bwd_inner_microstep: 1287.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406 [2024-06-11 01:48:39,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.04 | bwd_microstep: 1278.48 | bwd_inner_microstep: 1278.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2461 [2024-06-11 01:48:40,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.76 | bwd_microstep: 1046.66 | bwd_inner_microstep: 1046.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3501 [2024-06-11 01:48:42,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1552.11 | bwd_inner_microstep: 1552.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-11 01:48:44,873] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.70 | bwd_microstep: 1485.74 | bwd_inner_microstep: 1485.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 01:48:46,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.19 | bwd_microstep: 1246.45 | bwd_inner_microstep: 1246.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3632 [2024-06-11 01:48:48,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.77 | bwd_microstep: 1249.38 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3519 [2024-06-11 01:48:50,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.27 | bwd_microstep: 1443.82 | bwd_inner_microstep: 1443.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-11 01:48:52,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.45 | bwd_microstep: 1610.17 | bwd_inner_microstep: 1610.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658 [2024-06-11 01:48:54,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.54 | bwd_microstep: 1423.83 | bwd_inner_microstep: 1423.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-11 01:48:56,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1489.55 | bwd_inner_microstep: 1489.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3523 [2024-06-11 01:48:58,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.45 | bwd_microstep: 1325.49 | bwd_inner_microstep: 1325.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3702 [2024-06-11 01:49:00,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.54 | bwd_microstep: 1331.07 | bwd_inner_microstep: 1331.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-11 01:49:02,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1297.45 | bwd_inner_microstep: 1297.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 01:49:03,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.04 | bwd_microstep: 1287.25 | bwd_inner_microstep: 1287.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424 [2024-06-11 01:49:05,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.35 | bwd_microstep: 1449.63 | bwd_inner_microstep: 1449.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 01:49:07,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.36 | bwd_microstep: 1379.70 | bwd_inner_microstep: 1379.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-11 01:49:09,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1250.95 | bwd_inner_microstep: 1250.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3435 [2024-06-11 01:49:11,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 01:49:13,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1375.85 | bwd_inner_microstep: 1375.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-11 01:49:15,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.46 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-11 01:49:17,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.82 | bwd_microstep: 1544.34 | bwd_inner_microstep: 1544.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3577 [2024-06-11 01:49:19,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.00 | optimizer_step: 6.60 [2024-06-11 01:49:19,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.68 | bwd_microstep: 2215.98 | bwd_inner_microstep: 1443.60 | bwd_allreduce_microstep: 772.33 | step_microstep: 37.38 [2024-06-11 01:49:19,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16150.24 | bwd: 43925.16 | bwd_inner: 43151.93 | bwd_allreduce: 772.56 | step: 38.86 {'loss': 1.1475, 'learning_rate': 2.5679904546005507e-06, 'epoch': 0.84} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-11 01:49:21,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
518.65 | bwd_microstep: 1373.83 | bwd_inner_microstep: 1373.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857 [2024-06-11 01:49:22,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.36 | bwd_microstep: 674.99 | bwd_inner_microstep: 674.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3847 [2024-06-11 01:49:24,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.05 | bwd_microstep: 1362.75 | bwd_inner_microstep: 1362.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 01:49:26,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1378.98 | bwd_inner_microstep: 1378.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-11 01:49:28,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1542.98 | bwd_inner_microstep: 1542.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-11 01:49:30,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.72 | bwd_microstep: 1249.41 | bwd_inner_microstep: 1249.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495 [2024-06-11 01:49:32,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.40 | bwd_microstep: 1188.22 | bwd_inner_microstep: 1188.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-11 01:49:33,991] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 503.12 | bwd_microstep: 1342.49 | bwd_inner_microstep: 1342.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783 [2024-06-11 01:49:36,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.47 | bwd_microstep: 1650.49 | bwd_inner_microstep: 1650.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-11 01:49:37,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.52 | bwd_microstep: 680.46 | bwd_inner_microstep: 680.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3979 [2024-06-11 01:49:39,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.42 | bwd_microstep: 1745.38 | bwd_inner_microstep: 1745.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638 [2024-06-11 01:49:41,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.13 | bwd_microstep: 1605.15 | bwd_inner_microstep: 1605.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460 [2024-06-11 01:49:43,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.15 | bwd_microstep: 1308.54 | bwd_inner_microstep: 1308.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397 [2024-06-11 01:49:45,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1435.08 | bwd_inner_microstep: 1435.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3465 [2024-06-11 
01:49:47,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.20 | bwd_microstep: 1604.68 | bwd_inner_microstep: 1604.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495 [2024-06-11 01:49:49,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.77 | bwd_microstep: 1188.41 | bwd_inner_microstep: 1188.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3520 [2024-06-11 01:49:51,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.03 | bwd_microstep: 1318.08 | bwd_inner_microstep: 1318.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-11 01:49:53,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1496.57 | bwd_inner_microstep: 1496.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2291 [2024-06-11 01:49:54,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.17 | bwd_microstep: 975.29 | bwd_inner_microstep: 975.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596 [2024-06-11 01:49:56,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.64 | bwd_microstep: 1509.70 | bwd_inner_microstep: 1509.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 01:49:58,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.99 | bwd_microstep: 1380.33 | bwd_inner_microstep: 1380.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, 
dynamic token length: 3825 [2024-06-11 01:50:00,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.61 | bwd_microstep: 1510.55 | bwd_inner_microstep: 1510.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2383 [2024-06-11 01:50:02,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.92 | bwd_microstep: 933.00 | bwd_inner_microstep: 932.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3605 [2024-06-11 01:50:04,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.64 | bwd_microstep: 1454.18 | bwd_inner_microstep: 1454.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3381 [2024-06-11 01:50:05,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.11 | bwd_microstep: 1367.29 | bwd_inner_microstep: 1367.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711 [2024-06-11 01:50:08,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.32 | bwd_microstep: 1528.16 | bwd_inner_microstep: 1528.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-11 01:50:10,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.86 | bwd_microstep: 1449.09 | bwd_inner_microstep: 1449.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2055 [2024-06-11 01:50:11,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.56 | bwd_microstep: 815.68 | bwd_inner_microstep: 815.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3597 [2024-06-11 01:50:13,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.84 | bwd_microstep: 1402.78 | bwd_inner_microstep: 1402.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-11 01:50:15,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.29 | bwd_microstep: 1504.57 | bwd_inner_microstep: 1504.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3781 [2024-06-11 01:50:17,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.03 | bwd_microstep: 1352.13 | bwd_inner_microstep: 1352.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-11 01:50:21,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-11 01:50:21,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.88 | bwd_microstep: 3657.59 | bwd_inner_microstep: 1576.67 | bwd_allreduce_microstep: 2080.87 | step_microstep: 37.82 [2024-06-11 01:50:21,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16011.12 | bwd: 44986.85 | bwd_inner: 42905.07 | bwd_allreduce: 2081.10 | step: 39.24 {'loss': 1.1754, 'learning_rate': 2.5496213804672663e-06, 'epoch': 0.84} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3393 [2024-06-11 01:50:23,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.61 | bwd_microstep: 1302.29 | bwd_inner_microstep: 1302.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3064 [2024-06-11 01:50:24,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 441.33 | bwd_microstep: 1178.58 | bwd_inner_microstep: 1178.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2394 [2024-06-11 01:50:26,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.99 | bwd_microstep: 1000.56 | bwd_inner_microstep: 1000.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801 [2024-06-11 01:50:28,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.66 | bwd_microstep: 1445.71 | bwd_inner_microstep: 1445.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466 [2024-06-11 01:50:29,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.59 | bwd_microstep: 1212.15 | bwd_inner_microstep: 1212.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1900 [2024-06-11 01:50:30,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.68 | bwd_microstep: 774.11 | bwd_inner_microstep: 774.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406 [2024-06-11 01:50:32,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.44 | bwd_microstep: 1277.99 | bwd_inner_microstep: 1277.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-11 01:50:33,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.47 | bwd_microstep: 797.10 | bwd_inner_microstep: 797.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3720 [2024-06-11 01:50:35,875] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1533.27 | bwd_inner_microstep: 1533.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3975 [2024-06-11 01:50:38,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.62 | bwd_microstep: 1606.67 | bwd_inner_microstep: 1606.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3485 [2024-06-11 01:50:39,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.68 | bwd_microstep: 1265.28 | bwd_inner_microstep: 1265.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665 [2024-06-11 01:50:42,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.81 | bwd_microstep: 1611.96 | bwd_inner_microstep: 1611.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672 [2024-06-11 01:50:44,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.88 | bwd_microstep: 1718.83 | bwd_inner_microstep: 1718.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2948 [2024-06-11 01:50:45,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.51 | bwd_microstep: 1008.74 | bwd_inner_microstep: 1008.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 01:50:47,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1285.06 | bwd_inner_microstep: 1285.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3522 [2024-06-11 01:50:49,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.73 | bwd_microstep: 1290.17 | bwd_inner_microstep: 1290.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-11 01:50:51,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1412.66 | bwd_inner_microstep: 1412.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3449 [2024-06-11 01:50:52,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.91 | bwd_microstep: 1159.09 | bwd_inner_microstep: 1159.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-11 01:50:54,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.44 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-11 01:50:55,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.19 | bwd_microstep: 876.51 | bwd_inner_microstep: 876.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-11 01:50:57,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1387.54 | bwd_inner_microstep: 1387.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-11 01:50:59,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.99 | bwd_microstep: 1535.10 | bwd_inner_microstep: 1535.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, 
images per sample: 3.5, dynamic token length: 1993 [2024-06-11 01:51:00,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.76 | bwd_microstep: 773.10 | bwd_inner_microstep: 773.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820 [2024-06-11 01:51:02,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.96 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 01:51:04,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.86 | bwd_microstep: 1551.26 | bwd_inner_microstep: 1551.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3529 [2024-06-11 01:51:06,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.00 | bwd_microstep: 1437.64 | bwd_inner_microstep: 1437.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425 [2024-06-11 01:51:07,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.00 | bwd_microstep: 1034.26 | bwd_inner_microstep: 1034.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3578 [2024-06-11 01:51:10,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.11 | bwd_microstep: 1631.93 | bwd_inner_microstep: 1631.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-11 01:51:12,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1493.23 | bwd_inner_microstep: 1493.21 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-11 01:51:14,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.30 | bwd_microstep: 1451.00 | bwd_inner_microstep: 1450.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3464 [2024-06-11 01:51:16,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.08 | bwd_microstep: 1523.68 | bwd_inner_microstep: 1523.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385 [2024-06-11 01:51:23,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.18 | optimizer_step: 6.59 [2024-06-11 01:51:23,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.94 | bwd_microstep: 6894.84 | bwd_inner_microstep: 1513.78 | bwd_allreduce_microstep: 5381.00 | step_microstep: 38.27 [2024-06-11 01:51:23,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15411.13 | bwd: 46657.41 | bwd_inner: 41275.50 | bwd_allreduce: 5381.23 | step: 39.72 84%|████████▍ | 1450/1726 [25:08:52<4:40:48, 61.04s/it] 84%|████████▍ | 1451/1726 [25:09:53<4:39:55, 61.08s/it] 84%|████████▍ | 1452/1726 [25:10:56<4:41:16, 61.59s/it] 84%|████████▍ | 1453/1726 [25:11:56<4:38:37, 61.24s/it] 84%|████████▍ | 1454/1726 [25:12:58<4:37:43, 61.26s/it] {'loss': 1.1774, 'learning_rate': 2.531313766476757e-06, 'epoch': 0.84} dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1888 [2024-06-11 01:51:24,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 270.22 | bwd_microstep: 704.29 | bwd_inner_microstep: 704.19 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3592 [2024-06-11 01:51:26,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.66 | bwd_microstep: 1405.46 | bwd_inner_microstep: 1405.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415 [2024-06-11 01:51:28,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1304.55 | bwd_inner_microstep: 1304.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 01:51:30,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1442.20 | bwd_inner_microstep: 1442.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 01:51:32,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.08 | bwd_microstep: 1277.51 | bwd_inner_microstep: 1277.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3748 [2024-06-11 01:51:34,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.11 | bwd_microstep: 1365.38 | bwd_inner_microstep: 1365.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488 [2024-06-11 01:51:36,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.92 | bwd_microstep: 1477.34 | bwd_inner_microstep: 1477.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582 [2024-06-11 01:51:38,069] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.01 | bwd_microstep: 1404.45 | bwd_inner_microstep: 1404.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 01:51:39,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.39 | bwd_microstep: 1382.50 | bwd_inner_microstep: 1382.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-11 01:51:42,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.99 | bwd_microstep: 1534.56 | bwd_inner_microstep: 1534.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 01:51:44,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670 [2024-06-11 01:51:46,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.01 | bwd_microstep: 1632.08 | bwd_inner_microstep: 1632.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-11 01:51:48,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.90 | bwd_microstep: 1354.99 | bwd_inner_microstep: 1354.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-11 01:51:50,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1378.91 | bwd_inner_microstep: 1378.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3471 [2024-06-11 01:51:52,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.09 | bwd_microstep: 1374.10 | bwd_inner_microstep: 1374.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-11 01:51:53,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.21 | bwd_microstep: 1254.90 | bwd_inner_microstep: 1254.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2131 [2024-06-11 01:51:55,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.12 | bwd_microstep: 1021.63 | bwd_inner_microstep: 1021.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 01:51:57,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1377.31 | bwd_inner_microstep: 1377.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-11 01:51:58,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.19 | bwd_microstep: 1183.62 | bwd_inner_microstep: 1183.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-11 01:52:00,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.92 | bwd_microstep: 1292.84 | bwd_inner_microstep: 1292.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3753 [2024-06-11 01:52:02,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.72 | bwd_microstep: 1400.90 | bwd_inner_microstep: 1400.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 3456 [2024-06-11 01:52:04,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.68 | bwd_microstep: 1160.33 | bwd_inner_microstep: 1160.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3311 [2024-06-11 01:52:05,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.77 | bwd_microstep: 1228.71 | bwd_inner_microstep: 1228.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2031 [2024-06-11 01:52:06,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.65 | bwd_microstep: 809.21 | bwd_inner_microstep: 809.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-11 01:52:08,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1389.09 | bwd_inner_microstep: 1389.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 01:52:10,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1391.75 | bwd_inner_microstep: 1391.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4066 [2024-06-11 01:52:13,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.87 | bwd_microstep: 1618.63 | bwd_inner_microstep: 1618.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3473 [2024-06-11 01:52:14,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.84 | bwd_microstep: 1326.06 | bwd_inner_microstep: 1326.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3568 [2024-06-11 01:52:17,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.36 | bwd_microstep: 1591.70 | bwd_inner_microstep: 1591.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3063 [2024-06-11 01:52:18,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.97 | bwd_microstep: 1270.97 | bwd_inner_microstep: 1270.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3773 [2024-06-11 01:52:21,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.93 | bwd_microstep: 1737.36 | bwd_inner_microstep: 1737.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3584 [2024-06-11 01:52:25,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.95 | optimizer_gradients: 4.07 | optimizer_step: 6.62 [2024-06-11 01:52:25,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.65 | bwd_microstep: 3366.63 | bwd_inner_microstep: 1668.37 | bwd_allreduce_microstep: 1698.22 | step_microstep: 37.86 [2024-06-11 01:52:25,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16163.49 | bwd: 44941.64 | bwd_inner: 43242.43 | bwd_allreduce: 1698.50 | step: 39.31 {'loss': 1.1836, 'learning_rate': 2.5130676771083585e-06, 'epoch': 0.84} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1958 [2024-06-11 01:52:26,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.80 | bwd_microstep: 885.09 | bwd_inner_microstep: 885.00 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3944 [2024-06-11 01:52:28,560] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.54 | bwd_microstep: 1594.98 | bwd_inner_microstep: 1594.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3873 [2024-06-11 01:52:30,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.17 | bwd_microstep: 1679.60 | bwd_inner_microstep: 1679.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 01:52:32,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.41 | bwd_microstep: 1379.05 | bwd_inner_microstep: 1379.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 01:52:34,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.44 | bwd_microstep: 1244.33 | bwd_inner_microstep: 1244.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 01:52:36,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.70 | bwd_microstep: 1281.34 | bwd_inner_microstep: 1281.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-11 01:52:37,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.73 | bwd_microstep: 790.11 | bwd_inner_microstep: 790.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-11 01:52:39,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1287.78 | bwd_inner_microstep: 1287.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3417 [2024-06-11 01:52:40,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.79 | bwd_microstep: 1246.55 | bwd_inner_microstep: 1246.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-11 01:52:42,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.51 | bwd_microstep: 1280.46 | bwd_inner_microstep: 1280.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-11 01:52:44,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.57 | bwd_microstep: 1251.86 | bwd_inner_microstep: 1251.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3501 [2024-06-11 01:52:46,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.84 | bwd_microstep: 1221.90 | bwd_inner_microstep: 1221.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 01:52:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.43 | bwd_microstep: 1480.54 | bwd_inner_microstep: 1480.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1970 [2024-06-11 01:52:49,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.91 | bwd_microstep: 892.35 | bwd_inner_microstep: 892.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3654 [2024-06-11 01:52:51,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.36 | bwd_microstep: 1612.68 | bwd_inner_microstep: 1612.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3514 [2024-06-11 01:52:53,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3495 [2024-06-11 01:52:55,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1413.37 | bwd_inner_microstep: 1413.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-11 01:52:57,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.99 | bwd_microstep: 1557.93 | bwd_inner_microstep: 1557.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2294 [2024-06-11 01:52:59,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.99 | bwd_microstep: 1072.68 | bwd_inner_microstep: 1072.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 01:53:00,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.99 | bwd_microstep: 1284.33 | bwd_inner_microstep: 1284.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3438 [2024-06-11 01:53:02,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.69 | bwd_microstep: 1300.42 | bwd_inner_microstep: 1300.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624 [2024-06-11 01:53:04,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.16 | bwd_microstep: 1613.81 | bwd_inner_microstep: 1613.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827 [2024-06-11 01:53:06,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.13 | bwd_microstep: 1388.61 | bwd_inner_microstep: 1388.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-11 01:53:08,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1392.80 | bwd_inner_microstep: 1392.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-11 01:53:10,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.72 | bwd_microstep: 1494.44 | bwd_inner_microstep: 1494.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3391 [2024-06-11 01:53:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.81 | bwd_microstep: 1338.56 | bwd_inner_microstep: 1338.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3813 [2024-06-11 01:53:14,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.06 | bwd_microstep: 1506.63 | bwd_inner_microstep: 1506.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3593 [2024-06-11 01:53:17,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.66 | bwd_microstep: 1705.66 | bwd_inner_microstep: 1705.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192 [2024-06-11 01:53:18,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.98 | bwd_microstep: 956.56 | bwd_inner_microstep: 956.53 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-11 01:53:20,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1403.36 | bwd_inner_microstep: 1403.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3582 [2024-06-11 01:53:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.65 | bwd_microstep: 1251.26 | bwd_inner_microstep: 1251.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3773 [2024-06-11 01:53:28,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 01:53:28,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 5843.81 | bwd_inner_microstep: 1746.36 | bwd_allreduce_microstep: 4097.40 | step_microstep: 39.43 [2024-06-11 01:53:28,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15991.91 | bwd: 47035.26 | bwd_inner: 42936.88 | bwd_allreduce: 4097.67 | step: 40.91 {'loss': 1.2132, 'learning_rate': 2.494883176624694e-06, 'epoch': 0.84} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-11 01:53:30,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.83 | bwd_microstep: 1276.09 | bwd_inner_microstep: 1276.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 01:53:32,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1342.35 | bwd_inner_microstep: 1342.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 
[2024-06-11 01:53:33,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.98 | bwd_microstep: 1346.50 | bwd_inner_microstep: 1346.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-11 01:53:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.94 | bwd_microstep: 1650.42 | bwd_inner_microstep: 1650.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3801 [2024-06-11 01:53:38,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.86 | bwd_microstep: 1444.86 | bwd_inner_microstep: 1444.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-11 01:53:39,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.33 | bwd_microstep: 1246.41 | bwd_inner_microstep: 1246.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 01:53:41,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.19 | bwd_microstep: 1396.61 | bwd_inner_microstep: 1396.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 01:53:43,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1245.32 | bwd_inner_microstep: 1245.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711 [2024-06-11 01:53:45,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.28 | bwd_microstep: 1527.11 | bwd_inner_microstep: 1527.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3418 [2024-06-11 01:53:47,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.21 | bwd_microstep: 1258.43 | bwd_inner_microstep: 1258.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2029 [2024-06-11 01:53:48,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.84 | bwd_microstep: 904.92 | bwd_inner_microstep: 904.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-11 01:53:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.02 | bwd_microstep: 1338.29 | bwd_inner_microstep: 1338.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3512 [2024-06-11 01:53:52,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.60 | bwd_microstep: 1519.04 | bwd_inner_microstep: 1519.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3690 [2024-06-11 01:53:54,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.16 | bwd_microstep: 1671.11 | bwd_inner_microstep: 1671.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3646 [2024-06-11 01:53:56,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.40 | bwd_microstep: 1312.80 | bwd_inner_microstep: 1312.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3428 [2024-06-11 01:53:58,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.33 | bwd_microstep: 1470.33 | bwd_inner_microstep: 1470.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-11 01:54:00,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.94 | bwd_microstep: 1389.84 | bwd_inner_microstep: 1389.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3651 [2024-06-11 01:54:02,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.95 | bwd_microstep: 1413.82 | bwd_inner_microstep: 1413.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 01:54:04,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.49 | bwd_microstep: 1390.11 | bwd_inner_microstep: 1390.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-11 01:54:06,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1489.42 | bwd_inner_microstep: 1489.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-11 01:54:08,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.86 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1498.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137 [2024-06-11 01:54:09,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.42 | bwd_microstep: 932.49 | bwd_inner_microstep: 932.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3685 [2024-06-11 01:54:12,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.38 | bwd_microstep: 1556.03 | bwd_inner_microstep: 1556.00 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-11 01:54:14,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.37 | bwd_microstep: 1554.90 | bwd_inner_microstep: 1554.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-11 01:54:16,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 1432.39 | bwd_inner_microstep: 1432.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-11 01:54:18,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.01 | bwd_microstep: 1507.59 | bwd_inner_microstep: 1507.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2009 [2024-06-11 01:54:19,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.09 | bwd_microstep: 832.19 | bwd_inner_microstep: 832.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2078 [2024-06-11 01:54:20,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.71 | bwd_microstep: 916.91 | bwd_inner_microstep: 916.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3561 [2024-06-11 01:54:22,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.04 | bwd_microstep: 1427.18 | bwd_inner_microstep: 1427.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275 [2024-06-11 01:54:23,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.19 | bwd_microstep: 909.41 
| bwd_inner_microstep: 909.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-11 01:54:25,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1399.48 | bwd_inner_microstep: 1399.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576 [2024-06-11 01:54:30,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.04 | optimizer_step: 6.61 [2024-06-11 01:54:30,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.52 | bwd_microstep: 3817.71 | bwd_inner_microstep: 2231.56 | bwd_allreduce_microstep: 1586.11 | step_microstep: 37.50 [2024-06-11 01:54:30,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16157.44 | bwd: 45418.75 | bwd_inner: 43831.73 | bwd_allreduce: 1586.34 | step: 39.06 {'loss': 1.2044, 'learning_rate': 2.4767603290714812e-06, 'epoch': 0.84} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503 [2024-06-11 01:54:32,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1578.15 | bwd_inner_microstep: 1578.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3938 [2024-06-11 01:54:34,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.29 | bwd_microstep: 1692.56 | bwd_inner_microstep: 1692.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3874 [2024-06-11 01:54:36,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.14 | bwd_microstep: 1478.76 | bwd_inner_microstep: 1478.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2278 [2024-06-11 01:54:38,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.18 | bwd_microstep: 970.93 | bwd_inner_microstep: 970.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470 [2024-06-11 01:54:40,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.05 | bwd_microstep: 1315.26 | bwd_inner_microstep: 1315.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-11 01:54:41,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432 [2024-06-11 01:54:42,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.29 | bwd_microstep: 1156.13 | bwd_inner_microstep: 1156.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 01:54:44,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1486.00 | bwd_inner_microstep: 1485.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-11 01:54:46,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.80 | bwd_microstep: 1149.26 | bwd_inner_microstep: 1149.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1365 [2024-06-11 01:54:47,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 212.47 | bwd_microstep: 551.98 | bwd_inner_microstep: 551.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 20, images per sample: 5.0, dynamic token length: 3414 [2024-06-11 01:54:48,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.19 | bwd_microstep: 1209.62 | bwd_inner_microstep: 1209.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3501 [2024-06-11 01:54:51,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.03 | bwd_microstep: 1550.37 | bwd_inner_microstep: 1550.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-11 01:54:53,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.08 | bwd_microstep: 1494.31 | bwd_inner_microstep: 1494.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-11 01:54:55,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.36 | bwd_microstep: 1483.44 | bwd_inner_microstep: 1483.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-11 01:54:57,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1350.28 | bwd_inner_microstep: 1350.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2966 [2024-06-11 01:54:58,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.22 | bwd_microstep: 1102.30 | bwd_inner_microstep: 1102.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3680 [2024-06-11 01:55:00,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.69 | bwd_microstep: 1695.47 | bwd_inner_microstep: 1695.45 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 01:55:02,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.99 | bwd_microstep: 1249.22 | bwd_inner_microstep: 1249.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3937 [2024-06-11 01:55:04,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.25 | bwd_microstep: 1700.64 | bwd_inner_microstep: 1700.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 01:55:06,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.13 | bwd_microstep: 1296.73 | bwd_inner_microstep: 1296.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2172 [2024-06-11 01:55:07,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.52 | bwd_microstep: 885.44 | bwd_inner_microstep: 885.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3623 [2024-06-11 01:55:09,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.59 | bwd_microstep: 1444.82 | bwd_inner_microstep: 1444.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270 [2024-06-11 01:55:11,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.41 | bwd_microstep: 974.51 | bwd_inner_microstep: 974.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3940 [2024-06-11 01:55:13,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.55 | bwd_microstep: 1602.37 | bwd_inner_microstep: 1602.34 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3919 [2024-06-11 01:55:15,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.77 | bwd_microstep: 1691.19 | bwd_inner_microstep: 1691.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-11 01:55:18,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.92 | bwd_microstep: 1644.89 | bwd_inner_microstep: 1644.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-11 01:55:19,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.97 | bwd_microstep: 697.42 | bwd_inner_microstep: 697.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849 [2024-06-11 01:55:21,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.48 | bwd_microstep: 1664.97 | bwd_inner_microstep: 1664.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2039 [2024-06-11 01:55:22,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.65 | bwd_microstep: 844.63 | bwd_inner_microstep: 844.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-11 01:55:24,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.93 | bwd_microstep: 1342.90 | bwd_inner_microstep: 1342.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-11 01:55:26,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.47 | bwd_microstep: 
1600.20 | bwd_inner_microstep: 1600.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3797 [2024-06-11 01:55:40,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.86 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-11 01:55:40,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.85 | bwd_microstep: 13049.66 | bwd_inner_microstep: 1703.26 | bwd_allreduce_microstep: 11346.33 | step_microstep: 38.82 [2024-06-11 01:55:40,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15751.68 | bwd: 53744.90 | bwd_inner: 42397.65 | bwd_allreduce: 11346.58 | step: 40.24 {'loss': 1.1637, 'learning_rate': 2.45869919827729e-06, 'epoch': 0.85} dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3403 [2024-06-11 01:55:42,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.52 | bwd_microstep: 1377.67 | bwd_inner_microstep: 1377.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400 [2024-06-11 01:55:43,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.49 | bwd_microstep: 1145.76 | bwd_inner_microstep: 1145.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3883 [2024-06-11 01:55:45,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.75 | bwd_microstep: 1482.19 | bwd_inner_microstep: 1482.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-11 01:55:47,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.98 | bwd_microstep: 1486.77 | bwd_inner_microstep: 1486.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per 
sample: 8.0, dynamic token length: 3913 [2024-06-11 01:55:49,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.15 | bwd_microstep: 1547.46 | bwd_inner_microstep: 1547.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2036 [2024-06-11 01:55:51,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.72 | bwd_microstep: 745.95 | bwd_inner_microstep: 745.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 01:55:52,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.49 | bwd_microstep: 1343.68 | bwd_inner_microstep: 1343.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 01:55:54,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.59 | bwd_microstep: 1242.25 | bwd_inner_microstep: 1242.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-11 01:55:56,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1246.60 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3961 [2024-06-11 01:55:58,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.78 | bwd_microstep: 1558.32 | bwd_inner_microstep: 1558.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3494 [2024-06-11 01:56:00,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.42 | bwd_microstep: 1510.64 | bwd_inner_microstep: 1510.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 01:56:02,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1374.18 | bwd_inner_microstep: 1374.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3496 [2024-06-11 01:56:04,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.07 | bwd_microstep: 1574.23 | bwd_inner_microstep: 1574.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3654 [2024-06-11 01:56:07,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.89 | bwd_microstep: 1817.30 | bwd_inner_microstep: 1817.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3465 [2024-06-11 01:56:09,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.88 | bwd_microstep: 1452.08 | bwd_inner_microstep: 1452.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3644 [2024-06-11 01:56:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 643.62 | bwd_microstep: 1775.65 | bwd_inner_microstep: 1775.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-11 01:56:13,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.48 | bwd_microstep: 1475.04 | bwd_inner_microstep: 1475.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2121 [2024-06-11 01:56:14,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.41 | bwd_microstep: 924.77 | bwd_inner_microstep: 924.74 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 01:56:16,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.80 | bwd_microstep: 1340.32 | bwd_inner_microstep: 1340.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-11 01:56:18,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.80 | bwd_microstep: 1255.88 | bwd_inner_microstep: 1255.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3531 [2024-06-11 01:56:20,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.55 | bwd_microstep: 1196.77 | bwd_inner_microstep: 1196.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-11 01:56:22,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1397.01 | bwd_inner_microstep: 1396.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-11 01:56:23,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1387.61 | bwd_inner_microstep: 1387.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3602 [2024-06-11 01:56:25,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.31 | bwd_microstep: 1458.89 | bwd_inner_microstep: 1458.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-11 01:56:27,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 
1395.53 | bwd_inner_microstep: 1395.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782 [2024-06-11 01:56:30,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.74 | bwd_microstep: 1641.72 | bwd_inner_microstep: 1641.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098 [2024-06-11 01:56:31,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.41 | bwd_microstep: 916.94 | bwd_inner_microstep: 916.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-11 01:56:33,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.77 | bwd_microstep: 1453.58 | bwd_inner_microstep: 1453.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3815 [2024-06-11 01:56:35,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 1416.48 | bwd_inner_microstep: 1416.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 01:56:37,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.16 | bwd_microstep: 1556.38 | bwd_inner_microstep: 1556.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-11 01:56:39,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1590.98 | bwd_inner_microstep: 1590.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3587 [2024-06-11 01:56:59,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.74 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-11 01:56:59,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.67 | bwd_microstep: 19476.88 | bwd_inner_microstep: 1924.25 | bwd_allreduce_microstep: 17552.55 | step_microstep: 39.53 [2024-06-11 01:56:59,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16704.02 | bwd: 62565.57 | bwd_inner: 45012.07 | bwd_allreduce: 17552.80 | step: 40.98 84%|████████▍ | 1455/1726 [25:14:00<4:38:14, 61.60s/it] 84%|████████▍ | 1456/1726 [25:15:01<4:36:58, 61.55s/it] 84%|████████▍ | 1457/1726 [25:16:05<4:38:23, 62.09s/it] 84%|████████▍ | 1458/1726 [25:17:07<4:37:06, 62.04s/it] 85%|████████▍ | 1459/1726 [25:18:16<4:46:28, 64.38s/it] {'loss': 1.2112, 'learning_rate': 2.4406998478533384e-06, 'epoch': 0.85} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402 [2024-06-11 01:57:01,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.36 | bwd_microstep: 1427.42 | bwd_inner_microstep: 1427.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880 [2024-06-11 01:57:04,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.69 | bwd_microstep: 1674.00 | bwd_inner_microstep: 1673.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 01:57:06,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.55 | bwd_microstep: 1373.53 | bwd_inner_microstep: 1373.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images
per sample: 6.0, dynamic token length: 3416 [2024-06-11 01:57:07,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.51 | bwd_microstep: 1274.02 | bwd_inner_microstep: 1274.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-11 01:57:09,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.25 | bwd_microstep: 1533.11 | bwd_inner_microstep: 1533.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3522 [2024-06-11 01:57:11,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.17 | bwd_microstep: 1193.98 | bwd_inner_microstep: 1193.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 01:57:13,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1382.01 | bwd_inner_microstep: 1381.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 01:57:15,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.97 | bwd_microstep: 1254.07 | bwd_inner_microstep: 1254.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-11 01:57:16,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.43 | bwd_microstep: 1276.73 | bwd_inner_microstep: 1276.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 [2024-06-11 01:57:18,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.10 | bwd_microstep: 1424.43 | bwd_inner_microstep: 1424.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-11 01:57:20,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1483.90 | bwd_inner_microstep: 1483.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674 [2024-06-11 01:57:23,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.42 | bwd_microstep: 1615.63 | bwd_inner_microstep: 1615.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1962 [2024-06-11 01:57:24,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.07 | bwd_microstep: 856.96 | bwd_inner_microstep: 856.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2465 [2024-06-11 01:57:25,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.36 | bwd_microstep: 1045.66 | bwd_inner_microstep: 1045.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-11 01:57:27,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.23 | bwd_microstep: 1481.06 | bwd_inner_microstep: 1481.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105 [2024-06-11 01:57:29,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.20 | bwd_microstep: 918.53 | bwd_inner_microstep: 918.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-11 01:57:31,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 1390.71 | bwd_inner_microstep: 1390.69 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1926 [2024-06-11 01:57:32,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.25 | bwd_microstep: 760.52 | bwd_inner_microstep: 760.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-11 01:57:34,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1627.98 | bwd_inner_microstep: 1627.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 01:57:36,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.33 | bwd_microstep: 1283.60 | bwd_inner_microstep: 1283.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 01:57:37,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.34 | bwd_microstep: 1283.99 | bwd_inner_microstep: 1283.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-11 01:57:39,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1418.08 | bwd_inner_microstep: 1418.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292 [2024-06-11 01:57:41,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 883.92 | bwd_inner_microstep: 883.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2287 [2024-06-11 01:57:42,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.04 | bwd_microstep: 939.18 
| bwd_inner_microstep: 939.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 643 [2024-06-11 01:57:42,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.31 | bwd_microstep: 275.81 | bwd_inner_microstep: 275.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-11 01:57:44,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.07 | bwd_microstep: 1509.37 | bwd_inner_microstep: 1509.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3602 [2024-06-11 01:57:46,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.49 | bwd_microstep: 1517.25 | bwd_inner_microstep: 1517.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 01:57:48,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1400.81 | bwd_inner_microstep: 1400.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072 [2024-06-11 01:57:50,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.43 | bwd_microstep: 915.04 | bwd_inner_microstep: 915.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 01:57:52,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1550.02 | bwd_inner_microstep: 1549.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3821 [2024-06-11 01:57:54,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
640.50 | bwd_microstep: 1750.46 | bwd_inner_microstep: 1750.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 01:58:00,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.08 | optimizer_step: 6.59 [2024-06-11 01:58:00,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.81 | bwd_microstep: 5524.58 | bwd_inner_microstep: 1573.39 | bwd_allreduce_microstep: 3951.14 | step_microstep: 37.89 [2024-06-11 01:58:00,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15387.82 | bwd: 45246.39 | bwd_inner: 41294.33 | bwd_allreduce: 3951.37 | step: 39.44 {'loss': 1.1648, 'learning_rate': 2.4227623411932412e-06, 'epoch': 0.85} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2903 [2024-06-11 01:58:02,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.27 | bwd_microstep: 1209.43 | bwd_inner_microstep: 1209.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 01:58:04,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.34 | bwd_microstep: 1240.40 | bwd_inner_microstep: 1240.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3915 [2024-06-11 01:58:06,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 1586.75 | bwd_inner_microstep: 1586.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-11 01:58:08,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.16 | bwd_microstep: 1340.73 | bwd_inner_microstep: 1340.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 10, images per sample: 2.5, dynamic token length: 1874 [2024-06-11 01:58:09,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.60 | bwd_microstep: 679.11 | bwd_inner_microstep: 679.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 01:58:10,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1244.06 | bwd_inner_microstep: 1244.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 01:58:12,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1285.05 | bwd_inner_microstep: 1285.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 01:58:14,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1246.86 | bwd_inner_microstep: 1246.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542 [2024-06-11 01:58:16,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.35 | bwd_microstep: 1450.23 | bwd_inner_microstep: 1450.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-11 01:58:18,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.76 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3744 [2024-06-11 01:58:20,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.32 | bwd_microstep: 1832.96 | bwd_inner_microstep: 1832.94 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2909 [2024-06-11 01:58:22,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.14 | bwd_microstep: 1186.73 | bwd_inner_microstep: 1186.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3489 [2024-06-11 01:58:24,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.76 | bwd_microstep: 1542.32 | bwd_inner_microstep: 1542.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 01:58:26,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.75 | bwd_microstep: 1387.23 | bwd_inner_microstep: 1387.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3672 [2024-06-11 01:58:28,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.38 | bwd_microstep: 1548.12 | bwd_inner_microstep: 1548.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 01:58:30,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1285.46 | bwd_inner_microstep: 1285.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2124 [2024-06-11 01:58:31,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.94 | bwd_microstep: 766.49 | bwd_inner_microstep: 766.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655 [2024-06-11 01:58:33,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.23 | bwd_microstep: 1422.91 | bwd_inner_microstep: 
1422.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 01:58:35,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.10 | bwd_microstep: 1275.26 | bwd_inner_microstep: 1275.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619 [2024-06-11 01:58:37,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.85 | bwd_microstep: 1609.18 | bwd_inner_microstep: 1609.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-11 01:58:38,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.52 | bwd_microstep: 974.32 | bwd_inner_microstep: 974.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1941 [2024-06-11 01:58:39,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.91 | bwd_microstep: 695.25 | bwd_inner_microstep: 695.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-11 01:58:40,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.15 | bwd_microstep: 801.05 | bwd_inner_microstep: 801.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-11 01:58:42,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1497.54 | bwd_inner_microstep: 1497.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 01:58:45,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.51 | 
bwd_microstep: 1546.23 | bwd_inner_microstep: 1546.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-11 01:58:46,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.69 | bwd_microstep: 1406.08 | bwd_inner_microstep: 1406.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-11 01:58:49,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.68 | bwd_microstep: 1649.38 | bwd_inner_microstep: 1649.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3805 [2024-06-11 01:58:51,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.08 | bwd_microstep: 1291.35 | bwd_inner_microstep: 1291.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2274 [2024-06-11 01:58:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.00 | bwd_microstep: 824.33 | bwd_inner_microstep: 824.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3566 [2024-06-11 01:58:53,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.03 | bwd_microstep: 1299.63 | bwd_inner_microstep: 1299.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2713 [2024-06-11 01:58:55,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.49 | bwd_microstep: 1130.25 | bwd_inner_microstep: 1130.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1891 [2024-06-11 01:59:01,885] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 01:59:01,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.62 | bwd_microstep: 6009.53 | bwd_inner_microstep: 953.64 | bwd_allreduce_microstep: 5055.83 | step_microstep: 37.64 [2024-06-11 01:59:01,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15134.09 | bwd: 45606.73 | bwd_inner: 40549.99 | bwd_allreduce: 5056.06 | step: 39.09 {'loss': 1.2392, 'learning_rate': 2.4048867414728004e-06, 'epoch': 0.85} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-11 01:59:03,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1391.42 | bwd_inner_microstep: 1391.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3926 [2024-06-11 01:59:06,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.24 | bwd_microstep: 1689.51 | bwd_inner_microstep: 1689.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396 [2024-06-11 01:59:07,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.25 | bwd_microstep: 1241.79 | bwd_inner_microstep: 1241.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787 [2024-06-11 01:59:09,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.36 | bwd_microstep: 1441.86 | bwd_inner_microstep: 1441.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-11 01:59:10,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.02 | bwd_microstep: 707.18 | bwd_inner_microstep: 707.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-11 01:59:13,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.39 | bwd_microstep: 1639.63 | bwd_inner_microstep: 1639.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 01:59:14,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.55 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026 [2024-06-11 01:59:15,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.29 | bwd_microstep: 805.88 | bwd_inner_microstep: 805.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3484 [2024-06-11 01:59:17,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1411.74 | bwd_inner_microstep: 1411.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3469 [2024-06-11 01:59:20,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.97 | bwd_microstep: 1538.91 | bwd_inner_microstep: 1538.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-11 01:59:22,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.06 | bwd_microstep: 1481.04 | bwd_inner_microstep: 1481.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-11 01:59:24,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.38 | bwd_microstep: 1503.91 | bwd_inner_microstep: 1503.88 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3658 [2024-06-11 01:59:26,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.24 | bwd_microstep: 1818.87 | bwd_inner_microstep: 1818.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1942 [2024-06-11 01:59:27,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.39 | bwd_microstep: 881.46 | bwd_inner_microstep: 881.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3505 [2024-06-11 01:59:29,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.76 | bwd_microstep: 1223.20 | bwd_inner_microstep: 1223.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3931 [2024-06-11 01:59:31,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.47 | bwd_microstep: 1397.56 | bwd_inner_microstep: 1397.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-11 01:59:33,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.06 | bwd_microstep: 1402.18 | bwd_inner_microstep: 1402.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2130 [2024-06-11 01:59:34,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.32 | bwd_microstep: 733.84 | bwd_inner_microstep: 733.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 01:59:36,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 
1252.46 | bwd_inner_microstep: 1252.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639 [2024-06-11 01:59:38,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1509.29 | bwd_inner_microstep: 1509.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 01:59:40,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1400.01 | bwd_inner_microstep: 1399.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472 [2024-06-11 01:59:41,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.44 | bwd_microstep: 1183.20 | bwd_inner_microstep: 1183.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-11 01:59:43,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1552.82 | bwd_inner_microstep: 1552.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2273 [2024-06-11 01:59:45,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.74 | bwd_microstep: 1003.49 | bwd_inner_microstep: 1003.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3815 [2024-06-11 01:59:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.01 | bwd_microstep: 1753.37 | bwd_inner_microstep: 1753.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388 [2024-06-11 01:59:49,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 531.74 | bwd_microstep: 1438.69 | bwd_inner_microstep: 1438.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3420 [2024-06-11 01:59:51,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.62 | bwd_microstep: 1442.56 | bwd_inner_microstep: 1442.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3454 [2024-06-11 01:59:53,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.58 | bwd_microstep: 1417.66 | bwd_inner_microstep: 1417.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-11 01:59:55,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.52 | bwd_microstep: 1446.98 | bwd_inner_microstep: 1446.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3771 [2024-06-11 01:59:57,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.57 | bwd_microstep: 1447.68 | bwd_inner_microstep: 1447.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3796 [2024-06-11 01:59:59,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.15 | bwd_microstep: 1476.23 | bwd_inner_microstep: 1476.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3763 [2024-06-11 02:00:02,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.19 | optimizer_step: 6.61 [2024-06-11 02:00:02,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 2722.63 | bwd_inner_microstep: 1512.92 | 
bwd_allreduce_microstep: 1209.66 | step_microstep: 37.84 [2024-06-11 02:00:02,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16151.72 | bwd: 44602.14 | bwd_inner: 43391.58 | bwd_allreduce: 1209.89 | step: 39.28 {'loss': 1.1369, 'learning_rate': 2.3870731116497915e-06, 'epoch': 0.85} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 02:00:04,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.64 | bwd_microstep: 1274.80 | bwd_inner_microstep: 1274.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1884 [2024-06-11 02:00:05,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.95 | bwd_microstep: 771.02 | bwd_inner_microstep: 771.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3921 [2024-06-11 02:00:07,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.96 | bwd_microstep: 1520.17 | bwd_inner_microstep: 1520.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4204 [2024-06-11 02:00:09,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.64 | bwd_microstep: 1458.84 | bwd_inner_microstep: 1458.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800 [2024-06-11 02:00:12,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.66 | bwd_microstep: 1546.85 | bwd_inner_microstep: 1546.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-11 02:00:13,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.72 | bwd_microstep: 807.11 | bwd_inner_microstep: 807.08 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-11 02:00:15,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.48 | bwd_microstep: 1530.99 | bwd_inner_microstep: 1530.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 02:00:17,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.50 | bwd_microstep: 1279.56 | bwd_inner_microstep: 1279.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-11 02:00:18,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.10 | bwd_microstep: 1150.96 | bwd_inner_microstep: 1150.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 02:00:20,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1388.44 | bwd_inner_microstep: 1388.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2115 [2024-06-11 02:00:21,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.09 | bwd_microstep: 982.19 | bwd_inner_microstep: 982.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3498 [2024-06-11 02:00:23,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.12 | bwd_microstep: 1428.08 | bwd_inner_microstep: 1428.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3630 [2024-06-11 02:00:26,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.25 | bwd_microstep: 
1676.06 | bwd_inner_microstep: 1676.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-11 02:00:28,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1481.97 | bwd_inner_microstep: 1481.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2094 [2024-06-11 02:00:29,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.54 | bwd_microstep: 918.76 | bwd_inner_microstep: 918.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3631 [2024-06-11 02:00:31,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.43 | bwd_microstep: 1269.82 | bwd_inner_microstep: 1269.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-11 02:00:33,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.18 | bwd_microstep: 1613.29 | bwd_inner_microstep: 1613.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648 [2024-06-11 02:00:35,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1515.40 | bwd_inner_microstep: 1515.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2090 [2024-06-11 02:00:36,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.78 | bwd_microstep: 918.50 | bwd_inner_microstep: 918.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-11 02:00:37,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 302.92 | bwd_microstep: 805.94 | bwd_inner_microstep: 805.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-11 02:00:39,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.04 | bwd_microstep: 1290.39 | bwd_inner_microstep: 1290.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-11 02:00:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.79 | bwd_microstep: 1658.58 | bwd_inner_microstep: 1658.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2070 [2024-06-11 02:00:43,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.62 | bwd_microstep: 753.62 | bwd_inner_microstep: 753.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242 [2024-06-11 02:00:44,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.16 | bwd_microstep: 969.18 | bwd_inner_microstep: 969.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3549 [2024-06-11 02:00:46,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.08 | bwd_microstep: 1557.72 | bwd_inner_microstep: 1557.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2953 [2024-06-11 02:00:48,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.87 | bwd_microstep: 1329.45 | bwd_inner_microstep: 1329.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600 [2024-06-11 02:00:50,367] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.81 | bwd_microstep: 1430.22 | bwd_inner_microstep: 1430.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776 [2024-06-11 02:00:52,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.95 | bwd_microstep: 1441.10 | bwd_inner_microstep: 1441.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-11 02:00:54,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.97 | bwd_microstep: 1505.20 | bwd_inner_microstep: 1505.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-11 02:00:56,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1513.83 | bwd_inner_microstep: 1513.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2019 [2024-06-11 02:00:57,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.39 | bwd_microstep: 839.67 | bwd_inner_microstep: 839.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3576 [2024-06-11 02:01:03,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 02:01:03,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.37 | bwd_microstep: 5369.07 | bwd_inner_microstep: 1922.71 | bwd_allreduce_microstep: 3446.31 | step_microstep: 37.83 [2024-06-11 02:01:03,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15424.65 | bwd: 44996.81 | bwd_inner: 41549.59 | bwd_allreduce: 3446.54 | step: 39.33 {'loss': 1.1548, 'learning_rate': 
2.369321514463716e-06, 'epoch': 0.85} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3393 [2024-06-11 02:01:05,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1335.92 | bwd_inner_microstep: 1335.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3891 [2024-06-11 02:01:07,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.99 | bwd_microstep: 1583.83 | bwd_inner_microstep: 1583.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-11 02:01:09,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.81 | bwd_microstep: 1240.97 | bwd_inner_microstep: 1240.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513 [2024-06-11 02:01:11,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.68 | bwd_microstep: 1318.54 | bwd_inner_microstep: 1318.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 02:01:13,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.76 | bwd_microstep: 1378.61 | bwd_inner_microstep: 1378.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 02:01:15,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.31 | bwd_microstep: 1382.06 | bwd_inner_microstep: 1382.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3743 [2024-06-11 02:01:17,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.01 | bwd_microstep: 1430.86 | 
bwd_inner_microstep: 1430.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-11 02:01:18,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1246.54 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-11 02:01:20,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.58 | bwd_microstep: 1149.60 | bwd_inner_microstep: 1149.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434 [2024-06-11 02:01:22,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.37 | bwd_microstep: 1308.71 | bwd_inner_microstep: 1308.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3717 [2024-06-11 02:01:24,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.17 | bwd_microstep: 1729.62 | bwd_inner_microstep: 1729.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-11 02:01:26,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.27 | bwd_microstep: 1482.13 | bwd_inner_microstep: 1482.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 02:01:28,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.87 | bwd_microstep: 1336.52 | bwd_inner_microstep: 1336.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653 [2024-06-11 02:01:30,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 591.99 | bwd_microstep: 1609.33 | bwd_inner_microstep: 1609.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3510 [2024-06-11 02:01:32,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.90 | bwd_microstep: 1429.07 | bwd_inner_microstep: 1429.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-11 02:01:34,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1395.20 | bwd_inner_microstep: 1395.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-11 02:01:36,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1288.36 | bwd_inner_microstep: 1288.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914 [2024-06-11 02:01:37,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 717.41 | bwd_inner_microstep: 717.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-11 02:01:39,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.63 | bwd_microstep: 1402.73 | bwd_inner_microstep: 1402.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-11 02:01:41,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.64 | bwd_microstep: 1397.52 | bwd_inner_microstep: 1397.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3532 [2024-06-11 02:01:43,047] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.14 | bwd_microstep: 1293.11 | bwd_inner_microstep: 1293.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058 [2024-06-11 02:01:44,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.94 | bwd_microstep: 817.54 | bwd_inner_microstep: 817.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 02:01:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1255.54 | bwd_inner_microstep: 1255.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1909 [2024-06-11 02:01:46,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.69 | bwd_microstep: 685.12 | bwd_inner_microstep: 685.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-11 02:01:49,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.92 | bwd_microstep: 1555.56 | bwd_inner_microstep: 1555.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-11 02:01:50,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.39 | bwd_microstep: 1300.03 | bwd_inner_microstep: 1300.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3435 [2024-06-11 02:01:52,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.48 | bwd_microstep: 1295.23 | bwd_inner_microstep: 1295.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3433 [2024-06-11 02:01:54,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.89 | bwd_microstep: 1443.68 | bwd_inner_microstep: 1443.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-11 02:01:56,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 1351.00 | bwd_inner_microstep: 1350.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467 [2024-06-11 02:01:58,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.35 | bwd_microstep: 1436.99 | bwd_inner_microstep: 1436.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3089 [2024-06-11 02:02:00,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.31 | bwd_microstep: 1150.49 | bwd_inner_microstep: 1150.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2978 [2024-06-11 02:02:04,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.06 | optimizer_step: 6.60 [2024-06-11 02:02:04,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.33 | bwd_microstep: 4292.24 | bwd_inner_microstep: 1360.87 | bwd_allreduce_microstep: 2931.32 | step_microstep: 37.95 [2024-06-11 02:02:04,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15745.06 | bwd: 45040.07 | bwd_inner: 42107.85 | bwd_allreduce: 2931.55 | step: 39.44 85%|████████▍ | 1460/1726 [25:19:36<5:05:39, 68.95s/it] 85%|████████▍ | 1461/1726 [25:20:37<4:53:57, 66.56s/it] 85%|████████▍ | 1462/1726 [25:21:38<4:45:35, 64.91s/it]
85%|████████▍ | 1463/1726 [25:22:39<4:39:29, 63.76s/it] 85%|████████▍ | 1464/1726 [25:23:40<4:34:28, 62.86s/it] {'loss': 1.2122, 'learning_rate': 2.3516320124356186e-06, 'epoch': 0.85} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3537 [2024-06-11 02:02:06,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.02 | bwd_microstep: 1441.46 | bwd_inner_microstep: 1441.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-11 02:02:08,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.31 | bwd_microstep: 1384.71 | bwd_inner_microstep: 1384.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921 [2024-06-11 02:02:09,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.71 | bwd_microstep: 785.23 | bwd_inner_microstep: 785.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-11 02:02:12,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.32 | bwd_microstep: 1653.70 | bwd_inner_microstep: 1653.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-11 02:02:14,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 1479.33 | bwd_inner_microstep: 1479.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 02:02:15,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 
1246.84 | bwd_inner_microstep: 1246.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754 [2024-06-11 02:02:18,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.85 | bwd_microstep: 1642.07 | bwd_inner_microstep: 1642.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711 [2024-06-11 02:02:20,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.53 | bwd_microstep: 1628.99 | bwd_inner_microstep: 1628.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-11 02:02:22,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.49 | bwd_microstep: 1247.68 | bwd_inner_microstep: 1247.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3440 [2024-06-11 02:02:24,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.34 | bwd_microstep: 1411.37 | bwd_inner_microstep: 1411.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3460 [2024-06-11 02:02:26,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 1427.99 | bwd_inner_microstep: 1427.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3450 [2024-06-11 02:02:27,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.15 | bwd_microstep: 1376.05 | bwd_inner_microstep: 1376.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 [2024-06-11 02:02:29,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 558.99 | bwd_microstep: 1491.78 | bwd_inner_microstep: 1491.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 02:02:31,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.42 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933 [2024-06-11 02:02:32,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.32 | bwd_microstep: 729.62 | bwd_inner_microstep: 729.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 02:02:34,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.21 | bwd_microstep: 1285.43 | bwd_inner_microstep: 1285.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668 [2024-06-11 02:02:36,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.34 | bwd_microstep: 1418.65 | bwd_inner_microstep: 1418.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-11 02:02:38,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1391.24 | bwd_inner_microstep: 1391.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 02:02:40,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1354.48 | bwd_inner_microstep: 1354.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-11 02:02:42,193] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.73 | bwd_microstep: 1253.90 | bwd_inner_microstep: 1253.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3814 [2024-06-11 02:02:44,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.29 | bwd_microstep: 1416.40 | bwd_inner_microstep: 1416.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-11 02:02:45,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.46 | bwd_microstep: 1160.48 | bwd_inner_microstep: 1160.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2272 [2024-06-11 02:02:47,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.62 | bwd_microstep: 973.35 | bwd_inner_microstep: 973.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3512 [2024-06-11 02:02:48,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.20 | bwd_microstep: 1226.93 | bwd_inner_microstep: 1226.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 02:02:50,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1387.56 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3050 [2024-06-11 02:02:52,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.11 | bwd_microstep: 1136.52 | bwd_inner_microstep: 1136.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3676 [2024-06-11 02:02:54,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.82 | bwd_microstep: 1525.17 | bwd_inner_microstep: 1525.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-11 02:02:56,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.74 | bwd_microstep: 1489.90 | bwd_inner_microstep: 1489.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2057 [2024-06-11 02:02:57,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.80 | bwd_microstep: 1011.77 | bwd_inner_microstep: 1011.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3540 [2024-06-11 02:02:59,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.82 | bwd_microstep: 1415.88 | bwd_inner_microstep: 1415.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-11 02:03:01,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1498.58 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2570 [2024-06-11 02:03:05,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-11 02:03:05,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.68 | bwd_microstep: 3113.32 | bwd_inner_microstep: 1216.27 | bwd_allreduce_microstep: 1896.99 | step_microstep: 37.58 [2024-06-11 02:03:05,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15865.54 | bwd: 44401.35 | bwd_inner: 42503.45 | bwd_allreduce: 1897.22 | step: 
39.10 {'loss': 1.1704, 'learning_rate': 2.334004667867824e-06, 'epoch': 0.85} dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3523 [2024-06-11 02:03:07,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.84 | bwd_microstep: 1430.27 | bwd_inner_microstep: 1430.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-11 02:03:09,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.03 | bwd_microstep: 1302.64 | bwd_inner_microstep: 1302.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 02:03:11,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.34 | bwd_microstep: 1375.71 | bwd_inner_microstep: 1375.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-11 02:03:13,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1480.77 | bwd_inner_microstep: 1480.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3763 [2024-06-11 02:03:15,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.82 | bwd_microstep: 1470.81 | bwd_inner_microstep: 1470.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-11 02:03:16,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.29 | bwd_microstep: 790.41 | bwd_inner_microstep: 790.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 02:03:18,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
502.88 | bwd_microstep: 1339.70 | bwd_inner_microstep: 1339.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3712 [2024-06-11 02:03:19,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.61 | bwd_microstep: 1328.46 | bwd_inner_microstep: 1328.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-11 02:03:21,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 793.52 | bwd_inner_microstep: 793.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-11 02:03:23,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1499.06 | bwd_inner_microstep: 1499.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-11 02:03:25,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.30 | bwd_microstep: 1421.35 | bwd_inner_microstep: 1421.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473 [2024-06-11 02:03:26,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.44 | bwd_microstep: 1215.12 | bwd_inner_microstep: 1215.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-11 02:03:28,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.57 | bwd_microstep: 1480.44 | bwd_inner_microstep: 1480.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667 [2024-06-11 02:03:30,816] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 531.33 | bwd_microstep: 1417.70 | bwd_inner_microstep: 1417.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3516 [2024-06-11 02:03:32,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.73 | bwd_microstep: 1408.86 | bwd_inner_microstep: 1408.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-11 02:03:34,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3837 [2024-06-11 02:03:36,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.95 | bwd_microstep: 1756.45 | bwd_inner_microstep: 1756.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-11 02:03:38,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.28 | bwd_microstep: 1485.10 | bwd_inner_microstep: 1485.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-11 02:03:40,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.39 | bwd_microstep: 1291.34 | bwd_inner_microstep: 1291.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3497 [2024-06-11 02:03:42,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.55 | bwd_microstep: 1187.11 | bwd_inner_microstep: 1187.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3665 [2024-06-11 
02:03:44,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.30 | bwd_microstep: 1623.15 | bwd_inner_microstep: 1623.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 02:03:46,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.16 | bwd_microstep: 1553.86 | bwd_inner_microstep: 1553.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620 [2024-06-11 02:03:48,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.56 | bwd_microstep: 1310.67 | bwd_inner_microstep: 1310.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826 [2024-06-11 02:03:50,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.96 | bwd_microstep: 1357.37 | bwd_inner_microstep: 1357.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2150 [2024-06-11 02:03:51,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.78 | bwd_microstep: 850.35 | bwd_inner_microstep: 850.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 892 [2024-06-11 02:03:52,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.41 | bwd_microstep: 368.64 | bwd_inner_microstep: 368.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2122 [2024-06-11 02:03:53,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.83 | bwd_microstep: 1025.20 | bwd_inner_microstep: 1025.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic 
token length: 3770 [2024-06-11 02:03:55,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.23 | bwd_microstep: 1538.31 | bwd_inner_microstep: 1538.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2955 [2024-06-11 02:03:57,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.12 | bwd_microstep: 1096.25 | bwd_inner_microstep: 1096.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2289 [2024-06-11 02:03:58,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.90 | bwd_microstep: 937.84 | bwd_inner_microstep: 937.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3589 [2024-06-11 02:04:00,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.63 | bwd_microstep: 1803.04 | bwd_inner_microstep: 1803.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-11 02:04:08,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.09 | optimizer_step: 6.58 [2024-06-11 02:04:08,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.36 | bwd_microstep: 6656.84 | bwd_inner_microstep: 1641.87 | bwd_allreduce_microstep: 5014.92 | step_microstep: 37.79 [2024-06-11 02:04:08,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15604.32 | bwd: 46877.53 | bwd_inner: 41861.71 | bwd_allreduce: 5015.15 | step: 39.23 {'loss': 1.1981, 'learning_rate': 2.3164395428437605e-06, 'epoch': 0.85} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1906 [2024-06-11 02:04:09,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.97 | bwd_microstep: 
865.90 | bwd_inner_microstep: 865.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3951 [2024-06-11 02:04:11,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.03 | bwd_microstep: 1594.22 | bwd_inner_microstep: 1594.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3895 [2024-06-11 02:04:13,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.36 | bwd_microstep: 1683.51 | bwd_inner_microstep: 1683.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3794 [2024-06-11 02:04:16,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.37 | bwd_microstep: 1645.93 | bwd_inner_microstep: 1645.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 02:04:17,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.67 | bwd_microstep: 1279.19 | bwd_inner_microstep: 1279.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-11 02:04:19,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1248.90 | bwd_inner_microstep: 1248.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 02:04:21,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.03 | bwd_microstep: 1383.81 | bwd_inner_microstep: 1383.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-11 02:04:23,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 529.28 | bwd_microstep: 1401.55 | bwd_inner_microstep: 1401.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3764 [2024-06-11 02:04:25,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.23 | bwd_microstep: 1446.76 | bwd_inner_microstep: 1446.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490 [2024-06-11 02:04:27,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.18 | bwd_microstep: 1188.27 | bwd_inner_microstep: 1188.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4072 [2024-06-11 02:04:29,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.36 | bwd_microstep: 1626.20 | bwd_inner_microstep: 1626.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3477 [2024-06-11 02:04:31,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.83 | bwd_microstep: 1442.69 | bwd_inner_microstep: 1442.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3717 [2024-06-11 02:04:33,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.71 | bwd_microstep: 1664.71 | bwd_inner_microstep: 1664.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3380 [2024-06-11 02:04:35,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.02 | bwd_microstep: 1386.99 | bwd_inner_microstep: 1386.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 02:04:37,689] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.55 | bwd_microstep: 1483.15 | bwd_inner_microstep: 1483.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482 [2024-06-11 02:04:39,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1404.89 | bwd_inner_microstep: 1404.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 02:04:41,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.95 | bwd_microstep: 1276.95 | bwd_inner_microstep: 1276.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-11 02:04:43,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.90 | bwd_microstep: 1180.30 | bwd_inner_microstep: 1180.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-11 02:04:44,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.22 | bwd_microstep: 1404.95 | bwd_inner_microstep: 1404.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3476 [2024-06-11 02:04:46,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1328.29 | bwd_inner_microstep: 1328.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 02:04:48,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1375.72 | bwd_inner_microstep: 1375.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3825 [2024-06-11 02:04:50,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.96 | bwd_microstep: 1453.10 | bwd_inner_microstep: 1453.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 02:04:52,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1252.70 | bwd_inner_microstep: 1252.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-11 02:04:54,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.12 | bwd_microstep: 1508.83 | bwd_inner_microstep: 1508.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 02:04:56,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.11 | bwd_microstep: 1568.75 | bwd_inner_microstep: 1568.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3600 [2024-06-11 02:04:58,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.72 | bwd_microstep: 1338.01 | bwd_inner_microstep: 1337.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822 [2024-06-11 02:05:00,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.59 | bwd_microstep: 1588.56 | bwd_inner_microstep: 1588.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3716 [2024-06-11 02:05:02,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.20 | bwd_microstep: 1583.95 | bwd_inner_microstep: 1583.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images 
per sample: 7.5, dynamic token length: 3560 [2024-06-11 02:05:04,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 1430.18 | bwd_inner_microstep: 1430.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2234 [2024-06-11 02:05:06,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 390.35 | bwd_microstep: 1064.23 | bwd_inner_microstep: 1064.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-11 02:05:08,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.77 | bwd_microstep: 1646.48 | bwd_inner_microstep: 1646.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644 [2024-06-11 02:05:11,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.65 | optimizer_gradients: 4.03 | optimizer_step: 6.61 [2024-06-11 02:05:11,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 2635.71 | bwd_inner_microstep: 1494.83 | bwd_allreduce_microstep: 1140.83 | step_microstep: 38.35 [2024-06-11 02:05:11,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16854.87 | bwd: 46383.37 | bwd_inner: 45241.62 | bwd_allreduce: 1141.06 | step: 39.87 {'loss': 1.2488, 'learning_rate': 2.2989366992276917e-06, 'epoch': 0.85} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-11 02:05:13,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.84 | bwd_microstep: 1492.53 | bwd_inner_microstep: 1492.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3949 [2024-06-11 02:05:16,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
580.06 | bwd_microstep: 1555.40 | bwd_inner_microstep: 1555.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2451 [2024-06-11 02:05:17,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.23 | bwd_microstep: 946.55 | bwd_inner_microstep: 946.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301 [2024-06-11 02:05:18,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.73 | bwd_microstep: 973.52 | bwd_inner_microstep: 973.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-11 02:05:19,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.60 | bwd_microstep: 676.12 | bwd_inner_microstep: 676.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750 [2024-06-11 02:05:21,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.38 | bwd_microstep: 1536.74 | bwd_inner_microstep: 1536.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-11 02:05:23,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.71 | bwd_microstep: 1300.49 | bwd_inner_microstep: 1300.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059 [2024-06-11 02:05:24,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.94 | bwd_microstep: 814.28 | bwd_inner_microstep: 814.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481 [2024-06-11 02:05:26,370] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 467.42 | bwd_microstep: 1214.44 | bwd_inner_microstep: 1214.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-11 02:05:27,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.31 | bwd_microstep: 790.37 | bwd_inner_microstep: 790.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693 [2024-06-11 02:05:29,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.77 | bwd_microstep: 1422.42 | bwd_inner_microstep: 1422.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-11 02:05:31,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1254.82 | bwd_inner_microstep: 1254.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3515 [2024-06-11 02:05:32,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.57 | bwd_microstep: 1190.34 | bwd_inner_microstep: 1190.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1951 [2024-06-11 02:05:34,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.55 | bwd_microstep: 890.17 | bwd_inner_microstep: 890.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505 [2024-06-11 02:05:35,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.07 | bwd_microstep: 1410.65 | bwd_inner_microstep: 1410.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385 [2024-06-11 02:05:37,831] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1335.35 | bwd_inner_microstep: 1335.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3689 [2024-06-11 02:05:39,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.02 | bwd_microstep: 1264.02 | bwd_inner_microstep: 1264.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3514 [2024-06-11 02:05:41,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1347.04 | bwd_inner_microstep: 1347.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-11 02:05:43,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.76 | bwd_microstep: 1299.84 | bwd_inner_microstep: 1299.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-11 02:05:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.06 | bwd_microstep: 1447.11 | bwd_inner_microstep: 1447.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-11 02:05:47,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1551.76 | bwd_inner_microstep: 1551.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535 [2024-06-11 02:05:49,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.22 | bwd_microstep: 1447.71 | bwd_inner_microstep: 1447.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token 
length: 1944 [2024-06-11 02:05:50,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.66 | bwd_microstep: 696.51 | bwd_inner_microstep: 696.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522 [2024-06-11 02:05:52,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1422.70 | bwd_inner_microstep: 1422.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3716 [2024-06-11 02:05:54,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.77 | bwd_microstep: 1598.51 | bwd_inner_microstep: 1598.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555 [2024-06-11 02:05:56,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.89 | bwd_microstep: 1202.67 | bwd_inner_microstep: 1202.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-11 02:05:57,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.48 | bwd_microstep: 1297.76 | bwd_inner_microstep: 1297.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 02:05:59,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.24 | bwd_microstep: 1377.71 | bwd_inner_microstep: 1377.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024 [2024-06-11 02:06:01,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.36 | bwd_microstep: 803.83 | bwd_inner_microstep: 803.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, 
images per sample: 11.0, dynamic token length: 3807 [2024-06-11 02:06:03,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.01 | bwd_microstep: 1723.82 | bwd_inner_microstep: 1723.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3440 [2024-06-11 02:06:05,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.38 | bwd_microstep: 1213.29 | bwd_inner_microstep: 1213.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3766 [2024-06-11 02:06:14,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.58 [2024-06-11 02:06:14,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 666.46 | bwd_microstep: 8999.02 | bwd_inner_microstep: 2082.58 | bwd_allreduce_microstep: 6916.37 | step_microstep: 39.29 [2024-06-11 02:06:14,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15153.81 | bwd: 47497.52 | bwd_inner: 40580.24 | bwd_allreduce: 6916.61 | step: 40.75 {'loss': 1.1867, 'learning_rate': 2.2814961986645525e-06, 'epoch': 0.85} 85%|████████▍ | 1465/1726 [25:24:41<4:31:09, 62.34s/it] 85%|████████▍ | 1466/1726 [25:25:42<4:27:51, 61.81s/it] 85%|████████▍ | 1467/1726 [25:26:44<4:28:06, 62.11s/it] 85%|████████▌ | 1468/1726 [25:27:48<4:28:57, 62.55s/it] 85%|████████▌ | 1469/1726 [25:28:51<4:28:27, 62.68s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 02:06:16,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.12 | bwd_microstep: 
1377.78 | bwd_inner_microstep: 1377.70 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 02:06:18,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1377.85 | bwd_inner_microstep: 1377.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-11 02:06:20,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.56 | bwd_microstep: 1650.94 | bwd_inner_microstep: 1650.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786 [2024-06-11 02:06:23,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.27 | bwd_microstep: 1544.31 | bwd_inner_microstep: 1544.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 02:06:24,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1376.43 | bwd_inner_microstep: 1376.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597 [2024-06-11 02:06:26,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.67 | bwd_microstep: 1408.89 | bwd_inner_microstep: 1408.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1878 [2024-06-11 02:06:27,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.67 | bwd_microstep: 679.86 | bwd_inner_microstep: 679.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 02:06:29,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 470.64 | bwd_microstep: 1248.75 | bwd_inner_microstep: 1248.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948 [2024-06-11 02:06:30,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.81 | bwd_microstep: 826.49 | bwd_inner_microstep: 826.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494 [2024-06-11 02:06:32,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.84 | bwd_microstep: 1314.77 | bwd_inner_microstep: 1314.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063 [2024-06-11 02:06:33,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.42 | bwd_microstep: 814.78 | bwd_inner_microstep: 814.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-11 02:06:35,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.83 | bwd_microstep: 1482.48 | bwd_inner_microstep: 1482.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3442 [2024-06-11 02:06:37,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.94 | bwd_microstep: 1392.50 | bwd_inner_microstep: 1392.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3471 [2024-06-11 02:06:39,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.62 | bwd_microstep: 1441.17 | bwd_inner_microstep: 1441.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3676 [2024-06-11 02:06:41,755] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.11 | bwd_microstep: 1586.83 | bwd_inner_microstep: 1586.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-11 02:06:43,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1290.85 | bwd_inner_microstep: 1290.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3841 [2024-06-11 02:06:45,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.59 | bwd_microstep: 1664.55 | bwd_inner_microstep: 1664.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-11 02:06:47,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.57 | bwd_microstep: 1430.76 | bwd_inner_microstep: 1430.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3627 [2024-06-11 02:06:49,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.10 | bwd_microstep: 1311.52 | bwd_inner_microstep: 1311.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3622 [2024-06-11 02:06:51,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1341.89 | bwd_inner_microstep: 1341.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 02:06:53,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.40 | bwd_microstep: 1379.71 | bwd_inner_microstep: 1379.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token 
length: 3744 [2024-06-11 02:06:55,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.15 | bwd_microstep: 1640.30 | bwd_inner_microstep: 1640.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2184 [2024-06-11 02:06:56,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.33 | bwd_microstep: 919.58 | bwd_inner_microstep: 919.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-11 02:06:58,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.77 | bwd_microstep: 1476.71 | bwd_inner_microstep: 1476.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-11 02:07:00,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1412.09 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3591 [2024-06-11 02:07:02,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.37 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3590 [2024-06-11 02:07:04,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.90 | bwd_microstep: 1309.74 | bwd_inner_microstep: 1309.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3438 [2024-06-11 02:07:06,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1399.53 | bwd_inner_microstep: 1399.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3600 [2024-06-11 02:07:08,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.20 | bwd_microstep: 1601.19 | bwd_inner_microstep: 1601.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-11 02:07:10,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.64 | bwd_microstep: 1646.67 | bwd_inner_microstep: 1646.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-11 02:07:12,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1401.37 | bwd_inner_microstep: 1401.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3577 [2024-06-11 02:07:16,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-11 02:07:16,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.29 | bwd_microstep: 3220.90 | bwd_inner_microstep: 1761.94 | bwd_allreduce_microstep: 1458.91 | step_microstep: 37.87 [2024-06-11 02:07:16,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16304.77 | bwd: 45312.26 | bwd_inner: 43852.39 | bwd_allreduce: 1459.17 | step: 39.32 {'loss': 1.2503, 'learning_rate': 2.264118102579693e-06, 'epoch': 0.85} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 02:07:18,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.23 | bwd_microstep: 1379.11 | bwd_inner_microstep: 1379.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3946 [2024-06-11 02:07:20,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 569.07 | bwd_microstep: 1526.05 | bwd_inner_microstep: 1526.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3889 [2024-06-11 02:07:22,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.26 | bwd_microstep: 1582.82 | bwd_inner_microstep: 1582.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1933 [2024-06-11 02:07:23,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.07 | bwd_microstep: 696.17 | bwd_inner_microstep: 696.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-11 02:07:25,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.89 | bwd_microstep: 1476.60 | bwd_inner_microstep: 1476.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613 [2024-06-11 02:07:27,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.27 | bwd_microstep: 1246.30 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3489 [2024-06-11 02:07:29,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1245.59 | bwd_inner_microstep: 1245.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1955 [2024-06-11 02:07:30,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.61 | bwd_microstep: 730.84 | bwd_inner_microstep: 730.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3746 [2024-06-11 02:07:32,668] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.88 | bwd_microstep: 1638.91 | bwd_inner_microstep: 1638.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955 [2024-06-11 02:07:33,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 701.15 | bwd_inner_microstep: 701.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-11 02:07:35,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.44 | bwd_microstep: 1151.68 | bwd_inner_microstep: 1151.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-11 02:07:36,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.41 | bwd_microstep: 798.08 | bwd_inner_microstep: 798.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 02:07:38,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.97 | bwd_microstep: 1377.61 | bwd_inner_microstep: 1377.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3643 [2024-06-11 02:07:40,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.02 | bwd_microstep: 1474.88 | bwd_inner_microstep: 1474.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 02:07:42,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.76 | bwd_microstep: 1485.50 | bwd_inner_microstep: 1485.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 
[2024-06-11 02:07:44,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.77 | bwd_microstep: 1521.60 | bwd_inner_microstep: 1521.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2292 [2024-06-11 02:07:45,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.51 | bwd_microstep: 1072.00 | bwd_inner_microstep: 1071.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-11 02:07:47,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1350.50 | bwd_inner_microstep: 1350.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-11 02:07:49,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1538.25 | bwd_inner_microstep: 1538.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3872 [2024-06-11 02:07:51,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.07 | bwd_microstep: 1372.27 | bwd_inner_microstep: 1372.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-11 02:07:53,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.34 | bwd_microstep: 1251.73 | bwd_inner_microstep: 1251.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-11 02:07:55,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1440.60 | bwd_inner_microstep: 1440.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per 
sample: 6.5, dynamic token length: 2937 [2024-06-11 02:07:57,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.96 | bwd_microstep: 1193.72 | bwd_inner_microstep: 1193.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 02:07:58,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.41 | bwd_microstep: 1284.29 | bwd_inner_microstep: 1284.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 02:08:01,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.97 | bwd_microstep: 1554.99 | bwd_inner_microstep: 1554.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573 [2024-06-11 02:08:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1499.39 | bwd_inner_microstep: 1499.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-11 02:08:05,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.80 | bwd_microstep: 1658.97 | bwd_inner_microstep: 1658.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-11 02:08:07,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-11 02:08:09,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.68 | bwd_microstep: 1612.27 | bwd_inner_microstep: 1612.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3568 [2024-06-11 02:08:11,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.38 | bwd_microstep: 1560.26 | bwd_inner_microstep: 1560.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624 [2024-06-11 02:08:13,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.71 | bwd_microstep: 1544.00 | bwd_inner_microstep: 1543.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3743 [2024-06-11 02:08:19,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.08 | optimizer_step: 6.58 [2024-06-11 02:08:19,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.20 | bwd_microstep: 4821.84 | bwd_inner_microstep: 1925.83 | bwd_allreduce_microstep: 2895.96 | step_microstep: 37.62 [2024-06-11 02:08:19,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16097.61 | bwd: 46192.29 | bwd_inner: 43295.43 | bwd_allreduce: 2896.19 | step: 39.07 {'loss': 1.1697, 'learning_rate': 2.246802472178675e-06, 'epoch': 0.85} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-11 02:08:21,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1467.49 | bwd_inner_microstep: 1467.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3907 [2024-06-11 02:08:23,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.54 | bwd_microstep: 1685.00 | bwd_inner_microstep: 1684.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3879 [2024-06-11 02:08:25,749] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 556.32 | bwd_microstep: 1482.54 | bwd_inner_microstep: 1482.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 02:08:27,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1389.93 | bwd_inner_microstep: 1389.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696 [2024-06-11 02:08:29,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.17 | bwd_microstep: 1518.89 | bwd_inner_microstep: 1518.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3529 [2024-06-11 02:08:31,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1321.44 | bwd_inner_microstep: 1321.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753 [2024-06-11 02:08:33,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.94 | bwd_microstep: 1467.83 | bwd_inner_microstep: 1467.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 02:08:35,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.88 | bwd_microstep: 1243.18 | bwd_inner_microstep: 1243.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3925 [2024-06-11 02:08:37,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.99 | bwd_microstep: 1486.38 | bwd_inner_microstep: 1486.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489 [2024-06-11 
02:08:39,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.71 | bwd_microstep: 1409.61 | bwd_inner_microstep: 1409.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3557 [2024-06-11 02:08:41,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.47 | bwd_microstep: 1360.03 | bwd_inner_microstep: 1360.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 02:08:43,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1480.50 | bwd_inner_microstep: 1480.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2097 [2024-06-11 02:08:44,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.12 | bwd_microstep: 1013.89 | bwd_inner_microstep: 1013.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-11 02:08:46,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1344.64 | bwd_inner_microstep: 1344.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1858 [2024-06-11 02:08:47,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.76 | bwd_microstep: 707.59 | bwd_inner_microstep: 707.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 02:08:49,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.61 | bwd_microstep: 1385.40 | bwd_inner_microstep: 1385.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 1965 [2024-06-11 02:08:50,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.18 | bwd_microstep: 892.36 | bwd_inner_microstep: 892.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3829 [2024-06-11 02:08:52,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.51 | bwd_microstep: 1358.00 | bwd_inner_microstep: 1357.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-11 02:08:54,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.55 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2979 [2024-06-11 02:08:55,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.82 | bwd_microstep: 1105.03 | bwd_inner_microstep: 1105.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 02:08:57,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.34 | bwd_microstep: 1378.27 | bwd_inner_microstep: 1378.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792 [2024-06-11 02:09:00,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.38 | bwd_microstep: 1650.62 | bwd_inner_microstep: 1650.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3609 [2024-06-11 02:09:01,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.13 | bwd_microstep: 1213.35 | bwd_inner_microstep: 1213.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-11 02:09:03,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1559.61 | bwd_inner_microstep: 1559.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2167 [2024-06-11 02:09:05,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.12 | bwd_microstep: 855.14 | bwd_inner_microstep: 855.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 02:09:06,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.36 | bwd_microstep: 1298.90 | bwd_inner_microstep: 1298.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 02:09:08,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.71 | bwd_microstep: 1261.38 | bwd_inner_microstep: 1261.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3534 [2024-06-11 02:09:10,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.43 | bwd_microstep: 1356.01 | bwd_inner_microstep: 1355.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-11 02:09:12,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.26 | bwd_microstep: 1396.63 | bwd_inner_microstep: 1396.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509 [2024-06-11 02:09:14,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.03 | bwd_microstep: 1431.52 | bwd_inner_microstep: 1431.49 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3457 [2024-06-11 02:09:16,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.07 | bwd_microstep: 1425.26 | bwd_inner_microstep: 1425.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3768 [2024-06-11 02:09:20,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-11 02:09:20,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 3532.47 | bwd_inner_microstep: 1553.53 | bwd_allreduce_microstep: 1978.88 | step_microstep: 38.40 [2024-06-11 02:09:20,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16020.52 | bwd: 44856.96 | bwd_inner: 42877.17 | bwd_allreduce: 1979.11 | step: 39.87 {'loss': 1.1904, 'learning_rate': 2.229549368447057e-06, 'epoch': 0.85} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 02:09:22,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1332.73 | bwd_inner_microstep: 1332.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4016 [2024-06-11 02:09:24,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1504.42 | bwd_inner_microstep: 1504.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3833 [2024-06-11 02:09:26,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.50 | bwd_microstep: 1513.38 | bwd_inner_microstep: 1513.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-11 02:09:28,491] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1377.30 | bwd_inner_microstep: 1377.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796 [2024-06-11 02:09:30,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.11 | bwd_microstep: 1446.15 | bwd_inner_microstep: 1446.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-11 02:09:31,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.19 | bwd_microstep: 794.43 | bwd_inner_microstep: 794.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 02:09:33,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1285.43 | bwd_inner_microstep: 1285.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-11 02:09:35,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-11 02:09:37,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.35 | bwd_microstep: 1392.35 | bwd_inner_microstep: 1392.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 02:09:39,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.61 | bwd_microstep: 1394.24 | bwd_inner_microstep: 1394.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 
3497 [2024-06-11 02:09:41,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.61 | bwd_microstep: 1550.15 | bwd_inner_microstep: 1550.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3639 [2024-06-11 02:09:43,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.04 | bwd_microstep: 1375.64 | bwd_inner_microstep: 1375.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655 [2024-06-11 02:09:45,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.19 | bwd_microstep: 1575.98 | bwd_inner_microstep: 1575.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-11 02:09:47,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.87 | bwd_microstep: 1275.33 | bwd_inner_microstep: 1275.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498 [2024-06-11 02:09:49,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1582.49 | bwd_inner_microstep: 1582.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 02:09:51,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.00 | bwd_microstep: 1381.15 | bwd_inner_microstep: 1381.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3843 [2024-06-11 02:09:53,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.61 | bwd_microstep: 1713.03 | bwd_inner_microstep: 1713.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3514 [2024-06-11 02:09:55,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.00 | bwd_microstep: 1286.59 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3460 [2024-06-11 02:09:57,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.45 | bwd_microstep: 1341.77 | bwd_inner_microstep: 1341.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 02:09:59,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.06 | bwd_microstep: 1558.64 | bwd_inner_microstep: 1558.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-11 02:10:01,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.57 | bwd_microstep: 1613.78 | bwd_inner_microstep: 1613.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-11 02:10:03,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.99 | bwd_microstep: 1404.26 | bwd_inner_microstep: 1404.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3547 [2024-06-11 02:10:05,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.45 | bwd_microstep: 1329.32 | bwd_inner_microstep: 1329.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1938 [2024-06-11 02:10:06,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.30 | bwd_microstep: 728.40 | bwd_inner_microstep: 728.38 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2026 [2024-06-11 02:10:07,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.91 | bwd_microstep: 806.04 | bwd_inner_microstep: 806.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586 [2024-06-11 02:10:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.40 | bwd_microstep: 1607.35 | bwd_inner_microstep: 1607.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 02:10:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1351.21 | bwd_inner_microstep: 1351.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 02:10:13,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.51 | bwd_microstep: 1555.00 | bwd_inner_microstep: 1554.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-11 02:10:15,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.94 | bwd_microstep: 1441.04 | bwd_inner_microstep: 1441.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3420 [2024-06-11 02:10:17,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.12 | bwd_microstep: 1374.30 | bwd_inner_microstep: 1374.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275 [2024-06-11 02:10:18,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.93 | bwd_microstep: 909.40 | bwd_inner_microstep: 909.37 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-11 02:10:23,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.07 | optimizer_step: 6.58 [2024-06-11 02:10:23,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.16 | bwd_microstep: 3769.87 | bwd_inner_microstep: 1694.40 | bwd_allreduce_microstep: 2075.42 | step_microstep: 37.71 [2024-06-11 02:10:23,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16322.36 | bwd: 45956.31 | bwd_inner: 43879.98 | bwd_allreduce: 2075.65 | step: 39.22 {'loss': 1.2066, 'learning_rate': 2.212358852150187e-06, 'epoch': 0.85} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 02:10:25,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1364.68 | bwd_inner_microstep: 1364.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868 [2024-06-11 02:10:26,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.47 | bwd_microstep: 675.39 | bwd_inner_microstep: 675.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3921 [2024-06-11 02:10:28,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.80 | bwd_microstep: 1488.64 | bwd_inner_microstep: 1488.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 02:10:29,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1246.60 | bwd_inner_microstep: 1246.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2251 
[2024-06-11 02:10:31,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.57 | bwd_microstep: 965.45 | bwd_inner_microstep: 965.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-11 02:10:32,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.93 | bwd_microstep: 674.92 | bwd_inner_microstep: 674.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-11 02:10:34,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.28 | bwd_microstep: 1482.43 | bwd_inner_microstep: 1482.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-11 02:10:35,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1291.39 | bwd_inner_microstep: 1291.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-11 02:10:37,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.66 | bwd_microstep: 1257.81 | bwd_inner_microstep: 1257.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497 [2024-06-11 02:10:39,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.66 | bwd_microstep: 1415.99 | bwd_inner_microstep: 1415.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3625 [2024-06-11 02:10:41,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.07 | bwd_microstep: 1562.84 | bwd_inner_microstep: 1562.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3465 [2024-06-11 02:10:43,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.51 | bwd_microstep: 1312.43 | bwd_inner_microstep: 1312.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-11 02:10:45,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.25 | bwd_microstep: 1454.39 | bwd_inner_microstep: 1454.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 02:10:47,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1283.28 | bwd_inner_microstep: 1283.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3656 [2024-06-11 02:10:49,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.84 | bwd_microstep: 1823.64 | bwd_inner_microstep: 1823.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-11 02:10:51,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1525.59 | bwd_inner_microstep: 1525.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3434 [2024-06-11 02:10:53,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.26 | bwd_microstep: 1215.10 | bwd_inner_microstep: 1215.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3537 [2024-06-11 02:10:55,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.31 | bwd_microstep: 1199.38 | bwd_inner_microstep: 1199.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3548 [2024-06-11 02:10:57,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1428.44 | bwd_inner_microstep: 1428.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663 [2024-06-11 02:10:59,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.81 | bwd_microstep: 1420.08 | bwd_inner_microstep: 1420.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3538 [2024-06-11 02:11:01,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1358.05 | bwd_inner_microstep: 1358.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3647 [2024-06-11 02:11:03,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1615.24 | bwd_inner_microstep: 1615.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3617 [2024-06-11 02:11:05,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.99 | bwd_microstep: 1510.11 | bwd_inner_microstep: 1510.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 02:11:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.24 | bwd_microstep: 1569.44 | bwd_inner_microstep: 1569.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-11 02:11:08,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.03 | bwd_microstep: 802.11 | bwd_inner_microstep: 802.09 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726 [2024-06-11 02:11:10,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.46 | bwd_microstep: 1637.49 | bwd_inner_microstep: 1637.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2193 [2024-06-11 02:11:12,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.19 | bwd_microstep: 910.92 | bwd_inner_microstep: 910.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-11 02:11:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.58 | bwd_microstep: 1348.52 | bwd_inner_microstep: 1348.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-11 02:11:16,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.26 | bwd_microstep: 1639.26 | bwd_inner_microstep: 1639.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3575 [2024-06-11 02:11:18,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.58 | bwd_microstep: 1561.23 | bwd_inner_microstep: 1561.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3468 [2024-06-11 02:11:20,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.02 | bwd_microstep: 1505.06 | bwd_inner_microstep: 1505.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-11 02:11:25,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | 
optimizer_gradients: 4.19 | optimizer_step: 6.62 [2024-06-11 02:11:25,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 4176.00 | bwd_inner_microstep: 1712.20 | bwd_allreduce_microstep: 2463.75 | step_microstep: 39.08 [2024-06-11 02:11:25,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16108.53 | bwd: 45721.89 | bwd_inner: 43257.23 | bwd_allreduce: 2463.98 | step: 40.58 {'loss': 1.151, 'learning_rate': 2.19523098383297e-06, 'epoch': 0.85} 85%|████████▌ | 1470/1726 [25:29:53<4:26:29, 62.46s/it] 85%|████████▌ | 1471/1726 [25:30:56<4:25:39, 62.51s/it] 85%|████████▌ | 1472/1726 [25:31:57<4:22:58, 62.12s/it] 85%|████████▌ | 1473/1726 [25:32:59<4:22:33, 62.27s/it] 85%|████████▌ | 1474/1726 [25:34:02<4:21:24, 62.24s/it] dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-11 02:11:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.55 | bwd_microstep: 782.43 | bwd_inner_microstep: 782.28 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-11 02:11:28,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.38 | bwd_microstep: 1247.28 | bwd_inner_microstep: 1247.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3977 [2024-06-11 02:11:30,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 1408.88 | bwd_inner_microstep: 1408.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 
[2024-06-11 02:11:32,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.96 | bwd_microstep: 1479.27 | bwd_inner_microstep: 1479.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2254 [2024-06-11 02:11:33,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.64 | bwd_microstep: 967.53 | bwd_inner_microstep: 967.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 02:11:35,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1247.57 | bwd_inner_microstep: 1247.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3490 [2024-06-11 02:11:37,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.71 | bwd_microstep: 1333.77 | bwd_inner_microstep: 1333.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 02:11:38,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.82 | bwd_microstep: 1246.54 | bwd_inner_microstep: 1246.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1899 [2024-06-11 02:11:39,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.66 | bwd_microstep: 813.13 | bwd_inner_microstep: 813.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-11 02:11:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.22 | bwd_microstep: 1617.47 | bwd_inner_microstep: 1617.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3654 [2024-06-11 02:11:44,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.14 | bwd_microstep: 1444.61 | bwd_inner_microstep: 1444.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3625 [2024-06-11 02:11:46,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.92 | bwd_microstep: 1812.04 | bwd_inner_microstep: 1812.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3663 [2024-06-11 02:11:49,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.95 | bwd_microstep: 1824.47 | bwd_inner_microstep: 1824.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-11 02:11:51,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.16 | bwd_microstep: 1481.91 | bwd_inner_microstep: 1481.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3671 [2024-06-11 02:11:53,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.39 | bwd_microstep: 1721.42 | bwd_inner_microstep: 1721.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446 [2024-06-11 02:11:55,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.87 | bwd_microstep: 1448.13 | bwd_inner_microstep: 1448.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3743 [2024-06-11 02:11:57,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.02 | bwd_microstep: 1565.17 | bwd_inner_microstep: 1565.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-11 02:11:59,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.60 | bwd_microstep: 1294.87 | bwd_inner_microstep: 1294.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-11 02:12:01,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1401.53 | bwd_inner_microstep: 1401.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-11 02:12:03,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1396.21 | bwd_inner_microstep: 1396.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-11 02:12:05,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.56 | bwd_microstep: 1513.07 | bwd_inner_microstep: 1513.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2141 [2024-06-11 02:12:06,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.91 | bwd_microstep: 835.43 | bwd_inner_microstep: 835.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 02:12:08,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.59 | bwd_microstep: 1560.56 | bwd_inner_microstep: 1560.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2253 [2024-06-11 02:12:10,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.79 | bwd_microstep: 1000.05 | bwd_inner_microstep: 1000.02 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-11 02:12:12,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.41 | bwd_microstep: 1552.03 | bwd_inner_microstep: 1552.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 02:12:14,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.59 | bwd_microstep: 1661.33 | bwd_inner_microstep: 1661.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3776 [2024-06-11 02:12:16,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.04 | bwd_microstep: 1283.67 | bwd_inner_microstep: 1283.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046 [2024-06-11 02:12:17,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.59 | bwd_microstep: 810.39 | bwd_inner_microstep: 810.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1163 [2024-06-11 02:12:18,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 169.82 | bwd_microstep: 435.45 | bwd_inner_microstep: 435.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2030 [2024-06-11 02:12:19,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.89 | bwd_microstep: 783.22 | bwd_inner_microstep: 783.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3792 [2024-06-11 02:12:21,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.26 | bwd_microstep: 1501.54 
| bwd_inner_microstep: 1501.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074 [2024-06-11 02:12:26,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-11 02:12:26,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.69 | bwd_microstep: 4407.70 | bwd_inner_microstep: 1157.96 | bwd_allreduce_microstep: 3249.69 | step_microstep: 37.82 [2024-06-11 02:12:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15477.95 | bwd: 44878.67 | bwd_inner: 41627.98 | bwd_allreduce: 3249.97 | step: 39.33 {'loss': 1.1548, 'learning_rate': 2.178165823819667e-06, 'epoch': 0.85} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-11 02:12:27,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.20 | bwd_microstep: 1336.16 | bwd_inner_microstep: 1336.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 02:12:29,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.28 | bwd_microstep: 1244.46 | bwd_inner_microstep: 1244.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 02:12:31,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1378.38 | bwd_inner_microstep: 1378.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3517 [2024-06-11 02:12:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1250.57 | bwd_inner_microstep: 1250.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3806 [2024-06-11 02:12:35,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1454.22 | bwd_inner_microstep: 1454.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3542 [2024-06-11 02:12:37,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1439.51 | bwd_inner_microstep: 1439.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-11 02:12:39,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.65 | bwd_microstep: 1286.47 | bwd_inner_microstep: 1286.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 02:12:40,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-11 02:12:42,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.34 | bwd_microstep: 793.51 | bwd_inner_microstep: 793.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-11 02:12:43,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.99 | bwd_microstep: 1297.47 | bwd_inner_microstep: 1297.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3678 [2024-06-11 02:12:45,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.22 | bwd_microstep: 1361.67 | bwd_inner_microstep: 1361.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3691 [2024-06-11 02:12:47,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.52 | bwd_microstep: 1522.32 | bwd_inner_microstep: 1522.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 02:12:49,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.79 | bwd_microstep: 1355.32 | bwd_inner_microstep: 1355.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988 [2024-06-11 02:12:50,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.55 | bwd_microstep: 897.18 | bwd_inner_microstep: 897.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-11 02:12:53,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.67 | bwd_microstep: 1486.86 | bwd_inner_microstep: 1486.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677 [2024-06-11 02:12:55,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.90 | bwd_microstep: 1628.52 | bwd_inner_microstep: 1628.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2210 [2024-06-11 02:12:56,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.34 | bwd_microstep: 960.18 | bwd_inner_microstep: 960.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-11 02:12:58,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.98 | bwd_microstep: 1191.33 | bwd_inner_microstep: 1191.30 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 02:13:00,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1380.67 | bwd_inner_microstep: 1380.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 02:13:02,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.92 | bwd_microstep: 1398.07 | bwd_inner_microstep: 1398.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471 [2024-06-11 02:13:03,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.66 | bwd_microstep: 1216.70 | bwd_inner_microstep: 1216.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692 [2024-06-11 02:13:05,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1329.70 | bwd_inner_microstep: 1329.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501 [2024-06-11 02:13:07,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1319.53 | bwd_inner_microstep: 1319.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069 [2024-06-11 02:13:08,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.10 | bwd_microstep: 915.25 | bwd_inner_microstep: 915.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-11 02:13:10,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.60 | bwd_microstep: 1451.02 | bwd_inner_microstep: 
1450.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3655 [2024-06-11 02:13:12,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.86 | bwd_microstep: 1584.23 | bwd_inner_microstep: 1584.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-11 02:13:14,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1508.97 | bwd_inner_microstep: 1508.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3809 [2024-06-11 02:13:17,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.90 | bwd_microstep: 1616.57 | bwd_inner_microstep: 1616.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3767 [2024-06-11 02:13:19,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.03 | bwd_microstep: 1710.51 | bwd_inner_microstep: 1710.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2271 [2024-06-11 02:13:20,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.55 | bwd_microstep: 935.50 | bwd_inner_microstep: 935.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2403 [2024-06-11 02:13:22,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.16 | bwd_microstep: 910.83 | bwd_inner_microstep: 910.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-11 02:13:28,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.92 | 
optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-11 02:13:28,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.36 | bwd_microstep: 5927.00 | bwd_inner_microstep: 1700.89 | bwd_allreduce_microstep: 4226.06 | step_microstep: 38.98 [2024-06-11 02:13:28,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15745.77 | bwd: 46475.04 | bwd_inner: 42248.08 | bwd_allreduce: 4226.29 | step: 40.44 {'loss': 1.1534, 'learning_rate': 2.1611634322136934e-06, 'epoch': 0.86} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488 [2024-06-11 02:13:30,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1335.41 | bwd_inner_microstep: 1335.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3937 [2024-06-11 02:13:32,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.77 | bwd_microstep: 1486.90 | bwd_inner_microstep: 1486.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1876 [2024-06-11 02:13:33,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.39 | bwd_microstep: 707.14 | bwd_inner_microstep: 707.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3852 [2024-06-11 02:13:35,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.13 | bwd_microstep: 1626.89 | bwd_inner_microstep: 1626.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4156 [2024-06-11 02:13:37,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.73 | bwd_microstep: 1538.78 | bwd_inner_microstep: 1538.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3742 [2024-06-11 02:13:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1427.86 | bwd_inner_microstep: 1427.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752 [2024-06-11 02:13:41,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1536.32 | bwd_inner_microstep: 1536.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 02:13:43,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.89 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3514 [2024-06-11 02:13:45,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1249.39 | bwd_inner_microstep: 1249.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3769 [2024-06-11 02:13:47,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.49 | bwd_microstep: 1639.60 | bwd_inner_microstep: 1639.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3496 [2024-06-11 02:13:49,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.08 | bwd_microstep: 1416.40 | bwd_inner_microstep: 1416.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3452 [2024-06-11 02:13:51,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.15 | bwd_microstep: 1447.85 | bwd_inner_microstep: 1447.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3707 [2024-06-11 02:13:53,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.47 | bwd_microstep: 1692.70 | bwd_inner_microstep: 1692.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1905 [2024-06-11 02:13:54,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.14 | bwd_microstep: 716.00 | bwd_inner_microstep: 715.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 02:13:56,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.45 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2095 [2024-06-11 02:13:58,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.32 | bwd_microstep: 947.11 | bwd_inner_microstep: 947.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1944 [2024-06-11 02:13:59,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.78 | bwd_microstep: 739.95 | bwd_inner_microstep: 739.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 02:14:01,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.72 | bwd_microstep: 1379.38 | bwd_inner_microstep: 1379.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 02:14:03,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.90 | bwd_microstep: 1646.04 | bwd_inner_microstep: 1646.01 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-11 02:14:05,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1508.74 | bwd_inner_microstep: 1508.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-11 02:14:07,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.23 | bwd_microstep: 1504.61 | bwd_inner_microstep: 1504.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-11 02:14:09,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.85 | bwd_microstep: 1451.41 | bwd_inner_microstep: 1451.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 02:14:11,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.46 | bwd_microstep: 1279.79 | bwd_inner_microstep: 1279.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-11 02:14:13,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.51 | bwd_microstep: 1391.43 | bwd_inner_microstep: 1391.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 02:14:15,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1457.80 | bwd_inner_microstep: 1457.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2267 [2024-06-11 02:14:16,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.92 | bwd_microstep: 
934.27 | bwd_inner_microstep: 934.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-11 02:14:18,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.43 | bwd_microstep: 1646.89 | bwd_inner_microstep: 1646.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3574 [2024-06-11 02:14:20,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.07 | bwd_microstep: 1566.35 | bwd_inner_microstep: 1566.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1946 [2024-06-11 02:14:22,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.20 | bwd_microstep: 777.85 | bwd_inner_microstep: 777.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3799 [2024-06-11 02:14:24,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.20 | bwd_microstep: 1550.97 | bwd_inner_microstep: 1550.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763 [2024-06-11 02:14:26,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.97 | bwd_microstep: 1544.30 | bwd_inner_microstep: 1544.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-11 02:14:30,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.28 | optimizer_step: 6.61 [2024-06-11 02:14:30,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.02 | bwd_microstep: 3885.06 | bwd_inner_microstep: 1688.03 | bwd_allreduce_microstep: 2196.97 | step_microstep: 38.94 
[2024-06-11 02:14:30,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16165.44 | bwd: 45697.83 | bwd_inner: 43499.95 | bwd_allreduce: 2197.19 | step: 40.50 {'loss': 1.1701, 'learning_rate': 2.1442238688973682e-06, 'epoch': 0.86} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 02:14:32,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1370.85 | bwd_inner_microstep: 1370.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3928 [2024-06-11 02:14:34,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1390.98 | bwd_inner_microstep: 1390.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 02:14:36,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.34 | bwd_microstep: 1483.20 | bwd_inner_microstep: 1483.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-11 02:14:38,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1482.74 | bwd_inner_microstep: 1482.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3758 [2024-06-11 02:14:40,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.82 | bwd_microstep: 1640.36 | bwd_inner_microstep: 1640.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 02:14:42,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.44 | bwd_microstep: 1384.77 | bwd_inner_microstep: 1384.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3709 [2024-06-11 02:14:45,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.35 | bwd_microstep: 1531.83 | bwd_inner_microstep: 1531.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3416 [2024-06-11 02:14:46,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.29 | bwd_microstep: 1154.90 | bwd_inner_microstep: 1154.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-11 02:14:48,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.15 | bwd_microstep: 1289.59 | bwd_inner_microstep: 1289.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 02:14:50,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1485.07 | bwd_inner_microstep: 1485.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 02:14:52,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.03 | bwd_microstep: 1348.23 | bwd_inner_microstep: 1348.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-11 02:14:54,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1251.20 | bwd_inner_microstep: 1251.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3439 [2024-06-11 02:14:55,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1191.15 | bwd_inner_microstep: 1191.12 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3556 [2024-06-11 02:14:57,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.33 | bwd_microstep: 1452.57 | bwd_inner_microstep: 1452.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3662 [2024-06-11 02:15:00,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.85 | bwd_microstep: 1723.96 | bwd_inner_microstep: 1723.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2747 [2024-06-11 02:15:01,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.86 | bwd_microstep: 947.19 | bwd_inner_microstep: 947.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 02:15:03,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1246.42 | bwd_inner_microstep: 1246.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3544 [2024-06-11 02:15:05,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.03 | bwd_microstep: 1694.23 | bwd_inner_microstep: 1694.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3700 [2024-06-11 02:15:07,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.67 | bwd_microstep: 1628.62 | bwd_inner_microstep: 1628.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-11 02:15:09,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.63 | bwd_microstep: 
1525.88 | bwd_inner_microstep: 1525.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1941 [2024-06-11 02:15:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.60 | bwd_microstep: 762.65 | bwd_inner_microstep: 762.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-11 02:15:11,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.17 | bwd_microstep: 793.68 | bwd_inner_microstep: 793.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-11 02:15:13,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 1391.92 | bwd_inner_microstep: 1391.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3825 [2024-06-11 02:15:15,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.70 | bwd_microstep: 1357.71 | bwd_inner_microstep: 1357.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-11 02:15:17,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.85 | bwd_microstep: 978.17 | bwd_inner_microstep: 978.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3758 [2024-06-11 02:15:18,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.08 | bwd_microstep: 1345.73 | bwd_inner_microstep: 1345.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-11 02:15:21,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 560.78 | bwd_microstep: 1498.32 | bwd_inner_microstep: 1498.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-11 02:15:22,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1415.75 | bwd_inner_microstep: 1415.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-11 02:15:25,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.32 | bwd_microstep: 1476.08 | bwd_inner_microstep: 1476.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-11 02:15:26,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1248.60 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-11 02:15:28,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.81 | bwd_microstep: 1543.69 | bwd_inner_microstep: 1543.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3588 [2024-06-11 02:15:30,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.04 | optimizer_step: 6.66 [2024-06-11 02:15:30,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.06 | bwd_microstep: 1377.86 | bwd_inner_microstep: 1370.02 | bwd_allreduce_microstep: 7.80 | step_microstep: 37.40 [2024-06-11 02:15:30,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16249.72 | bwd: 43413.92 | bwd_inner: 43405.22 | bwd_allreduce: 8.02 | step: 38.94 {'loss': 1.1923, 'learning_rate': 2.127347193531757e-06, 'epoch': 0.86} dynamic ViT 
batch size: 26, images per sample: 6.5, dynamic token length: 3507 [2024-06-11 02:15:32,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.89 | bwd_microstep: 1344.08 | bwd_inner_microstep: 1344.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3904 [2024-06-11 02:15:35,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.73 | bwd_microstep: 1718.08 | bwd_inner_microstep: 1718.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 02:15:37,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1553.85 | bwd_inner_microstep: 1553.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-11 02:15:39,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.03 | bwd_microstep: 1436.25 | bwd_inner_microstep: 1436.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-11 02:15:41,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.11 | bwd_microstep: 1541.38 | bwd_inner_microstep: 1541.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-11 02:15:43,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.44 | bwd_microstep: 1278.92 | bwd_inner_microstep: 1278.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 02:15:44,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.36 | bwd_microstep: 1250.71 | bwd_inner_microstep: 1250.68 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 02:15:46,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.32 | bwd_microstep: 1281.58 | bwd_inner_microstep: 1281.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425 [2024-06-11 02:15:48,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.43 | bwd_microstep: 1155.08 | bwd_inner_microstep: 1155.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079 [2024-06-11 02:15:49,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.01 | bwd_microstep: 822.14 | bwd_inner_microstep: 822.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-11 02:15:50,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.49 | bwd_microstep: 1151.36 | bwd_inner_microstep: 1151.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-11 02:15:52,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1377.87 | bwd_inner_microstep: 1377.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400 [2024-06-11 02:15:54,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.95 | bwd_microstep: 1438.06 | bwd_inner_microstep: 1438.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3439 [2024-06-11 02:15:56,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1484.09 | bwd_inner_microstep: 
1484.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-11 02:15:58,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 1480.33 | bwd_inner_microstep: 1480.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2022 [2024-06-11 02:16:00,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.59 | bwd_microstep: 903.11 | bwd_inner_microstep: 903.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3411 [2024-06-11 02:16:01,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.79 | bwd_microstep: 1312.62 | bwd_inner_microstep: 1312.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 02:16:04,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.27 | bwd_microstep: 1646.65 | bwd_inner_microstep: 1646.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-11 02:16:05,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.02 | bwd_microstep: 1254.16 | bwd_inner_microstep: 1254.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290 [2024-06-11 02:16:07,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.95 | bwd_microstep: 909.02 | bwd_inner_microstep: 908.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-11 02:16:08,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.06 | 
bwd_microstep: 802.84 | bwd_inner_microstep: 802.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3468 [2024-06-11 02:16:10,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.29 | bwd_microstep: 1337.88 | bwd_inner_microstep: 1337.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2140 [2024-06-11 02:16:11,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.94 | bwd_microstep: 737.94 | bwd_inner_microstep: 737.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-11 02:16:13,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.11 | bwd_microstep: 1405.13 | bwd_inner_microstep: 1405.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3806 [2024-06-11 02:16:15,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.82 | bwd_microstep: 1616.85 | bwd_inner_microstep: 1616.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3546 [2024-06-11 02:16:17,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.40 | bwd_microstep: 1231.50 | bwd_inner_microstep: 1231.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-11 02:16:19,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.17 | bwd_microstep: 1656.16 | bwd_inner_microstep: 1656.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-11 02:16:21,208] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 496.95 | bwd_microstep: 1319.05 | bwd_inner_microstep: 1319.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2192 [2024-06-11 02:16:22,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.26 | bwd_microstep: 859.88 | bwd_inner_microstep: 859.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2172 [2024-06-11 02:16:23,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.57 | bwd_microstep: 953.23 | bwd_inner_microstep: 953.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3590 [2024-06-11 02:16:25,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.91 | bwd_microstep: 1610.05 | bwd_inner_microstep: 1610.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 02:17:15,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.35 | optimizer_step: 6.59 [2024-06-11 02:17:15,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 49443.05 | bwd_inner_microstep: 1547.69 | bwd_allreduce_microstep: 47895.29 | step_microstep: 38.93 [2024-06-11 02:17:15,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15461.85 | bwd: 89312.93 | bwd_inner: 41416.66 | bwd_allreduce: 47895.55 | step: 40.52 {'loss': 1.181, 'learning_rate': 2.1105334655564148e-06, 'epoch': 0.86} 85%|████████▌ | 1475/1726 [25:35:02<4:18:26, 61.78s/it] 86%|████████▌ | 1476/1726 [25:36:05<4:18:22, 62.01s/it] 86%|████████▌ | 1477/1726 [25:37:07<4:17:35, 62.07s/it] 86%|████████▌ | 1478/1726 [25:38:07<4:13:59, 61.45s/it] 86%|████████▌ | 1479/1726 [25:39:52<5:06:53, 74.55s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456 [2024-06-11 02:17:17,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.17 | bwd_microstep: 1433.16 | bwd_inner_microstep: 1433.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-11 02:17:19,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.44 | bwd_microstep: 1338.92 | bwd_inner_microstep: 1338.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 02:17:21,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1365.69 | bwd_inner_microstep: 1365.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 02:17:23,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.05 | bwd_microstep: 1237.21 | bwd_inner_microstep: 1237.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1859 [2024-06-11 02:17:24,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.58 | bwd_microstep: 675.40 | bwd_inner_microstep: 675.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2203 [2024-06-11 02:17:25,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.45 | bwd_microstep: 948.06 | bwd_inner_microstep: 948.03 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606 [2024-06-11 02:17:27,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.82 | bwd_microstep: 1301.89 | bwd_inner_microstep: 1301.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3490 [2024-06-11 02:17:29,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1407.23 | bwd_inner_microstep: 1407.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-11 02:18:33,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.82 | bwd_microstep: 1328.22 | bwd_inner_microstep: 1328.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788 [2024-06-11 02:18:35,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.35 | bwd_microstep: 1629.49 | bwd_inner_microstep: 1629.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-11 02:18:37,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 1331.00 | bwd_inner_microstep: 1330.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-11 02:18:39,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.67 | bwd_microstep: 1603.87 | bwd_inner_microstep: 1603.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507 [2024-06-11 02:18:41,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.18 | bwd_microstep: 1400.75 | bwd_inner_microstep: 
1400.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3492 [2024-06-11 02:18:43,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 1304.61 | bwd_inner_microstep: 1304.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864 [2024-06-11 02:18:44,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.38 | bwd_microstep: 676.21 | bwd_inner_microstep: 676.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3508 [2024-06-11 02:18:45,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.43 | bwd_microstep: 1310.56 | bwd_inner_microstep: 1310.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3587 [2024-06-11 02:18:47,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.17 | bwd_microstep: 1459.68 | bwd_inner_microstep: 1459.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-11 02:18:49,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.53 | bwd_microstep: 1387.61 | bwd_inner_microstep: 1387.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3629 [2024-06-11 02:18:52,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.22 | bwd_microstep: 1630.29 | bwd_inner_microstep: 1630.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 02:18:53,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.06 | 
bwd_microstep: 1379.59 | bwd_inner_microstep: 1379.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-11 02:18:56,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.35 | bwd_microstep: 1500.90 | bwd_inner_microstep: 1500.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108 [2024-06-11 02:18:57,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.79 | bwd_microstep: 819.82 | bwd_inner_microstep: 819.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-11 02:18:58,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.53 | bwd_microstep: 1292.09 | bwd_inner_microstep: 1292.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-11 02:19:00,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.72 | bwd_microstep: 818.91 | bwd_inner_microstep: 818.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606 [2024-06-11 02:19:01,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.01 | bwd_microstep: 1305.89 | bwd_inner_microstep: 1305.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 02:19:04,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.89 | bwd_microstep: 1549.52 | bwd_inner_microstep: 1549.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3771 [2024-06-11 02:19:05,922] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 505.15 | bwd_microstep: 1341.11 | bwd_inner_microstep: 1341.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3586 [2024-06-11 02:19:07,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.63 | bwd_microstep: 1306.57 | bwd_inner_microstep: 1306.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-11 02:19:09,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.83 | bwd_microstep: 1485.05 | bwd_inner_microstep: 1485.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2232 [2024-06-11 02:19:11,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.68 | bwd_microstep: 959.93 | bwd_inner_microstep: 959.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-11 02:19:12,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.93 | bwd_microstep: 971.40 | bwd_inner_microstep: 971.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3447 [2024-06-11 02:19:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-11 02:19:18,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 5764.51 | bwd_inner_microstep: 1639.49 | bwd_allreduce_microstep: 4124.97 | step_microstep: 37.70 [2024-06-11 02:19:18,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15345.71 | bwd: 45265.14 | bwd_inner: 41139.27 | bwd_allreduce: 4125.20 | step: 39.14 {'loss': 1.1794, 'learning_rate': 2.093782744189217e-06, 'epoch': 0.86} 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-11 02:19:20,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.52 | bwd_microstep: 1265.45 | bwd_inner_microstep: 1265.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4027 [2024-06-11 02:19:22,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.30 | bwd_microstep: 1609.31 | bwd_inner_microstep: 1609.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3860 [2024-06-11 02:19:25,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.56 | bwd_microstep: 1661.94 | bwd_inner_microstep: 1661.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3577 [2024-06-11 02:19:26,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.25 | bwd_microstep: 1333.13 | bwd_inner_microstep: 1333.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 02:19:28,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.84 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1960 [2024-06-11 02:19:29,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.48 | bwd_microstep: 826.88 | bwd_inner_microstep: 826.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-11 02:19:31,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1297.48 | bwd_inner_microstep: 1297.45 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 02:19:33,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.34 | bwd_microstep: 1278.33 | bwd_inner_microstep: 1278.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3734 [2024-06-11 02:19:35,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.21 | bwd_microstep: 1535.74 | bwd_inner_microstep: 1535.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3695 [2024-06-11 02:19:37,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 1528.64 | bwd_inner_microstep: 1528.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452 [2024-06-11 02:19:39,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.80 | bwd_microstep: 1282.97 | bwd_inner_microstep: 1282.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3647 [2024-06-11 02:19:41,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.06 | bwd_microstep: 1352.37 | bwd_inner_microstep: 1352.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 02:19:43,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1382.33 | bwd_inner_microstep: 1382.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591 [2024-06-11 02:19:45,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.68 | bwd_microstep: 
1309.21 | bwd_inner_microstep: 1309.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 02:19:46,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.98 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 02:19:48,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1388.80 | bwd_inner_microstep: 1388.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633 [2024-06-11 02:19:51,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.65 | bwd_microstep: 1713.06 | bwd_inner_microstep: 1713.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-11 02:19:53,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1496.07 | bwd_inner_microstep: 1496.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3626 [2024-06-11 02:19:55,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.14 | bwd_microstep: 1561.76 | bwd_inner_microstep: 1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 02:19:57,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.62 | bwd_microstep: 1350.68 | bwd_inner_microstep: 1350.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 02:19:59,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 577.60 | bwd_microstep: 1555.80 | bwd_inner_microstep: 1555.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-11 02:20:01,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.30 | bwd_microstep: 1524.21 | bwd_inner_microstep: 1524.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 02:20:03,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1555.39 | bwd_inner_microstep: 1555.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544 [2024-06-11 02:20:05,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.00 | bwd_microstep: 1426.08 | bwd_inner_microstep: 1426.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 02:20:07,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.55 | bwd_microstep: 1256.51 | bwd_inner_microstep: 1256.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 7, images per sample: 1.75, dynamic token length: 1134 [2024-06-11 02:20:07,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 171.76 | bwd_microstep: 447.99 | bwd_inner_microstep: 447.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3570 [2024-06-11 02:20:09,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1463.24 | bwd_inner_microstep: 1463.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3446 [2024-06-11 02:20:11,884] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.56 | bwd_microstep: 1408.48 | bwd_inner_microstep: 1408.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3471 [2024-06-11 02:20:13,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1329.86 | bwd_inner_microstep: 1329.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-11 02:20:15,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1506.35 | bwd_inner_microstep: 1506.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3410 [2024-06-11 02:20:17,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.28 | bwd_microstep: 1346.10 | bwd_inner_microstep: 1346.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3774 [2024-06-11 02:20:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.29 | optimizer_step: 6.63 [2024-06-11 02:20:58,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.32 | bwd_microstep: 40029.47 | bwd_inner_microstep: 1977.36 | bwd_allreduce_microstep: 38052.04 | step_microstep: 38.60 [2024-06-11 02:20:58,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16592.66 | bwd: 82655.66 | bwd_inner: 44602.70 | bwd_allreduce: 38052.28 | step: 40.00 {'loss': 1.1697, 'learning_rate': 2.077095088426102e-06, 'epoch': 0.86} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-11 02:21:00,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.19 | bwd_microstep: 1354.00 | bwd_inner_microstep: 1353.97 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 02:21:02,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.52 | bwd_microstep: 1268.59 | bwd_inner_microstep: 1268.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-11 02:21:03,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.27 | bwd_microstep: 1270.97 | bwd_inner_microstep: 1270.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2321 [2024-06-11 02:21:05,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.19 | bwd_microstep: 910.02 | bwd_inner_microstep: 909.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862 [2024-06-11 02:21:07,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.06 | bwd_microstep: 1653.20 | bwd_inner_microstep: 1653.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3761 [2024-06-11 02:21:09,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1378.52 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 02:21:10,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.11 | bwd_microstep: 1277.05 | bwd_inner_microstep: 1277.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713 [2024-06-11 02:21:13,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.26 | bwd_microstep: 
1520.95 | bwd_inner_microstep: 1520.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 02:21:45,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.91 | bwd_microstep: 1276.18 | bwd_inner_microstep: 1276.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 02:22:02,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.44 | bwd_microstep: 1373.16 | bwd_inner_microstep: 1373.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3702 [2024-06-11 02:22:04,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1410.10 | bwd_inner_microstep: 1410.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405 [2024-06-11 02:22:06,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.41 | bwd_microstep: 1295.68 | bwd_inner_microstep: 1295.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-11 02:22:08,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.43 | bwd_microstep: 1481.34 | bwd_inner_microstep: 1481.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923 [2024-06-11 02:22:09,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.66 | bwd_microstep: 812.56 | bwd_inner_microstep: 812.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3646 [2024-06-11 02:22:11,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 502.17 | bwd_microstep: 1337.99 | bwd_inner_microstep: 1337.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642 [2024-06-11 02:22:13,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.99 | bwd_microstep: 1591.40 | bwd_inner_microstep: 1591.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-11 02:22:15,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1377.61 | bwd_inner_microstep: 1377.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-11 02:22:16,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.99 | bwd_microstep: 796.13 | bwd_inner_microstep: 796.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-11 02:22:18,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.50 | bwd_microstep: 1176.98 | bwd_inner_microstep: 1176.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-11 02:22:20,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.60 | bwd_microstep: 1648.95 | bwd_inner_microstep: 1648.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918 [2024-06-11 02:22:21,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.09 | bwd_microstep: 685.24 | bwd_inner_microstep: 685.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3670 [2024-06-11 02:22:23,755] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.66 | bwd_microstep: 1449.15 | bwd_inner_microstep: 1449.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-11 02:22:25,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.57 | bwd_microstep: 1488.62 | bwd_inner_microstep: 1488.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-11 02:22:27,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1340.56 | bwd_inner_microstep: 1340.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2529 [2024-06-11 02:22:29,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.93 | bwd_microstep: 1036.30 | bwd_inner_microstep: 1036.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 02:22:31,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.87 | bwd_microstep: 1379.29 | bwd_inner_microstep: 1379.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 02:22:32,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.80 | bwd_microstep: 1393.76 | bwd_inner_microstep: 1393.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-11 02:22:33,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.43 | bwd_microstep: 697.42 | bwd_inner_microstep: 697.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 
3623 [2024-06-11 02:22:36,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.83 | bwd_microstep: 1803.77 | bwd_inner_microstep: 1803.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-11 02:22:38,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.65 | bwd_microstep: 1590.15 | bwd_inner_microstep: 1590.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-11 02:22:40,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.78 | bwd_microstep: 1345.51 | bwd_inner_microstep: 1345.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-11 02:22:45,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-11 02:22:45,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.92 | bwd_microstep: 4262.10 | bwd_inner_microstep: 1416.22 | bwd_allreduce_microstep: 2845.82 | step_microstep: 38.05 [2024-06-11 02:22:45,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15662.40 | bwd: 44683.26 | bwd_inner: 41836.52 | bwd_allreduce: 2846.06 | step: 39.52 {'loss': 1.1634, 'learning_rate': 2.0604705570409166e-06, 'epoch': 0.86} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1965 [2024-06-11 02:22:46,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.20 | bwd_microstep: 881.29 | bwd_inner_microstep: 881.15 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-11 02:22:48,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1487.73 | 
bwd_inner_microstep: 1487.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3857 [2024-06-11 02:22:50,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.22 | bwd_microstep: 1658.63 | bwd_inner_microstep: 1658.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804 [2024-06-11 02:22:52,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.94 | bwd_microstep: 1444.60 | bwd_inner_microstep: 1444.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 02:22:54,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.69 | bwd_microstep: 1277.06 | bwd_inner_microstep: 1277.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1948 [2024-06-11 02:22:55,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.97 | bwd_microstep: 825.39 | bwd_inner_microstep: 825.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1884 [2024-06-11 02:22:56,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.64 | bwd_microstep: 714.10 | bwd_inner_microstep: 714.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224 [2024-06-11 02:22:57,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 958.76 | bwd_inner_microstep: 958.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913 [2024-06-11 02:22:59,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
292.55 | bwd_microstep: 779.48 | bwd_inner_microstep: 779.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 02:23:00,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.52 | bwd_microstep: 1282.24 | bwd_inner_microstep: 1282.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-11 02:23:02,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.22 | bwd_microstep: 1392.84 | bwd_inner_microstep: 1392.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-11 02:23:04,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1398.01 | bwd_inner_microstep: 1397.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3578 [2024-06-11 02:23:06,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.97 | bwd_microstep: 1500.19 | bwd_inner_microstep: 1500.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425 [2024-06-11 02:23:08,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.07 | bwd_microstep: 1408.25 | bwd_inner_microstep: 1408.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302 [2024-06-11 02:23:10,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.26 | bwd_microstep: 978.79 | bwd_inner_microstep: 978.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-11 02:23:11,185] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 301.94 | bwd_microstep: 798.70 | bwd_inner_microstep: 798.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3671 [2024-06-11 02:23:13,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-11 02:23:15,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.26 | bwd_microstep: 1632.02 | bwd_inner_microstep: 1632.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-11 02:23:17,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.48 | bwd_microstep: 1289.97 | bwd_inner_microstep: 1289.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2300 [2024-06-11 02:23:18,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.34 | bwd_microstep: 881.79 | bwd_inner_microstep: 881.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-11 02:23:20,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.73 | bwd_microstep: 1297.00 | bwd_inner_microstep: 1296.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2289 [2024-06-11 02:23:21,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.55 | bwd_microstep: 1022.35 | bwd_inner_microstep: 1022.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3550 [2024-06-11 
02:23:23,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.07 | bwd_microstep: 1346.04 | bwd_inner_microstep: 1346.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 02:23:25,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.20 | bwd_microstep: 1401.90 | bwd_inner_microstep: 1401.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 02:23:27,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.39 | bwd_microstep: 1560.40 | bwd_inner_microstep: 1560.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 02:23:29,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1394.19 | bwd_inner_microstep: 1394.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3721 [2024-06-11 02:23:31,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.88 | bwd_microstep: 1601.01 | bwd_inner_microstep: 1600.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3820 [2024-06-11 02:23:34,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.72 | bwd_microstep: 1812.98 | bwd_inner_microstep: 1812.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-11 02:23:36,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1648.10 | bwd_inner_microstep: 1648.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 
11.5, dynamic token length: 3792 [2024-06-11 02:23:38,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.69 | bwd_microstep: 1748.65 | bwd_inner_microstep: 1748.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587 [2024-06-11 02:23:40,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.77 | bwd_microstep: 1306.64 | bwd_inner_microstep: 1306.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-11 02:24:03,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.24 | optimizer_step: 6.64 [2024-06-11 02:24:03,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 22291.04 | bwd_inner_microstep: 1552.76 | bwd_allreduce_microstep: 20738.21 | step_microstep: 38.84 [2024-06-11 02:24:03,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15525.10 | bwd: 62405.76 | bwd_inner: 41666.53 | bwd_allreduce: 20738.50 | step: 40.34 {'loss': 1.1499, 'learning_rate': 2.0439092085851685e-06, 'epoch': 0.86} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3555 [2024-06-11 02:24:05,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.66 | bwd_microstep: 1576.35 | bwd_inner_microstep: 1576.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 02:24:07,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.31 | bwd_microstep: 1341.64 | bwd_inner_microstep: 1341.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 02:24:09,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.49 | 
bwd_microstep: 1382.12 | bwd_inner_microstep: 1382.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3814 [2024-06-11 02:24:11,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.63 | bwd_microstep: 1476.84 | bwd_inner_microstep: 1476.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750 [2024-06-11 02:24:13,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.51 | bwd_microstep: 1334.99 | bwd_inner_microstep: 1334.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059 [2024-06-11 02:24:14,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.65 | bwd_microstep: 814.50 | bwd_inner_microstep: 814.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3792 [2024-06-11 02:24:16,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.53 | bwd_microstep: 1794.57 | bwd_inner_microstep: 1794.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3425 [2024-06-11 02:24:18,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.51 | bwd_microstep: 1180.44 | bwd_inner_microstep: 1180.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-11 02:24:41,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.76 | bwd_microstep: 1467.88 | bwd_inner_microstep: 1467.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3477 [2024-06-11 02:24:43,816] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 569.96 | bwd_microstep: 1533.32 | bwd_inner_microstep: 1533.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 02:24:45,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.67 | bwd_microstep: 1479.74 | bwd_inner_microstep: 1479.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 02:24:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.29 | bwd_microstep: 1374.13 | bwd_inner_microstep: 1374.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-11 02:24:49,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.80 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569 [2024-06-11 02:24:51,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.26 | bwd_microstep: 1201.47 | bwd_inner_microstep: 1201.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 02:24:53,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.84 | bwd_microstep: 1247.75 | bwd_inner_microstep: 1247.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 02:24:54,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1385.79 | bwd_inner_microstep: 1385.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-11 02:24:57,106] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.97 | bwd_microstep: 1552.52 | bwd_inner_microstep: 1552.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667 [2024-06-11 02:24:59,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.37 | bwd_microstep: 1615.54 | bwd_inner_microstep: 1615.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-11 02:25:01,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.07 | bwd_microstep: 1287.29 | bwd_inner_microstep: 1287.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-11 02:25:03,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.20 | bwd_microstep: 1656.18 | bwd_inner_microstep: 1656.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3598 [2024-06-11 02:25:05,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.69 | bwd_microstep: 1210.96 | bwd_inner_microstep: 1210.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3896 [2024-06-11 02:25:07,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.25 | bwd_microstep: 1489.00 | bwd_inner_microstep: 1488.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-11 02:25:08,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.91 | bwd_microstep: 1185.96 | bwd_inner_microstep: 1185.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3606 [2024-06-11 02:25:10,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1412.09 | bwd_inner_microstep: 1412.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755 [2024-06-11 02:25:12,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.40 | bwd_microstep: 1542.44 | bwd_inner_microstep: 1542.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2009 [2024-06-11 02:25:14,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.79 | bwd_microstep: 893.45 | bwd_inner_microstep: 893.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 893 [2024-06-11 02:25:14,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.28 | bwd_microstep: 369.02 | bwd_inner_microstep: 368.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-11 02:25:16,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.88 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2049 [2024-06-11 02:25:17,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.49 | bwd_microstep: 873.92 | bwd_inner_microstep: 873.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-11 02:25:19,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.06 | bwd_microstep: 1411.27 | bwd_inner_microstep: 1411.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3401 [2024-06-11 02:25:21,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.87 | bwd_microstep: 1396.42 | bwd_inner_microstep: 1396.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3581 [2024-06-11 02:25:23,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.05 | optimizer_step: 6.61 [2024-06-11 02:25:23,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.42 | bwd_microstep: 1401.00 | bwd_inner_microstep: 1393.35 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.41 [2024-06-11 02:25:23,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15930.54 | bwd: 42566.13 | bwd_inner: 42557.62 | bwd_allreduce: 7.83 | step: 38.87 {'loss': 1.1997, 'learning_rate': 2.0274111013878418e-06, 'epoch': 0.86} 1479/1726 [25:39:52<5:06:53, 74.55s/it] 86%|████████▌ | 1480/1726 [25:41:55<6:05:04, 89.04s/it] 86%|████████▌ | 1481/1726 [25:43:35<6:16:29, 92.20s/it] 86%|████████▌ | 1482/1726 [25:45:21<6:32:48, 96.59s/it] 86%|████████▌ | 1483/1726 [25:46:40<6:08:55, 91.09s/it] 86%|████████▌ | 1484/1726 [25:48:00<5:53:56, 87.76s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-11 02:25:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.47 | bwd_microstep: 1470.85 | bwd_inner_microstep: 1470.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401 [2024-06-11 02:25:27,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.55 | bwd_microstep: 1209.35 |
bwd_inner_microstep: 1209.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298 [2024-06-11 02:25:28,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.50 | bwd_microstep: 972.17 | bwd_inner_microstep: 972.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808 [2024-06-11 02:25:30,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.81 | bwd_microstep: 1380.02 | bwd_inner_microstep: 1379.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3478 [2024-06-11 02:25:32,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.08 | bwd_microstep: 1184.74 | bwd_inner_microstep: 1184.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3749 [2024-06-11 02:25:33,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.91 | bwd_microstep: 1367.15 | bwd_inner_microstep: 1367.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-11 02:25:35,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.76 | bwd_microstep: 1152.00 | bwd_inner_microstep: 1151.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 02:25:37,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.54 | bwd_microstep: 1387.91 | bwd_inner_microstep: 1387.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717 [2024-06-11 02:25:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 568.98 | bwd_microstep: 1531.94 | bwd_inner_microstep: 1531.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2127 [2024-06-11 02:25:40,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.98 | bwd_microstep: 832.86 | bwd_inner_microstep: 832.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3569 [2024-06-11 02:25:42,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1365.51 | bwd_inner_microstep: 1365.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2900 [2024-06-11 02:25:44,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.57 | bwd_microstep: 1171.86 | bwd_inner_microstep: 1171.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3692 [2024-06-11 02:25:46,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.64 | bwd_microstep: 1458.49 | bwd_inner_microstep: 1458.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-11 02:25:48,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.07 | bwd_microstep: 1477.67 | bwd_inner_microstep: 1477.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997 [2024-06-11 02:25:49,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.16 | bwd_microstep: 705.81 | bwd_inner_microstep: 705.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519 [2024-06-11 02:25:51,237] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.51 | bwd_microstep: 1419.76 | bwd_inner_microstep: 1419.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-11 02:25:52,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.18 | bwd_microstep: 791.37 | bwd_inner_microstep: 791.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963 [2024-06-11 02:25:53,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.87 | bwd_microstep: 889.55 | bwd_inner_microstep: 889.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669 [2024-06-11 02:25:55,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.85 | bwd_microstep: 1529.52 | bwd_inner_microstep: 1529.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 02:25:57,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.42 | bwd_microstep: 1349.32 | bwd_inner_microstep: 1349.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033 [2024-06-11 02:25:58,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 809.66 | bwd_inner_microstep: 809.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-11 02:26:00,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.01 | bwd_microstep: 1253.86 | bwd_inner_microstep: 1253.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 
[2024-06-11 02:26:02,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1256.16 | bwd_inner_microstep: 1256.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3530 [2024-06-11 02:26:03,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.46 | bwd_microstep: 1226.42 | bwd_inner_microstep: 1226.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3449 [2024-06-11 02:26:05,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.59 | bwd_microstep: 1189.97 | bwd_inner_microstep: 1189.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3722 [2024-06-11 02:26:07,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1370.80 | bwd_inner_microstep: 1370.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2050 [2024-06-11 02:26:08,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.02 | bwd_microstep: 911.51 | bwd_inner_microstep: 911.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-11 02:26:10,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.31 | bwd_microstep: 1511.73 | bwd_inner_microstep: 1511.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2025 [2024-06-11 02:26:11,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.53 | bwd_microstep: 777.76 | bwd_inner_microstep: 777.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 2272 [2024-06-11 02:26:13,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.79 | bwd_microstep: 1001.20 | bwd_inner_microstep: 1001.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 02:26:15,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.73 | bwd_microstep: 1647.93 | bwd_inner_microstep: 1647.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-11 02:26:47,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.24 | optimizer_step: 6.58 [2024-06-11 02:26:47,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.89 | bwd_microstep: 31329.12 | bwd_inner_microstep: 1629.42 | bwd_allreduce_microstep: 29699.63 | step_microstep: 38.76 [2024-06-11 02:26:47,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14676.68 | bwd: 68933.98 | bwd_inner: 39233.42 | bwd_allreduce: 29699.88 | step: 40.17 {'loss': 1.1422, 'learning_rate': 2.010976293555189e-06, 'epoch': 0.86} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2028 [2024-06-11 02:26:48,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.38 | bwd_microstep: 896.74 | bwd_inner_microstep: 896.56 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3955 [2024-06-11 02:26:50,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.01 | bwd_microstep: 1583.07 | bwd_inner_microstep: 1583.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3821 [2024-06-11 02:26:52,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
559.22 | bwd_microstep: 1502.82 | bwd_inner_microstep: 1502.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2468 [2024-06-11 02:26:54,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.69 | bwd_microstep: 948.17 | bwd_inner_microstep: 948.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-11 02:26:55,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.79 | bwd_microstep: 1272.29 | bwd_inner_microstep: 1272.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3420 [2024-06-11 02:26:57,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.64 | bwd_microstep: 1180.27 | bwd_inner_microstep: 1180.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1879 [2024-06-11 02:26:58,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.22 | bwd_microstep: 681.17 | bwd_inner_microstep: 681.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-11 02:27:00,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.13 | bwd_microstep: 1535.09 | bwd_inner_microstep: 1535.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3425 [2024-06-11 02:27:49,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.61 | bwd_microstep: 1146.88 | bwd_inner_microstep: 1146.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3727 [2024-06-11 02:27:51,416] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 509.97 | bwd_microstep: 1354.07 | bwd_inner_microstep: 1354.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-11 02:27:53,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.80 | bwd_microstep: 1377.40 | bwd_inner_microstep: 1377.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3583 [2024-06-11 02:27:55,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.10 | bwd_microstep: 1232.20 | bwd_inner_microstep: 1232.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-11 02:27:57,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1281.57 | bwd_inner_microstep: 1281.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453 [2024-06-11 02:27:58,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.96 | bwd_microstep: 1275.16 | bwd_inner_microstep: 1275.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514 [2024-06-11 02:28:00,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.84 | bwd_microstep: 1570.84 | bwd_inner_microstep: 1570.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3649 [2024-06-11 02:28:03,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.88 | bwd_microstep: 1535.80 | bwd_inner_microstep: 1535.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-11 
02:28:04,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.28 | bwd_microstep: 1389.07 | bwd_inner_microstep: 1389.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3627 [2024-06-11 02:28:06,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.06 | bwd_microstep: 1210.64 | bwd_inner_microstep: 1210.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3673 [2024-06-11 02:28:08,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.74 | bwd_microstep: 1449.39 | bwd_inner_microstep: 1449.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-11 02:28:09,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.30 | bwd_microstep: 799.59 | bwd_inner_microstep: 799.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-11 02:28:11,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1368.10 | bwd_inner_microstep: 1368.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2000 [2024-06-11 02:28:12,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.64 | bwd_microstep: 768.05 | bwd_inner_microstep: 768.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 02:28:14,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1388.54 | bwd_inner_microstep: 1388.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3597 [2024-06-11 02:28:16,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.02 | bwd_microstep: 1401.57 | bwd_inner_microstep: 1401.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-11 02:28:18,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.74 | bwd_microstep: 1248.94 | bwd_inner_microstep: 1248.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2298 [2024-06-11 02:28:19,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.26 | bwd_microstep: 1003.34 | bwd_inner_microstep: 1003.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3877 [2024-06-11 02:28:22,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 650.25 | bwd_microstep: 1779.15 | bwd_inner_microstep: 1779.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2280 [2024-06-11 02:28:23,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.51 | bwd_microstep: 969.19 | bwd_inner_microstep: 969.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-11 02:28:25,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.04 | bwd_microstep: 1640.70 | bwd_inner_microstep: 1640.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-11 02:28:26,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.69 | bwd_microstep: 788.39 | bwd_inner_microstep: 788.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3725 [2024-06-11 02:28:28,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.47 | bwd_microstep: 1428.92 | bwd_inner_microstep: 1428.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3786 [2024-06-11 02:28:37,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-11 02:28:37,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.25 | bwd_microstep: 8401.36 | bwd_inner_microstep: 2014.68 | bwd_allreduce_microstep: 6386.62 | step_microstep: 38.49 [2024-06-11 02:28:37,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15328.81 | bwd: 47408.52 | bwd_inner: 41020.84 | bwd_allreduce: 6386.92 | step: 39.93 {'loss': 1.1004, 'learning_rate': 1.9946048429705133e-06, 'epoch': 0.86} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3417 [2024-06-11 02:28:39,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.45 | bwd_microstep: 1433.80 | bwd_inner_microstep: 1433.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3951 [2024-06-11 02:28:42,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.12 | bwd_microstep: 1588.48 | bwd_inner_microstep: 1588.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 02:28:43,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.17 | bwd_microstep: 1278.13 | bwd_inner_microstep: 1278.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-11 02:28:45,735] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 513.88 | bwd_microstep: 1378.32 | bwd_inner_microstep: 1378.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-11 02:28:47,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.33 | bwd_microstep: 1147.98 | bwd_inner_microstep: 1147.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-11 02:28:49,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.60 | bwd_microstep: 1276.18 | bwd_inner_microstep: 1276.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-11 02:28:50,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.97 | bwd_microstep: 1293.20 | bwd_inner_microstep: 1293.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-11 02:28:52,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.73 | bwd_microstep: 1298.81 | bwd_inner_microstep: 1298.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-11 02:28:54,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.82 | bwd_microstep: 1413.98 | bwd_inner_microstep: 1413.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3708 [2024-06-11 02:28:56,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1527.46 | bwd_inner_microstep: 1527.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3632 [2024-06-11 02:28:58,610] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.31 | bwd_microstep: 1345.16 | bwd_inner_microstep: 1345.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3674 [2024-06-11 02:29:00,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.34 | bwd_microstep: 1562.30 | bwd_inner_microstep: 1562.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863 [2024-06-11 02:29:01,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.83 | bwd_microstep: 676.61 | bwd_inner_microstep: 676.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-11 02:29:03,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1520.83 | bwd_inner_microstep: 1520.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391 [2024-06-11 02:29:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.93 | bwd_microstep: 1243.90 | bwd_inner_microstep: 1243.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3677 [2024-06-11 02:29:07,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1553.41 | bwd_inner_microstep: 1553.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510 [2024-06-11 02:29:09,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.58 | bwd_microstep: 1587.81 | bwd_inner_microstep: 1587.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
3582 [2024-06-11 02:29:11,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.04 | bwd_microstep: 1205.86 | bwd_inner_microstep: 1205.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-11 02:29:13,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.96 | bwd_microstep: 1613.35 | bwd_inner_microstep: 1613.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 02:29:15,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.24 | bwd_microstep: 1388.24 | bwd_inner_microstep: 1388.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3709 [2024-06-11 02:29:17,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.06 | bwd_microstep: 1296.73 | bwd_inner_microstep: 1296.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2163 [2024-06-11 02:29:18,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.67 | bwd_microstep: 759.62 | bwd_inner_microstep: 759.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-11 02:29:20,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.68 | bwd_inner_microstep: 1459.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2009 [2024-06-11 02:29:21,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.45 | bwd_microstep: 709.97 | bwd_inner_microstep: 709.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per 
sample: 6.25, dynamic token length: 3564 [2024-06-11 02:29:23,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.57 | bwd_microstep: 1346.71 | bwd_inner_microstep: 1346.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3458 [2024-06-11 02:29:25,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.96 | bwd_microstep: 1342.52 | bwd_inner_microstep: 1342.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3563 [2024-06-11 02:29:27,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1299.73 | bwd_inner_microstep: 1299.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3741 [2024-06-11 02:29:28,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.51 | bwd_microstep: 1401.62 | bwd_inner_microstep: 1401.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 02:29:31,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.22 | bwd_microstep: 1647.63 | bwd_inner_microstep: 1647.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917 [2024-06-11 02:29:32,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.18 | bwd_microstep: 779.75 | bwd_inner_microstep: 779.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827 [2024-06-11 02:29:34,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1481.32 | bwd_inner_microstep: 1481.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3811 [2024-06-11 02:29:52,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-11 02:29:52,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.95 | bwd_microstep: 17053.75 | bwd_inner_microstep: 1986.10 | bwd_allreduce_microstep: 15067.58 | step_microstep: 38.64 [2024-06-11 02:29:52,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15984.52 | bwd: 57912.87 | bwd_inner: 42844.36 | bwd_allreduce: 15067.82 | step: 40.04 {'loss': 1.1487, 'learning_rate': 1.9782968072939803e-06, 'epoch': 0.86} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-11 02:29:54,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.78 | bwd_microstep: 1458.10 | bwd_inner_microstep: 1458.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472 [2024-06-11 02:29:55,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.84 | bwd_microstep: 1341.42 | bwd_inner_microstep: 1341.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3845 [2024-06-11 02:29:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.40 | bwd_microstep: 1601.39 | bwd_inner_microstep: 1601.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418 [2024-06-11 02:29:59,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.20 | bwd_microstep: 1149.05 | bwd_inner_microstep: 1149.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3821 [2024-06-11 02:30:01,697] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1381.43 | bwd_inner_microstep: 1381.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421 [2024-06-11 02:30:03,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.70 | bwd_microstep: 1181.12 | bwd_inner_microstep: 1181.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-11 02:30:05,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.34 | bwd_microstep: 1283.32 | bwd_inner_microstep: 1283.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789 [2024-06-11 02:30:07,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.77 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-11 02:30:08,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.31 | bwd_microstep: 1150.46 | bwd_inner_microstep: 1150.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3412 [2024-06-11 02:30:10,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.52 | bwd_microstep: 1306.39 | bwd_inner_microstep: 1306.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3520 [2024-06-11 02:30:12,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.24 | bwd_microstep: 1221.23 | bwd_inner_microstep: 1221.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3417 [2024-06-11 02:30:14,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.98 | bwd_microstep: 1366.56 | bwd_inner_microstep: 1366.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3645 [2024-06-11 02:30:16,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.65 | bwd_microstep: 1510.75 | bwd_inner_microstep: 1510.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3658 [2024-06-11 02:30:18,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 658.04 | bwd_microstep: 1813.63 | bwd_inner_microstep: 1813.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3674 [2024-06-11 02:30:21,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.19 | bwd_microstep: 1721.13 | bwd_inner_microstep: 1721.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2298 [2024-06-11 02:30:22,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.87 | bwd_microstep: 977.29 | bwd_inner_microstep: 977.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-11 02:30:24,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1345.58 | bwd_inner_microstep: 1345.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1851 [2024-06-11 02:30:25,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 259.33 | bwd_microstep: 671.94 | bwd_inner_microstep: 671.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3428 [2024-06-11 02:30:26,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.04 | bwd_microstep: 1159.72 | bwd_inner_microstep: 1159.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853 [2024-06-11 02:30:29,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.49 | bwd_microstep: 1560.34 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-11 02:30:30,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.10 | bwd_microstep: 1286.60 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189 [2024-06-11 02:30:32,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.65 | bwd_microstep: 858.51 | bwd_inner_microstep: 858.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-11 02:30:33,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.64 | bwd_microstep: 971.23 | bwd_inner_microstep: 971.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3813 [2024-06-11 02:30:35,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.02 | bwd_microstep: 1413.79 | bwd_inner_microstep: 1413.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-11 02:30:37,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.28 | bwd_microstep: 1649.30 | bwd_inner_microstep: 1649.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-11 02:30:39,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.73 | bwd_microstep: 1526.79 | bwd_inner_microstep: 1526.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3756 [2024-06-11 02:30:41,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.80 | bwd_microstep: 1274.70 | bwd_inner_microstep: 1274.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-11 02:30:42,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.10 | bwd_microstep: 875.86 | bwd_inner_microstep: 875.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3571 [2024-06-11 02:30:44,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1418.12 | bwd_inner_microstep: 1418.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2059 [2024-06-11 02:30:45,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.64 | bwd_microstep: 909.30 | bwd_inner_microstep: 909.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3581 [2024-06-11 02:30:47,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1455.75 | bwd_inner_microstep: 1455.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033 [2024-06-11 02:31:13,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.10 | optimizer_step: 6.59 [2024-06-11 
02:31:13,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.33 | bwd_microstep: 25338.52 | bwd_inner_microstep: 964.46 | bwd_allreduce_microstep: 24374.01 | step_microstep: 37.99 [2024-06-11 02:31:13,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15443.45 | bwd: 65722.44 | bwd_inner: 41347.52 | bwd_allreduce: 24374.24 | step: 39.46 {'loss': 1.1649, 'learning_rate': 1.9620522439624025e-06, 'epoch': 0.86} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 02:31:15,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.92 | bwd_microstep: 1362.31 | bwd_inner_microstep: 1362.12 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 02:31:17,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.75 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 02:31:19,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.04 | bwd_microstep: 1546.34 | bwd_inner_microstep: 1546.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3789 [2024-06-11 02:31:21,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.18 | bwd_microstep: 1541.69 | bwd_inner_microstep: 1541.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404 [2024-06-11 02:31:23,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.10 | bwd_microstep: 1177.74 | bwd_inner_microstep: 1177.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-11 
02:31:25,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.35 | bwd_microstep: 1377.85 | bwd_inner_microstep: 1377.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-11 02:31:26,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.49 | bwd_microstep: 1298.36 | bwd_inner_microstep: 1298.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 02:31:28,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1244.03 | bwd_inner_microstep: 1244.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3498 [2024-06-11 02:32:16,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.18 | bwd_microstep: 1434.66 | bwd_inner_microstep: 1434.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1914 [2024-06-11 02:32:17,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.65 | bwd_microstep: 774.44 | bwd_inner_microstep: 774.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2038 [2024-06-11 02:32:18,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.55 | bwd_microstep: 900.14 | bwd_inner_microstep: 900.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3658 [2024-06-11 02:32:20,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.66 | bwd_microstep: 1413.86 | bwd_inner_microstep: 1413.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3410 [2024-06-11 02:32:22,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.36 | bwd_microstep: 1337.30 | bwd_inner_microstep: 1337.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3488 [2024-06-11 02:32:24,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1563.20 | bwd_inner_microstep: 1563.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3624 [2024-06-11 02:32:26,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1342.13 | bwd_inner_microstep: 1342.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-11 02:32:28,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.76 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1966 [2024-06-11 02:32:29,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.11 | bwd_microstep: 701.53 | bwd_inner_microstep: 701.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1932 [2024-06-11 02:32:30,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.54 | bwd_microstep: 728.62 | bwd_inner_microstep: 728.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514 [2024-06-11 02:32:31,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.20 | bwd_microstep: 1189.44 | bwd_inner_microstep: 1189.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 30, images per sample: 7.5, dynamic token length: 3637 [2024-06-11 02:32:33,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.06 | bwd_microstep: 1442.15 | bwd_inner_microstep: 1442.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-11 02:32:35,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1509.63 | bwd_inner_microstep: 1509.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1985 [2024-06-11 02:32:36,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.33 | bwd_microstep: 705.64 | bwd_inner_microstep: 705.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-11 02:32:38,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.52 | bwd_microstep: 1252.79 | bwd_inner_microstep: 1252.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-11 02:32:40,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.32 | bwd_microstep: 1248.59 | bwd_inner_microstep: 1248.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2142 [2024-06-11 02:32:41,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.56 | bwd_microstep: 737.74 | bwd_inner_microstep: 737.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282 [2024-06-11 02:32:42,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.50 | bwd_microstep: 877.40 | bwd_inner_microstep: 877.38 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602 [2024-06-11 02:32:44,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.64 | bwd_microstep: 1535.00 | bwd_inner_microstep: 1534.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2233 [2024-06-11 02:32:46,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.30 | bwd_microstep: 1058.16 | bwd_inner_microstep: 1058.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-11 02:32:47,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.28 | bwd_microstep: 1388.68 | bwd_inner_microstep: 1388.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-11 02:32:50,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1550.86 | bwd_inner_microstep: 1550.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2064 [2024-06-11 02:32:51,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.21 | bwd_microstep: 1010.76 | bwd_inner_microstep: 1010.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3821 [2024-06-11 02:32:58,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.10 | optimizer_step: 6.59 [2024-06-11 02:32:58,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.54 | bwd_microstep: 6458.81 | bwd_inner_microstep: 1810.77 | bwd_allreduce_microstep: 4647.99 | step_microstep: 37.96 [2024-06-11 02:32:58,598] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 14876.35 | bwd: 44439.91 | bwd_inner: 39790.87 | bwd_allreduce: 4648.30 | step: 39.54 {'loss': 1.1968, 'learning_rate': 1.945871210189054e-06, 'epoch': 0.86} 86%|████████▌ | 1484/1726 [25:48:00<5:53:56, 87.76s/it] 86%|████████▌ | 1485/1726 [25:49:24<5:47:52, 86.61s/it] 86%|████████▌ | 1486/1726 [25:51:14<6:15:07, 93.78s/it] 86%|████████▌ | 1487/1726 [25:52:28<5:50:11, 87.91s/it] 86%|████████▌ | 1488/1726 [25:53:50<5:41:04, 85.99s/it] 86%|████████▋ | 1489/1726 [25:55:35<6:02:10, 91.69s/it] dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-11 02:33:00,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.43 | bwd_microstep: 1145.44 | bwd_inner_microstep: 1145.27 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412 [2024-06-11 02:33:01,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.93 | bwd_microstep: 1149.83 | bwd_inner_microstep: 1149.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-11 02:33:03,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.98 | bwd_microstep: 1269.34 | bwd_inner_microstep: 1269.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877 [2024-06-11 02:33:04,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.51 | bwd_microstep: 676.21 | bwd_inner_microstep: 676.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789 [2024-06-11 02:33:06,501]
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.33 | bwd_microstep: 1446.05 | bwd_inner_microstep: 1446.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735 [2024-06-11 02:33:08,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.53 | bwd_microstep: 1528.15 | bwd_inner_microstep: 1528.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 02:33:10,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.63 | bwd_microstep: 1244.68 | bwd_inner_microstep: 1244.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-11 02:33:11,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.90 | bwd_microstep: 1147.68 | bwd_inner_microstep: 1147.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1953 [2024-06-11 02:33:12,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.14 | bwd_microstep: 728.44 | bwd_inner_microstep: 728.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 02:33:14,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1378.38 | bwd_inner_microstep: 1378.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3715 [2024-06-11 02:33:16,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.75 | bwd_microstep: 1491.33 | bwd_inner_microstep: 1491.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3412 [2024-06-11 02:33:18,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1247.42 | bwd_inner_microstep: 1247.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4186 [2024-06-11 02:33:21,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 685.27 | bwd_microstep: 1853.00 | bwd_inner_microstep: 1852.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2644 [2024-06-11 02:33:22,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.37 | bwd_microstep: 1207.49 | bwd_inner_microstep: 1207.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1878 [2024-06-11 02:33:23,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.02 | bwd_microstep: 769.30 | bwd_inner_microstep: 769.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3628 [2024-06-11 02:33:26,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1607.14 | bwd_inner_microstep: 1607.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 02:33:28,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.43 | bwd_microstep: 1555.83 | bwd_inner_microstep: 1555.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2000 [2024-06-11 02:33:29,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.57 | bwd_microstep: 740.41 | bwd_inner_microstep: 740.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, 
images per sample: 9.25, dynamic token length: 3816 [2024-06-11 02:33:31,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.34 | bwd_microstep: 1600.25 | bwd_inner_microstep: 1600.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3517 [2024-06-11 02:33:33,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.48 | bwd_microstep: 1319.40 | bwd_inner_microstep: 1319.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711 [2024-06-11 02:33:35,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1628.95 | bwd_inner_microstep: 1628.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642 [2024-06-11 02:33:37,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.30 | bwd_microstep: 1609.23 | bwd_inner_microstep: 1609.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-11 02:33:39,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1296.29 | bwd_inner_microstep: 1296.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2043 [2024-06-11 02:33:40,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.05 | bwd_microstep: 885.52 | bwd_inner_microstep: 885.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-11 02:33:42,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.32 | bwd_microstep: 1484.33 | bwd_inner_microstep: 1484.31 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3843 [2024-06-11 02:33:44,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.79 | bwd_microstep: 1513.28 | bwd_inner_microstep: 1513.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-11 02:33:46,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.29 | bwd_microstep: 1372.24 | bwd_inner_microstep: 1372.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1947 [2024-06-11 02:33:47,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.54 | bwd_microstep: 697.98 | bwd_inner_microstep: 697.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2646 [2024-06-11 02:33:49,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.65 | bwd_microstep: 1017.36 | bwd_inner_microstep: 1017.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-11 02:33:51,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.99 | bwd_microstep: 1289.79 | bwd_inner_microstep: 1289.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-11 02:33:53,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.88 | bwd_microstep: 1536.76 | bwd_inner_microstep: 1536.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-11 02:35:07,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 
6.62 [2024-06-11 02:35:07,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 74202.91 | bwd_inner_microstep: 1696.99 | bwd_allreduce_microstep: 72505.86 | step_microstep: 38.89 [2024-06-11 02:35:07,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15370.78 | bwd: 113640.46 | bwd_inner: 41133.55 | bwd_allreduce: 72506.17 | step: 40.45 {'loss': 1.1356, 'learning_rate': 1.9297537629634486e-06, 'epoch': 0.86} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 02:35:09,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.11 | bwd_microstep: 1227.04 | bwd_inner_microstep: 1227.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-11 02:35:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.97 | bwd_microstep: 785.52 | bwd_inner_microstep: 785.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2354 [2024-06-11 02:35:12,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.72 | bwd_microstep: 916.48 | bwd_inner_microstep: 916.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874 [2024-06-11 02:35:14,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.49 | bwd_microstep: 1572.51 | bwd_inner_microstep: 1572.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480 [2024-06-11 02:35:15,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.79 | bwd_microstep: 1304.41 | bwd_inner_microstep: 1304.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 
[2024-06-11 02:35:17,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.56 | bwd_microstep: 1271.23 | bwd_inner_microstep: 1271.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 02:35:19,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1375.25 | bwd_inner_microstep: 1375.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-11 02:35:20,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.69 | bwd_microstep: 793.73 | bwd_inner_microstep: 793.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-11 02:35:21,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.42 | bwd_microstep: 789.36 | bwd_inner_microstep: 789.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-11 02:35:23,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1469.93 | bwd_inner_microstep: 1469.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-11 02:35:26,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.53 | bwd_microstep: 1593.37 | bwd_inner_microstep: 1593.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663 [2024-06-11 02:35:28,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.72 | bwd_microstep: 1603.40 | bwd_inner_microstep: 1603.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per 
sample: 9.0, dynamic token length: 3658 [2024-06-11 02:36:12,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.87 | bwd_microstep: 1531.40 | bwd_inner_microstep: 1531.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3624 [2024-06-11 02:36:14,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.23 | bwd_microstep: 1525.82 | bwd_inner_microstep: 1525.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3642 [2024-06-11 02:36:17,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1662.50 | bwd_inner_microstep: 1662.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490 [2024-06-11 02:36:18,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.52 | bwd_microstep: 1333.33 | bwd_inner_microstep: 1333.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-11 02:36:21,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.09 | bwd_microstep: 1499.20 | bwd_inner_microstep: 1499.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-11 02:36:23,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.09 | bwd_microstep: 1441.67 | bwd_inner_microstep: 1441.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-11 02:36:24,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.64 | bwd_microstep: 1245.60 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436 [2024-06-11 02:36:26,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.50 | bwd_microstep: 1150.81 | bwd_inner_microstep: 1150.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-11 02:36:28,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.97 | bwd_microstep: 1400.65 | bwd_inner_microstep: 1400.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3508 [2024-06-11 02:36:29,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.91 | bwd_microstep: 1189.38 | bwd_inner_microstep: 1189.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3515 [2024-06-11 02:36:31,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.63 | bwd_microstep: 1313.87 | bwd_inner_microstep: 1313.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3677 [2024-06-11 02:36:33,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.62 | bwd_microstep: 1514.81 | bwd_inner_microstep: 1514.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-11 02:36:35,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.07 | bwd_microstep: 1249.92 | bwd_inner_microstep: 1249.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-11 02:36:37,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1379.88 | bwd_inner_microstep: 1379.86 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 02:36:39,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.86 | bwd_microstep: 1399.37 | bwd_inner_microstep: 1399.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3454 [2024-06-11 02:36:41,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.08 | bwd_microstep: 1374.72 | bwd_inner_microstep: 1374.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2182 [2024-06-11 02:36:42,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.29 | bwd_microstep: 952.98 | bwd_inner_microstep: 952.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 02:36:44,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.11 | bwd_microstep: 1282.83 | bwd_inner_microstep: 1282.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2013 [2024-06-11 02:36:45,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.67 | bwd_microstep: 803.50 | bwd_inner_microstep: 803.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575 [2024-06-11 02:36:52,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-11 02:36:52,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 6638.26 | bwd_inner_microstep: 1799.51 | bwd_allreduce_microstep: 4838.69 | step_microstep: 38.19 [2024-06-11 02:36:52,790] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15571.14 | bwd: 46592.78 | bwd_inner: 41753.18 | bwd_allreduce: 4838.92 | step: 39.67 {'loss': 1.21, 'learning_rate': 1.913699959051152e-06, 'epoch': 0.86} dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3456 [2024-06-11 02:36:54,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.58 | bwd_microstep: 1291.14 | bwd_inner_microstep: 1291.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913 [2024-06-11 02:36:55,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 774.91 | bwd_inner_microstep: 774.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-11 02:36:56,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.96 | bwd_microstep: 967.33 | bwd_inner_microstep: 967.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795 [2024-06-11 02:36:59,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.96 | bwd_microstep: 1643.95 | bwd_inner_microstep: 1643.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 02:37:01,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.48 | bwd_microstep: 1375.79 | bwd_inner_microstep: 1375.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-11 02:37:02,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.13 | bwd_microstep: 1245.75 | bwd_inner_microstep: 1245.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3719 [2024-06-11 02:37:04,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.23 | bwd_microstep: 1529.15 | bwd_inner_microstep: 1529.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-11 02:37:06,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1344.89 | bwd_inner_microstep: 1344.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 02:37:08,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1385.55 | bwd_inner_microstep: 1385.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3459 [2024-06-11 02:37:10,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.78 | bwd_microstep: 1308.48 | bwd_inner_microstep: 1308.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-11 02:37:12,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.20 | bwd_microstep: 1481.64 | bwd_inner_microstep: 1481.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-11 02:37:14,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.28 | bwd_microstep: 1477.76 | bwd_inner_microstep: 1477.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-11 02:37:16,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.68 | bwd_microstep: 1500.13 | bwd_inner_microstep: 1500.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 38, images per sample: 9.5, dynamic token length: 3500 [2024-06-11 02:37:18,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1546.80 | bwd_inner_microstep: 1546.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3441 [2024-06-11 02:37:20,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.05 | bwd_microstep: 1514.21 | bwd_inner_microstep: 1514.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-11 02:37:22,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2560 [2024-06-11 02:37:24,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.86 | bwd_microstep: 1062.78 | bwd_inner_microstep: 1062.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 02:37:26,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.47 | bwd_microstep: 1288.64 | bwd_inner_microstep: 1288.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-11 02:37:27,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1378.71 | bwd_inner_microstep: 1378.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3679 [2024-06-11 02:37:29,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.17 | bwd_microstep: 1292.77 | bwd_inner_microstep: 1292.74 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-11 02:37:31,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.72 | bwd_microstep: 1278.02 | bwd_inner_microstep: 1277.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934 [2024-06-11 02:37:32,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.48 | bwd_microstep: 725.82 | bwd_inner_microstep: 725.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3819 [2024-06-11 02:37:34,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.99 | bwd_microstep: 1582.19 | bwd_inner_microstep: 1582.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-11 02:37:35,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.14 | bwd_microstep: 878.73 | bwd_inner_microstep: 878.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-11 02:37:38,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.88 | bwd_microstep: 1602.21 | bwd_inner_microstep: 1602.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-11 02:37:40,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.06 | bwd_microstep: 1337.91 | bwd_inner_microstep: 1337.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003 [2024-06-11 02:37:41,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.40 | bwd_microstep: 808.21 
| bwd_inner_microstep: 808.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3695 [2024-06-11 02:37:43,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1461.48 | bwd_inner_microstep: 1461.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3471 [2024-06-11 02:37:44,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.13 | bwd_microstep: 1266.33 | bwd_inner_microstep: 1266.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2560 [2024-06-11 02:37:46,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.70 | bwd_microstep: 872.67 | bwd_inner_microstep: 872.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-11 02:37:48,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1446.37 | bwd_inner_microstep: 1446.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2234 [2024-06-11 02:37:54,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-11 02:37:54,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.66 | bwd_microstep: 6311.38 | bwd_inner_microstep: 1146.09 | bwd_allreduce_microstep: 5165.22 | step_microstep: 38.79 [2024-06-11 02:37:54,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15369.95 | bwd: 46363.31 | bwd_inner: 41197.16 | bwd_allreduce: 5165.46 | step: 40.28 {'loss': 1.1862, 'learning_rate': 1.8977098549935745e-06, 'epoch': 0.86} dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3463 [2024-06-11 02:37:56,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1363.74 | bwd_inner_microstep: 1363.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-11 02:37:58,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.19 | bwd_microstep: 1340.73 | bwd_inner_microstep: 1340.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3884 [2024-06-11 02:38:00,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.46 | bwd_microstep: 1645.47 | bwd_inner_microstep: 1645.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-11 02:38:02,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.74 | bwd_microstep: 1310.83 | bwd_inner_microstep: 1310.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 02:38:04,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.59 | bwd_microstep: 1280.31 | bwd_inner_microstep: 1280.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 02:38:06,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.76 | bwd_microstep: 1345.82 | bwd_inner_microstep: 1345.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4066 [2024-06-11 02:38:08,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.52 | bwd_microstep: 1619.67 | bwd_inner_microstep: 1619.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 02:38:10,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.49 | bwd_microstep: 1276.55 | bwd_inner_microstep: 1276.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2085 [2024-06-11 02:38:11,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.34 | bwd_microstep: 733.43 | bwd_inner_microstep: 733.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4063 [2024-06-11 02:38:13,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.58 | bwd_microstep: 1722.68 | bwd_inner_microstep: 1722.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-11 02:38:15,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.99 | bwd_microstep: 1349.21 | bwd_inner_microstep: 1349.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-11 02:38:17,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.16 | bwd_microstep: 1628.14 | bwd_inner_microstep: 1628.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-11 02:38:18,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.46 | bwd_microstep: 796.25 | bwd_inner_microstep: 796.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497 [2024-06-11 02:38:21,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1576.63 | bwd_inner_microstep: 1576.60 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-11 02:38:22,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.23 | bwd_microstep: 790.26 | bwd_inner_microstep: 790.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3451 [2024-06-11 02:38:24,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.64 | bwd_microstep: 1381.51 | bwd_inner_microstep: 1381.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-11 02:38:26,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1491.20 | bwd_inner_microstep: 1491.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-11 02:38:27,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.25 | bwd_microstep: 1345.02 | bwd_inner_microstep: 1344.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2069 [2024-06-11 02:38:29,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.43 | bwd_microstep: 920.21 | bwd_inner_microstep: 920.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 02:38:31,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1299.78 | bwd_inner_microstep: 1299.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605 [2024-06-11 02:38:33,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.59 | bwd_microstep: 1405.29 | bwd_inner_microstep: 
1405.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3680 [2024-06-11 02:38:34,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.17 | bwd_microstep: 1262.46 | bwd_inner_microstep: 1262.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.99 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-11 02:38:36,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.30 | bwd_microstep: 1399.25 | bwd_inner_microstep: 1399.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3718 [2024-06-11 02:38:38,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.83 | bwd_microstep: 1564.85 | bwd_inner_microstep: 1564.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3614 [2024-06-11 02:38:40,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.04 | bwd_microstep: 1539.72 | bwd_inner_microstep: 1539.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3446 [2024-06-11 02:38:42,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.99 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-11 02:38:44,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.30 | bwd_microstep: 1400.78 | bwd_inner_microstep: 1400.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4233 [2024-06-11 02:38:47,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 687.25 | 
bwd_microstep: 1871.40 | bwd_inner_microstep: 1871.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1962 [2024-06-11 02:38:48,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.23 | bwd_microstep: 794.83 | bwd_inner_microstep: 794.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996 [2024-06-11 02:38:49,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.55 | bwd_microstep: 800.76 | bwd_inner_microstep: 800.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3470 [2024-06-11 02:38:51,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.75 | bwd_microstep: 1523.71 | bwd_inner_microstep: 1523.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819 [2024-06-11 02:38:55,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.09 | optimizer_step: 6.59 [2024-06-11 02:38:55,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.50 | bwd_microstep: 3624.81 | bwd_inner_microstep: 1559.32 | bwd_allreduce_microstep: 2065.45 | step_microstep: 37.71 [2024-06-11 02:38:55,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15903.68 | bwd: 44720.42 | bwd_inner: 42654.02 | bwd_allreduce: 2065.70 | step: 39.27 {'loss': 1.1798, 'learning_rate': 1.8817835071077882e-06, 'epoch': 0.86} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402 [2024-06-11 02:38:57,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.21 | bwd_microstep: 1435.67 | bwd_inner_microstep: 1435.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3457 [2024-06-11 02:38:59,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1273.59 | bwd_inner_microstep: 1273.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 839 [2024-06-11 02:39:00,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 132.34 | bwd_microstep: 340.82 | bwd_inner_microstep: 340.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 02:39:02,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1454.23 | bwd_inner_microstep: 1454.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 02:39:03,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.02 | bwd_microstep: 1243.69 | bwd_inner_microstep: 1243.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2705 [2024-06-11 02:39:05,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.31 | bwd_microstep: 1000.63 | bwd_inner_microstep: 1000.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-11 02:39:07,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1349.24 | bwd_inner_microstep: 1349.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-11 02:39:08,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.98 | bwd_microstep: 1150.70 | bwd_inner_microstep: 1150.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 02:39:10,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.39 | bwd_microstep: 1381.82 | bwd_inner_microstep: 1381.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2634 [2024-06-11 02:39:12,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.38 | bwd_microstep: 1112.28 | bwd_inner_microstep: 1112.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2184 [2024-06-11 02:39:13,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.94 | bwd_microstep: 762.66 | bwd_inner_microstep: 762.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-11 02:39:15,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.86 | bwd_microstep: 1484.43 | bwd_inner_microstep: 1484.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3674 [2024-06-11 02:39:17,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.17 | bwd_microstep: 1523.98 | bwd_inner_microstep: 1523.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3680 [2024-06-11 02:39:19,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.16 | bwd_microstep: 1523.87 | bwd_inner_microstep: 1523.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438 [2024-06-11 02:39:20,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.49 | bwd_microstep: 1154.65 | bwd_inner_microstep: 1154.62 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-11 02:39:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.18 | bwd_microstep: 1281.98 | bwd_inner_microstep: 1281.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1961 [2024-06-11 02:39:23,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.77 | bwd_microstep: 826.46 | bwd_inner_microstep: 826.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 02:39:25,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.81 | bwd_microstep: 1380.01 | bwd_inner_microstep: 1379.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-11 02:39:28,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.60 | bwd_microstep: 1608.44 | bwd_inner_microstep: 1608.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 02:39:29,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.41 | bwd_microstep: 1372.63 | bwd_inner_microstep: 1372.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3007 [2024-06-11 02:39:31,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.05 | bwd_microstep: 1300.57 | bwd_inner_microstep: 1300.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3528 [2024-06-11 02:39:33,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.25 | bwd_microstep: 
1535.49 | bwd_inner_microstep: 1535.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-11 02:39:35,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.53 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-11 02:39:37,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.30 | bwd_microstep: 975.91 | bwd_inner_microstep: 975.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-11 02:39:39,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1416.82 | bwd_inner_microstep: 1416.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-11 02:39:41,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.71 | bwd_microstep: 1656.50 | bwd_inner_microstep: 1656.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3566 [2024-06-11 02:39:43,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.43 | bwd_microstep: 1599.27 | bwd_inner_microstep: 1599.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3722 [2024-06-11 02:39:45,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1537.08 | bwd_inner_microstep: 1537.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2021 [2024-06-11 02:39:47,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 314.41 | bwd_microstep: 1726.94 | bwd_inner_microstep: 1726.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3773 [2024-06-11 02:39:50,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.06 | bwd_microstep: 1634.48 | bwd_inner_microstep: 1634.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3385 [2024-06-11 02:39:51,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.24 | bwd_microstep: 1368.84 | bwd_inner_microstep: 1368.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3574 [2024-06-11 02:39:57,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.32 | optimizer_step: 6.63 [2024-06-11 02:39:57,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.34 | bwd_microstep: 5344.15 | bwd_inner_microstep: 1494.02 | bwd_allreduce_microstep: 3850.06 | step_microstep: 38.37 [2024-06-11 02:39:57,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15480.60 | bwd: 46244.48 | bwd_inner: 42393.49 | bwd_allreduce: 3850.31 | step: 39.88 86%|████████▋ | 1489/1726 [25:55:35<6:02:10, 91.69s/it] 86%|████████▋ | 1490/1726 [25:57:44<6:45:05, 102.99s/it] 86%|████████▋ | 1491/1726 [25:59:29<6:45:32, 103.54s/it] 86%|████████▋ | 1492/1726 [26:00:31<5:55:17, 91.10s/it] 87%|████████▋ | 1493/1726 [26:01:32<5:18:38, 82.06s/it] 87%|████████▋ | 1494/1726 [26:02:34<4:54:04, 76.06s/it] {'loss': 1.2018, 'learning_rate': 1.8659209714863013e-06, 'epoch': 0.87} dynamic ViT batch size:
40, images per sample: 10.0, dynamic token length: 3496 [2024-06-11 02:40:00,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.90 | bwd_microstep: 1571.99 | bwd_inner_microstep: 1571.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-11 02:40:01,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.64 | bwd_microstep: 1145.76 | bwd_inner_microstep: 1145.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 02:40:03,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.16 | bwd_microstep: 1380.10 | bwd_inner_microstep: 1380.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3474 [2024-06-11 02:40:05,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.68 | bwd_microstep: 1408.13 | bwd_inner_microstep: 1408.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3777 [2024-06-11 02:40:07,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.77 | bwd_microstep: 1643.75 | bwd_inner_microstep: 1643.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-11 02:40:09,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1253.98 | bwd_inner_microstep: 1253.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 02:40:11,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.74 | bwd_microstep: 1385.18 | bwd_inner_microstep: 1385.15 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3411 [2024-06-11 02:40:12,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.39 | bwd_microstep: 1152.58 | bwd_inner_microstep: 1152.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3502 [2024-06-11 02:40:14,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.27 | bwd_microstep: 1335.78 | bwd_inner_microstep: 1335.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485 [2024-06-11 02:40:16,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.14 | bwd_microstep: 1185.81 | bwd_inner_microstep: 1185.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501 [2024-06-11 02:40:18,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.28 | bwd_microstep: 1317.03 | bwd_inner_microstep: 1317.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2096 [2024-06-11 02:40:19,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.32 | bwd_microstep: 916.57 | bwd_inner_microstep: 916.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3672 [2024-06-11 02:40:21,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.75 | bwd_microstep: 1655.15 | bwd_inner_microstep: 1655.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2730 [2024-06-11 02:40:23,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.64 | bwd_microstep: 1100.24 | bwd_inner_microstep: 
1100.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-11 02:40:25,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1386.64 | bwd_inner_microstep: 1386.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-11 02:40:27,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1473.87 | bwd_inner_microstep: 1473.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3646 [2024-06-11 02:40:29,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.19 | bwd_microstep: 1710.47 | bwd_inner_microstep: 1710.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-11 02:40:31,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.97 | bwd_microstep: 1521.49 | bwd_inner_microstep: 1521.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-11 02:40:33,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 1253.20 | bwd_inner_microstep: 1253.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-11 02:40:35,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.46 | bwd_microstep: 1612.31 | bwd_inner_microstep: 1612.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-11 02:40:37,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.45 | 
bwd_microstep: 1182.08 | bwd_inner_microstep: 1182.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677 [2024-06-11 02:40:39,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.77 | bwd_microstep: 1428.77 | bwd_inner_microstep: 1428.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2137 [2024-06-11 02:40:40,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.71 | bwd_microstep: 832.05 | bwd_inner_microstep: 832.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1379 [2024-06-11 02:40:41,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.62 | bwd_microstep: 524.79 | bwd_inner_microstep: 524.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3722 [2024-06-11 02:40:43,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.98 | bwd_microstep: 1482.22 | bwd_inner_microstep: 1482.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 02:40:45,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1554.39 | bwd_inner_microstep: 1554.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-11 02:40:47,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.96 | bwd_microstep: 1502.58 | bwd_inner_microstep: 1502.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-11 02:40:49,473] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 546.61 | bwd_microstep: 1455.69 | bwd_inner_microstep: 1455.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821 [2024-06-11 02:40:51,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.20 | bwd_microstep: 1418.55 | bwd_inner_microstep: 1418.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3808 [2024-06-11 02:40:53,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.84 | bwd_microstep: 1751.02 | bwd_inner_microstep: 1751.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3576 [2024-06-11 02:40:55,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.23 | bwd_microstep: 1300.23 | bwd_inner_microstep: 1300.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3588 [2024-06-11 02:41:00,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-11 02:41:00,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.09 | bwd_microstep: 4056.27 | bwd_inner_microstep: 1649.25 | bwd_allreduce_microstep: 2406.96 | step_microstep: 37.87 [2024-06-11 02:41:00,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16202.05 | bwd: 45898.69 | bwd_inner: 43490.82 | bwd_allreduce: 2407.19 | step: 39.37 {'loss': 1.1887, 'learning_rate': 1.850122303996882e-06, 'epoch': 0.87} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-11 02:41:02,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.15 | bwd_microstep: 1478.39 | bwd_inner_microstep: 1478.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-11 02:41:03,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.12 | bwd_microstep: 787.37 | bwd_inner_microstep: 787.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3902 [2024-06-11 02:41:05,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.83 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3831 [2024-06-11 02:41:07,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1387.75 | bwd_inner_microstep: 1387.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4185 [2024-06-11 02:41:09,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.19 | bwd_microstep: 1650.14 | bwd_inner_microstep: 1650.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943 [2024-06-11 02:41:10,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.01 | bwd_microstep: 759.88 | bwd_inner_microstep: 759.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-11 02:41:11,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.86 | bwd_microstep: 802.56 | bwd_inner_microstep: 802.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705 [2024-06-11 02:41:13,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.78 | bwd_microstep: 1526.70 | bwd_inner_microstep: 1526.67 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1967 [2024-06-11 02:41:14,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.82 | bwd_microstep: 796.34 | bwd_inner_microstep: 796.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3482 [2024-06-11 02:41:16,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.57 | bwd_microstep: 1312.30 | bwd_inner_microstep: 1312.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-11 02:41:18,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1494.13 | bwd_inner_microstep: 1494.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688 [2024-06-11 02:41:21,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.03 | bwd_microstep: 1619.70 | bwd_inner_microstep: 1619.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1894 [2024-06-11 02:41:22,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.43 | bwd_microstep: 836.38 | bwd_inner_microstep: 836.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3652 [2024-06-11 02:41:24,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.69 | bwd_microstep: 1719.90 | bwd_inner_microstep: 1719.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 02:41:26,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 
1389.42 | bwd_inner_microstep: 1389.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3636 [2024-06-11 02:41:28,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.22 | bwd_microstep: 1709.40 | bwd_inner_microstep: 1709.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 02:41:30,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.42 | bwd_microstep: 1281.19 | bwd_inner_microstep: 1281.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-11 02:41:32,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.66 | bwd_microstep: 1415.44 | bwd_inner_microstep: 1415.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3533 [2024-06-11 02:41:34,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1294.99 | bwd_inner_microstep: 1294.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 02:41:36,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3618 [2024-06-11 02:41:38,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.68 | bwd_microstep: 1472.76 | bwd_inner_microstep: 1472.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3725 [2024-06-11 02:41:40,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 475.96 | bwd_microstep: 1240.78 | bwd_inner_microstep: 1240.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-11 02:41:42,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.83 | bwd_microstep: 1660.81 | bwd_inner_microstep: 1660.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2195 [2024-06-11 02:41:43,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.92 | bwd_microstep: 861.62 | bwd_inner_microstep: 861.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3552 [2024-06-11 02:41:45,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.78 | bwd_microstep: 1439.82 | bwd_inner_microstep: 1439.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3590 [2024-06-11 02:41:47,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.98 | bwd_microstep: 1531.68 | bwd_inner_microstep: 1531.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-11 02:41:49,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1399.47 | bwd_inner_microstep: 1399.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3386 [2024-06-11 02:41:51,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.22 | bwd_microstep: 1334.22 | bwd_inner_microstep: 1334.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3800 [2024-06-11 02:41:53,617] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.16 | bwd_microstep: 1621.32 | bwd_inner_microstep: 1621.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757 [2024-06-11 02:41:55,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.48 | bwd_microstep: 1632.85 | bwd_inner_microstep: 1632.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813 [2024-06-11 02:41:58,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.94 | bwd_microstep: 1619.67 | bwd_inner_microstep: 1619.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3803 [2024-06-11 02:42:00,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.01 | optimizer_step: 6.59 [2024-06-11 02:42:00,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.95 | bwd_microstep: 2060.95 | bwd_inner_microstep: 1685.24 | bwd_allreduce_microstep: 375.66 | step_microstep: 37.43 [2024-06-11 02:42:00,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16235.79 | bwd: 43922.02 | bwd_inner: 43545.45 | bwd_allreduce: 375.89 | step: 38.83 {'loss': 1.1628, 'learning_rate': 1.8343875602823558e-06, 'epoch': 0.87} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-11 02:42:01,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 785.39 | bwd_inner_microstep: 785.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458 [2024-06-11 02:42:03,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.47 | bwd_microstep: 1210.52 | bwd_inner_microstep: 1210.49 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 02:42:05,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1375.87 | bwd_inner_microstep: 1375.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-11 02:42:07,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1298.44 | bwd_inner_microstep: 1298.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 02:42:09,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1278.03 | bwd_inner_microstep: 1278.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3787 [2024-06-11 02:42:11,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.68 | bwd_microstep: 1443.71 | bwd_inner_microstep: 1443.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1911 [2024-06-11 02:42:12,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.05 | bwd_microstep: 780.04 | bwd_inner_microstep: 780.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3698 [2024-06-11 02:42:14,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.39 | bwd_microstep: 1578.20 | bwd_inner_microstep: 1578.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1925 [2024-06-11 02:42:15,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.58 | bwd_microstep: 
819.78 | bwd_inner_microstep: 819.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-11 02:42:17,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1335.04 | bwd_inner_microstep: 1335.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3440 [2024-06-11 02:42:19,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.46 | bwd_microstep: 1311.83 | bwd_inner_microstep: 1311.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2648 [2024-06-11 02:42:20,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.56 | bwd_microstep: 957.07 | bwd_inner_microstep: 957.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3501 [2024-06-11 02:42:22,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.49 | bwd_microstep: 1550.29 | bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3657 [2024-06-11 02:42:24,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.54 | bwd_microstep: 1621.73 | bwd_inner_microstep: 1621.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-11 02:42:26,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.07 | bwd_microstep: 1253.47 | bwd_inner_microstep: 1253.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3632 [2024-06-11 02:42:28,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 568.65 | bwd_microstep: 1544.91 | bwd_inner_microstep: 1544.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3660 [2024-06-11 02:42:30,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1426.82 | bwd_inner_microstep: 1426.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-11 02:42:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.83 | bwd_microstep: 1399.45 | bwd_inner_microstep: 1399.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-11 02:42:34,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.43 | bwd_microstep: 1458.46 | bwd_inner_microstep: 1458.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 02:42:36,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1290.02 | bwd_inner_microstep: 1290.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-11 02:42:37,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.40 | bwd_microstep: 800.86 | bwd_inner_microstep: 800.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 02:42:39,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.34 | bwd_microstep: 1378.21 | bwd_inner_microstep: 1378.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997 [2024-06-11 02:42:40,392] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.91 | bwd_microstep: 738.24 | bwd_inner_microstep: 738.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2188 [2024-06-11 02:42:41,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.45 | bwd_microstep: 766.45 | bwd_inner_microstep: 766.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-11 02:42:43,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1396.89 | bwd_inner_microstep: 1396.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1996 [2024-06-11 02:42:44,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.13 | bwd_microstep: 802.55 | bwd_inner_microstep: 802.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3826 [2024-06-11 02:42:46,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.15 | bwd_microstep: 1605.03 | bwd_inner_microstep: 1605.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-11 02:42:48,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.91 | bwd_microstep: 1395.60 | bwd_inner_microstep: 1395.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614 [2024-06-11 02:42:50,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.44 | bwd_microstep: 1604.49 | bwd_inner_microstep: 1604.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 
[2024-06-11 02:42:52,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.99 | bwd_microstep: 1556.07 | bwd_inner_microstep: 1556.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089 [2024-06-11 02:42:54,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.10 | bwd_microstep: 915.97 | bwd_inner_microstep: 915.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3585 [2024-06-11 02:43:01,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.11 | optimizer_step: 6.62 [2024-06-11 02:43:01,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.55 | bwd_microstep: 6955.24 | bwd_inner_microstep: 1817.03 | bwd_allreduce_microstep: 5138.17 | step_microstep: 37.98 [2024-06-11 02:43:01,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15114.61 | bwd: 45634.72 | bwd_inner: 40495.61 | bwd_allreduce: 5138.41 | step: 39.58 {'loss': 1.1346, 'learning_rate': 1.8187167957604047e-06, 'epoch': 0.87} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490 [2024-06-11 02:43:04,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1572.49 | bwd_inner_microstep: 1572.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3936 [2024-06-11 02:43:06,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.63 | bwd_microstep: 1589.30 | bwd_inner_microstep: 1589.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 02:43:07,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.01 | bwd_microstep: 1245.21 | 
bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-11 02:43:09,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.77 | bwd_microstep: 1245.58 | bwd_inner_microstep: 1245.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-11 02:43:10,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.18 | bwd_microstep: 793.49 | bwd_inner_microstep: 793.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3781 [2024-06-11 02:43:12,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.78 | bwd_microstep: 1443.62 | bwd_inner_microstep: 1443.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2238 [2024-06-11 02:43:14,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.07 | bwd_microstep: 927.37 | bwd_inner_microstep: 927.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3721 [2024-06-11 02:43:16,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.53 | bwd_microstep: 1559.17 | bwd_inner_microstep: 1559.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2086 [2024-06-11 02:43:17,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.01 | bwd_microstep: 818.75 | bwd_inner_microstep: 818.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3403 [2024-06-11 02:43:19,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
459.63 | bwd_microstep: 1212.38 | bwd_inner_microstep: 1212.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 02:43:20,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.58 | bwd_microstep: 1279.53 | bwd_inner_microstep: 1279.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1869 [2024-06-11 02:43:21,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.40 | bwd_microstep: 803.47 | bwd_inner_microstep: 803.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506 [2024-06-11 02:43:23,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.68 | bwd_microstep: 1443.31 | bwd_inner_microstep: 1443.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-11 02:43:26,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1583.85 | bwd_inner_microstep: 1583.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2954 [2024-06-11 02:43:27,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.76 | bwd_microstep: 1012.42 | bwd_inner_microstep: 1012.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3541 [2024-06-11 02:43:29,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.62 | bwd_microstep: 1376.53 | bwd_inner_microstep: 1376.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178 [2024-06-11 02:43:30,703] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.42 | bwd_microstep: 951.70 | bwd_inner_microstep: 951.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3458 [2024-06-11 02:43:32,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.66 | bwd_microstep: 1570.19 | bwd_inner_microstep: 1570.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-11 02:43:34,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1407.23 | bwd_inner_microstep: 1407.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-11 02:43:36,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.20 | bwd_microstep: 1294.57 | bwd_inner_microstep: 1294.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496 [2024-06-11 02:43:38,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.63 | bwd_microstep: 1191.55 | bwd_inner_microstep: 1191.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3580 [2024-06-11 02:43:40,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.44 | bwd_microstep: 1632.96 | bwd_inner_microstep: 1632.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 02:43:42,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.80 | bwd_microstep: 1352.81 | bwd_inner_microstep: 1352.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3615 [2024-06-11 02:43:44,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.67 | bwd_microstep: 1508.40 | bwd_inner_microstep: 1508.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-11 02:43:46,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.56 | bwd_microstep: 1503.10 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-11 02:43:48,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.68 | bwd_microstep: 1608.89 | bwd_inner_microstep: 1608.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726 [2024-06-11 02:43:50,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.67 | bwd_microstep: 1438.32 | bwd_inner_microstep: 1438.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-11 02:43:52,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.88 | bwd_microstep: 1459.19 | bwd_inner_microstep: 1459.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-11 02:43:54,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.64 | bwd_microstep: 1501.84 | bwd_inner_microstep: 1501.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-11 02:43:56,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.07 | bwd_microstep: 1543.04 | bwd_inner_microstep: 1543.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images 
per sample: 9.5, dynamic token length: 3581 [2024-06-11 02:43:59,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.21 | bwd_microstep: 1563.75 | bwd_inner_microstep: 1563.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1921 [2024-06-11 02:44:03,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.19 | optimizer_step: 6.58 [2024-06-11 02:44:03,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.88 | bwd_microstep: 3615.91 | bwd_inner_microstep: 926.54 | bwd_allreduce_microstep: 2689.32 | step_microstep: 37.81 [2024-06-11 02:44:03,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15813.45 | bwd: 45049.92 | bwd_inner: 42359.70 | bwd_allreduce: 2689.55 | step: 39.41 {'loss': 1.1652, 'learning_rate': 1.803110065623388e-06, 'epoch': 0.87} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 5207 [2024-06-11 02:44:05,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 695.68 | bwd_microstep: 1819.01 | bwd_inner_microstep: 1818.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3515 [2024-06-11 02:44:07,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.87 | bwd_microstep: 1336.24 | bwd_inner_microstep: 1336.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480 [2024-06-11 02:44:09,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.10 | bwd_microstep: 1438.77 | bwd_inner_microstep: 1438.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 02:44:11,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
488.16 | bwd_microstep: 1278.75 | bwd_inner_microstep: 1278.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-11 02:44:12,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1250.39 | bwd_inner_microstep: 1250.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 02:44:14,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.16 | bwd_microstep: 1338.96 | bwd_inner_microstep: 1338.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3417 [2024-06-11 02:44:16,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.64 | bwd_microstep: 1151.77 | bwd_inner_microstep: 1151.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-11 02:44:18,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1344.56 | bwd_inner_microstep: 1344.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 02:44:19,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1249.02 | bwd_inner_microstep: 1248.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689 [2024-06-11 02:44:22,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.60 | bwd_microstep: 1526.75 | bwd_inner_microstep: 1526.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-11 02:44:24,131] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.16 | bwd_microstep: 1485.77 | bwd_inner_microstep: 1485.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-11 02:44:25,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1347.53 | bwd_inner_microstep: 1347.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-11 02:44:27,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.99 | bwd_microstep: 1297.87 | bwd_inner_microstep: 1297.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501 [2024-06-11 02:44:29,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1576.56 | bwd_inner_microstep: 1576.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939 [2024-06-11 02:44:30,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.21 | bwd_microstep: 703.65 | bwd_inner_microstep: 703.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-11 02:44:32,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.96 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 02:44:34,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1389.90 | bwd_inner_microstep: 1389.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3672 [2024-06-11 02:44:36,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.92 | bwd_microstep: 1424.64 | bwd_inner_microstep: 1424.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690 [2024-06-11 02:44:38,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.96 | bwd_microstep: 1330.46 | bwd_inner_microstep: 1330.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-11 02:44:40,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.65 | bwd_microstep: 1411.39 | bwd_inner_microstep: 1411.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-11 02:44:42,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.10 | bwd_microstep: 1404.17 | bwd_inner_microstep: 1404.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2149 [2024-06-11 02:44:43,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.25 | bwd_microstep: 852.42 | bwd_inner_microstep: 852.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3752 [2024-06-11 02:44:45,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.76 | bwd_microstep: 1376.27 | bwd_inner_microstep: 1376.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3627 [2024-06-11 02:44:47,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.84 | bwd_microstep: 1540.25 | bwd_inner_microstep: 1540.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3810 [2024-06-11 02:44:49,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.30 | bwd_microstep: 1558.58 | bwd_inner_microstep: 1558.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3435 [2024-06-11 02:44:51,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.97 | bwd_microstep: 1188.82 | bwd_inner_microstep: 1188.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819 [2024-06-11 02:44:53,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.52 | bwd_microstep: 1522.30 | bwd_inner_microstep: 1522.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-11 02:44:55,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.98 | bwd_microstep: 1443.37 | bwd_inner_microstep: 1443.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3445 [2024-06-11 02:44:57,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.48 | bwd_microstep: 1546.91 | bwd_inner_microstep: 1546.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3732 [2024-06-11 02:44:59,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.00 | bwd_microstep: 1300.38 | bwd_inner_microstep: 1300.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3819 [2024-06-11 02:45:01,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.56 | bwd_microstep: 1487.49 | bwd_inner_microstep: 1487.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 02:45:03,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.31 | optimizer_step: 6.60 [2024-06-11 02:45:03,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1839.99 | bwd_inner_microstep: 1451.56 | bwd_allreduce_microstep: 388.38 | step_microstep: 37.70 [2024-06-11 02:45:03,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16398.70 | bwd: 44153.17 | bwd_inner: 43763.90 | bwd_allreduce: 388.61 | step: 39.19 87%|████████▋ | 1494/1726 [26:02:34<4:54:04, 76.06s/it] 87%|████████▋ | 1495/1726 [26:03:37<4:37:04, 71.97s/it] 87%|████████▋ | 1496/1726 [26:04:37<4:22:40, 68.52s/it] 87%|████████▋ | 1497/1726 [26:05:38<4:13:00, 66.29s/it] 87%|████████▋ | 1498/1726 [26:06:39<4:06:06, 64.77s/it] 87%|████████▋ | 1499/1726 [26:07:40<4:00:37, 63.60s/it] {'loss': 1.1638, 'learning_rate': 1.7875674248381237e-06, 'epoch': 0.87} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-11 02:45:06,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1487.95 | bwd_inner_microstep: 1487.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3915 [2024-06-11 02:45:08,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.01 | bwd_microstep: 1689.92 | bwd_inner_microstep: 1689.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-11 02:45:10,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 483.57 | bwd_microstep: 1286.74 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2310 [2024-06-11 02:45:11,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.13 | bwd_microstep: 818.26 | bwd_inner_microstep: 818.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 02:45:13,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.55 | bwd_microstep: 1281.26 | bwd_inner_microstep: 1281.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3902 [2024-06-11 02:45:15,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.55 | bwd_microstep: 1519.11 | bwd_inner_microstep: 1519.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 02:45:17,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.19 | bwd_microstep: 1384.80 | bwd_inner_microstep: 1384.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 02:45:18,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.97 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1885 [2024-06-11 02:45:19,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.62 | bwd_microstep: 712.90 | bwd_inner_microstep: 712.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-11 02:45:21,872] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.21 | bwd_microstep: 1389.27 | bwd_inner_microstep: 1389.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2111 [2024-06-11 02:45:22,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.00 | bwd_microstep: 762.22 | bwd_inner_microstep: 762.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1953 [2024-06-11 02:45:24,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.46 | bwd_microstep: 857.26 | bwd_inner_microstep: 857.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2978 [2024-06-11 02:45:25,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.05 | bwd_microstep: 1102.77 | bwd_inner_microstep: 1102.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 02:45:27,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.52 | bwd_microstep: 1280.42 | bwd_inner_microstep: 1280.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-11 02:45:29,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.59 | bwd_microstep: 1293.08 | bwd_inner_microstep: 1293.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-11 02:45:31,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.45 | bwd_microstep: 1483.77 | bwd_inner_microstep: 1483.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3498 
[2024-06-11 02:45:33,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.87 | bwd_microstep: 1548.77 | bwd_inner_microstep: 1548.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3665 [2024-06-11 02:45:35,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.49 | bwd_microstep: 1426.98 | bwd_inner_microstep: 1426.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969 [2024-06-11 02:45:36,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.92 | bwd_microstep: 795.19 | bwd_inner_microstep: 795.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3507 [2024-06-11 02:45:38,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.66 | bwd_microstep: 1189.84 | bwd_inner_microstep: 1189.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-11 02:45:40,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.20 | bwd_microstep: 1653.53 | bwd_inner_microstep: 1653.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-11 02:45:42,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.06 | bwd_microstep: 1524.68 | bwd_inner_microstep: 1524.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 02:45:44,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1286.59 | bwd_inner_microstep: 1286.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3810 [2024-06-11 02:45:46,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.81 | bwd_microstep: 1658.82 | bwd_inner_microstep: 1658.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723 [2024-06-11 02:45:48,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1438.10 | bwd_inner_microstep: 1438.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2018 [2024-06-11 02:45:49,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.50 | bwd_microstep: 839.59 | bwd_inner_microstep: 839.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-11 02:45:51,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.00 | bwd_microstep: 1659.10 | bwd_inner_microstep: 1659.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-11 02:45:53,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.36 | bwd_microstep: 1439.43 | bwd_inner_microstep: 1439.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2060 [2024-06-11 02:45:55,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.61 | bwd_microstep: 846.06 | bwd_inner_microstep: 846.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 02:45:57,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1410.53 | bwd_inner_microstep: 1410.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 02:45:58,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1378.52 | bwd_inner_microstep: 1378.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3585 [2024-06-11 02:46:05,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.21 | optimizer_step: 6.59 [2024-06-11 02:46:05,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.74 | bwd_microstep: 5873.01 | bwd_inner_microstep: 1928.12 | bwd_allreduce_microstep: 3944.82 | step_microstep: 38.74 [2024-06-11 02:46:05,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15536.30 | bwd: 45706.84 | bwd_inner: 41761.09 | bwd_allreduce: 3945.06 | step: 40.15 {'loss': 1.188, 'learning_rate': 1.7720889281457121e-06, 'epoch': 0.87} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3480 [2024-06-11 02:46:07,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.89 | bwd_microstep: 1568.00 | bwd_inner_microstep: 1567.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4362 [2024-06-11 02:46:09,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.09 | bwd_microstep: 1507.60 | bwd_inner_microstep: 1507.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2349 [2024-06-11 02:46:11,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.26 | bwd_microstep: 919.91 | bwd_inner_microstep: 919.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-11 02:46:13,113] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 560.16 | bwd_microstep: 1494.38 | bwd_inner_microstep: 1494.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3776 [2024-06-11 02:46:15,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.27 | bwd_microstep: 1541.85 | bwd_inner_microstep: 1541.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 02:46:16,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1246.77 | bwd_inner_microstep: 1246.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 02:46:18,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1380.95 | bwd_inner_microstep: 1380.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1916 [2024-06-11 02:46:19,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.87 | bwd_microstep: 717.60 | bwd_inner_microstep: 717.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3502 [2024-06-11 02:46:21,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.57 | bwd_microstep: 1187.96 | bwd_inner_microstep: 1187.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907 [2024-06-11 02:46:22,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.69 | bwd_microstep: 777.42 | bwd_inner_microstep: 777.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-11 
02:46:24,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.33 | bwd_microstep: 1346.89 | bwd_inner_microstep: 1346.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2950 [2024-06-11 02:46:26,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.66 | bwd_microstep: 1195.46 | bwd_inner_microstep: 1195.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710 [2024-06-11 02:46:28,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.48 | bwd_microstep: 1520.01 | bwd_inner_microstep: 1519.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672 [2024-06-11 02:46:30,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.90 | bwd_microstep: 1511.86 | bwd_inner_microstep: 1511.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-11 02:46:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.77 | bwd_microstep: 1295.22 | bwd_inner_microstep: 1295.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3501 [2024-06-11 02:46:34,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.58 | bwd_microstep: 1576.27 | bwd_inner_microstep: 1576.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2125 [2024-06-11 02:46:35,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.73 | bwd_microstep: 828.96 | bwd_inner_microstep: 828.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3478 [2024-06-11 02:46:37,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.80 | bwd_microstep: 1478.89 | bwd_inner_microstep: 1478.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3677 [2024-06-11 02:46:39,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.19 | bwd_microstep: 1453.86 | bwd_inner_microstep: 1453.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473 [2024-06-11 02:46:41,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.75 | bwd_microstep: 1185.92 | bwd_inner_microstep: 1185.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2212 [2024-06-11 02:46:42,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.35 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3644 [2024-06-11 02:46:44,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.27 | bwd_microstep: 1616.60 | bwd_inner_microstep: 1616.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715 [2024-06-11 02:46:46,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1432.80 | bwd_inner_microstep: 1432.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2022 [2024-06-11 02:46:47,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.32 | bwd_microstep: 745.01 | bwd_inner_microstep: 744.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 02:46:49,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.99 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064 [2024-06-11 02:46:50,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.98 | bwd_microstep: 916.45 | bwd_inner_microstep: 916.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553 [2024-06-11 02:46:52,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.50 | bwd_microstep: 1330.21 | bwd_inner_microstep: 1330.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805 [2024-06-11 02:46:54,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.93 | bwd_microstep: 1553.57 | bwd_inner_microstep: 1553.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826 [2024-06-11 02:46:56,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.40 | bwd_microstep: 1359.43 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3487 [2024-06-11 02:46:58,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.52 | bwd_microstep: 1346.65 | bwd_inner_microstep: 1346.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3597 [2024-06-11 02:47:00,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.68 | bwd_microstep: 1706.26 | bwd_inner_microstep: 1706.23 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229 [2024-06-11 02:47:07,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.10 | optimizer_step: 6.59 [2024-06-11 02:47:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.59 | bwd_microstep: 6307.21 | bwd_inner_microstep: 1088.77 | bwd_allreduce_microstep: 5218.38 | step_microstep: 37.94 [2024-06-11 02:47:07,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15323.19 | bwd: 46229.61 | bwd_inner: 41010.32 | bwd_allreduce: 5218.61 | step: 39.46 {'loss': 1.1947, 'learning_rate': 1.7566746300613325e-06, 'epoch': 0.87} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474 [2024-06-11 02:47:09,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.53 | bwd_microstep: 1568.62 | bwd_inner_microstep: 1568.50 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 02:47:11,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.10 | bwd_microstep: 1242.79 | bwd_inner_microstep: 1242.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2336 [2024-06-11 02:47:12,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.19 | bwd_microstep: 982.17 | bwd_inner_microstep: 982.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3463 [2024-06-11 02:47:14,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1237.60 | bwd_inner_microstep: 1237.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2175 [2024-06-11 02:47:15,562] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.22 | bwd_microstep: 852.50 | bwd_inner_microstep: 852.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-11 02:47:17,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.94 | bwd_microstep: 1248.02 | bwd_inner_microstep: 1247.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 02:47:19,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.27 | bwd_microstep: 1380.62 | bwd_inner_microstep: 1380.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-11 02:47:20,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.14 | bwd_microstep: 1283.85 | bwd_inner_microstep: 1283.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1977 [2024-06-11 02:47:22,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.76 | bwd_microstep: 827.43 | bwd_inner_microstep: 827.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-11 02:47:23,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.90 | bwd_microstep: 787.63 | bwd_inner_microstep: 787.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3717 [2024-06-11 02:47:25,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1475.51 | bwd_inner_microstep: 1475.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 
[2024-06-11 02:47:27,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1301.05 | bwd_inner_microstep: 1301.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2404 [2024-06-11 02:47:28,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.76 | bwd_microstep: 1036.37 | bwd_inner_microstep: 1036.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3427 [2024-06-11 02:47:30,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.02 | bwd_microstep: 1369.69 | bwd_inner_microstep: 1369.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3636 [2024-06-11 02:47:32,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.63 | bwd_microstep: 1603.85 | bwd_inner_microstep: 1603.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3670 [2024-06-11 02:47:34,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.50 | bwd_microstep: 1625.04 | bwd_inner_microstep: 1625.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 02:47:36,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.49 | bwd_microstep: 1339.73 | bwd_inner_microstep: 1339.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3449 [2024-06-11 02:47:38,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1300.80 | bwd_inner_microstep: 1300.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 2419 [2024-06-11 02:47:39,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.95 | bwd_microstep: 939.14 | bwd_inner_microstep: 939.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3830 [2024-06-11 02:47:41,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.90 | bwd_microstep: 1460.97 | bwd_inner_microstep: 1460.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3900 [2024-06-11 02:47:43,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.70 | bwd_microstep: 1395.96 | bwd_inner_microstep: 1395.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3684 [2024-06-11 02:47:45,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1234.20 | bwd_inner_microstep: 1234.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 02:47:47,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1355.16 | bwd_inner_microstep: 1355.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 02:47:49,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1393.50 | bwd_inner_microstep: 1393.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-11 02:47:50,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.63 | bwd_microstep: 711.08 | bwd_inner_microstep: 711.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 02:47:52,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1396.20 | bwd_inner_microstep: 1396.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3814 [2024-06-11 02:47:54,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.97 | bwd_microstep: 1581.79 | bwd_inner_microstep: 1581.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-11 02:47:56,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.27 | bwd_microstep: 1348.44 | bwd_inner_microstep: 1348.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-11 02:47:57,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 1304.44 | bwd_inner_microstep: 1304.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2071 [2024-06-11 02:47:59,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.43 | bwd_microstep: 913.35 | bwd_inner_microstep: 913.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3483 [2024-06-11 02:48:01,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.59 | bwd_microstep: 1343.90 | bwd_inner_microstep: 1343.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2261 [2024-06-11 02:48:10,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.21 | optimizer_step: 6.62 [2024-06-11 
02:48:10,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.66 | bwd_microstep: 9228.76 | bwd_inner_microstep: 1102.37 | bwd_allreduce_microstep: 8126.32 | step_microstep: 38.77 [2024-06-11 02:48:10,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14937.80 | bwd: 48070.22 | bwd_inner: 39942.87 | bwd_allreduce: 8126.62 | step: 40.36 {'loss': 1.1484, 'learning_rate': 1.7413245848740734e-06, 'epoch': 0.87} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3499 [2024-06-11 02:48:12,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.84 | bwd_microstep: 1498.34 | bwd_inner_microstep: 1498.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-11 02:48:14,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1393.02 | bwd_inner_microstep: 1392.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-11 02:48:16,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.86 | bwd_microstep: 1143.37 | bwd_inner_microstep: 1143.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3881 [2024-06-11 02:48:18,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.34 | bwd_microstep: 1443.86 | bwd_inner_microstep: 1443.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1937 [2024-06-11 02:48:19,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.69 | bwd_microstep: 759.28 | bwd_inner_microstep: 759.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3485 [2024-06-11 
02:48:21,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.96 | bwd_microstep: 1216.67 | bwd_inner_microstep: 1216.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615 [2024-06-11 02:48:22,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.62 | bwd_microstep: 1311.94 | bwd_inner_microstep: 1311.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420 [2024-06-11 02:48:24,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.79 | bwd_microstep: 1152.26 | bwd_inner_microstep: 1152.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3713 [2024-06-11 02:48:26,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.13 | bwd_microstep: 1528.07 | bwd_inner_microstep: 1528.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725 [2024-06-11 02:48:28,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.72 | bwd_microstep: 1634.35 | bwd_inner_microstep: 1634.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 02:48:30,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1479.86 | bwd_inner_microstep: 1479.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-11 02:48:32,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.94 | bwd_microstep: 1482.64 | bwd_inner_microstep: 1482.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, 
dynamic token length: 3717 [2024-06-11 02:48:35,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1560.33 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3515 [2024-06-11 02:48:37,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1410.83 | bwd_inner_microstep: 1410.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1946 [2024-06-11 02:48:38,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.29 | bwd_microstep: 730.40 | bwd_inner_microstep: 730.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-11 02:48:39,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.08 | bwd_microstep: 1398.57 | bwd_inner_microstep: 1398.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3951 [2024-06-11 02:48:42,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.96 | bwd_microstep: 1670.58 | bwd_inner_microstep: 1670.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640 [2024-06-11 02:48:44,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.47 | bwd_microstep: 1579.79 | bwd_inner_microstep: 1579.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-11 02:48:46,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.53 | bwd_microstep: 1396.16 | bwd_inner_microstep: 1396.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-11 02:48:48,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1557.13 | bwd_inner_microstep: 1557.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668 [2024-06-11 02:48:50,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.46 | bwd_microstep: 1421.33 | bwd_inner_microstep: 1421.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-11 02:48:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.51 | bwd_microstep: 1410.99 | bwd_inner_microstep: 1410.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756 [2024-06-11 02:48:54,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.43 | bwd_microstep: 1546.94 | bwd_inner_microstep: 1546.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729 [2024-06-11 02:48:56,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.30 | bwd_microstep: 1533.35 | bwd_inner_microstep: 1533.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-11 02:48:58,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.52 | bwd_microstep: 1252.47 | bwd_inner_microstep: 1252.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-11 02:49:00,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.75 | bwd_microstep: 1555.87 | bwd_inner_microstep: 1555.84 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-11 02:49:02,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1375.82 | bwd_inner_microstep: 1375.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3560 [2024-06-11 02:49:04,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1427.13 | bwd_inner_microstep: 1427.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-11 02:49:06,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.94 | bwd_inner_microstep: 1479.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3820 [2024-06-11 02:49:09,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.15 | bwd_microstep: 1856.48 | bwd_inner_microstep: 1856.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3807 [2024-06-11 02:49:11,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.70 | bwd_microstep: 1820.45 | bwd_inner_microstep: 1820.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3849 [2024-06-11 02:49:13,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.05 | optimizer_step: 6.61 [2024-06-11 02:49:13,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1536.21 | bwd_inner_microstep: 1528.46 | bwd_allreduce_microstep: 7.70 | step_microstep: 37.51 [2024-06-11 02:49:13,646] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd: 16985.83 | bwd: 45564.46 | bwd_inner: 45555.83 | bwd_allreduce: 7.92 | step: 39.07 {'loss': 1.1815, 'learning_rate': 1.726038846646707e-06, 'epoch': 0.87} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-11 02:49:15,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1337.71 | bwd_inner_microstep: 1337.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3602 [2024-06-11 02:49:17,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1242.11 | bwd_inner_microstep: 1242.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 02:49:18,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.19 | bwd_microstep: 1246.58 | bwd_inner_microstep: 1246.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-11 02:49:20,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1345.08 | bwd_inner_microstep: 1345.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802 [2024-06-11 02:49:22,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.97 | bwd_microstep: 1445.74 | bwd_inner_microstep: 1445.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3779 [2024-06-11 02:49:24,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 
2189 [2024-06-11 02:49:25,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.19 | bwd_microstep: 856.62 | bwd_inner_microstep: 856.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 02:49:27,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.40 | bwd_microstep: 1283.54 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-11 02:49:28,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.96 | bwd_microstep: 796.46 | bwd_inner_microstep: 796.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2052 [2024-06-11 02:49:29,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.20 | bwd_microstep: 816.14 | bwd_inner_microstep: 816.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-11 02:49:32,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1497.78 | bwd_inner_microstep: 1497.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3402 [2024-06-11 02:49:33,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.20 | bwd_microstep: 1366.48 | bwd_inner_microstep: 1366.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3498 [2024-06-11 02:49:35,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.27 | bwd_microstep: 1429.35 | bwd_inner_microstep: 1429.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2099 [2024-06-11 02:49:37,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.77 | bwd_microstep: 920.88 | bwd_inner_microstep: 920.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-11 02:49:39,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1348.92 | bwd_inner_microstep: 1348.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 02:49:40,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.94 | bwd_microstep: 1280.01 | bwd_inner_microstep: 1279.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 02:49:42,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.41 | bwd_microstep: 1487.39 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-11 02:49:44,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1556.91 | bwd_inner_microstep: 1556.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 02:49:46,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.50 | bwd_microstep: 1287.31 | bwd_inner_microstep: 1287.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-11 02:49:48,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.79 | bwd_microstep: 1412.94 | bwd_inner_microstep: 1412.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-11 02:49:50,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.40 | bwd_microstep: 1396.74 | bwd_inner_microstep: 1396.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3517 [2024-06-11 02:49:52,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.16 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3699 [2024-06-11 02:49:54,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.79 | bwd_microstep: 1333.28 | bwd_inner_microstep: 1333.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-11 02:49:56,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.56 | bwd_microstep: 1656.94 | bwd_inner_microstep: 1656.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521 [2024-06-11 02:49:58,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.04 | bwd_microstep: 1323.83 | bwd_inner_microstep: 1323.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-11 02:50:00,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.56 | bwd_microstep: 1432.72 | bwd_inner_microstep: 1432.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2276 [2024-06-11 02:50:01,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.75 | bwd_microstep: 1071.71 | bwd_inner_microstep: 1071.68 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3595 [2024-06-11 02:50:03,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.50 | bwd_microstep: 1554.49 | bwd_inner_microstep: 1554.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-11 02:50:05,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1508.88 | bwd_inner_microstep: 1508.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2221 [2024-06-11 02:50:07,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.55 | bwd_microstep: 862.02 | bwd_inner_microstep: 862.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-11 02:50:08,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.38 | bwd_microstep: 1253.28 | bwd_inner_microstep: 1253.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2273 [2024-06-11 02:50:13,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-11 02:50:13,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.05 | bwd_microstep: 4120.99 | bwd_inner_microstep: 1104.01 | bwd_allreduce_microstep: 3016.93 | step_microstep: 38.16 [2024-06-11 02:50:13,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15355.61 | bwd: 44060.41 | bwd_inner: 41042.56 | bwd_allreduce: 3017.16 | step: 39.73 87%|████████▋ | 1499/1726 [26:07:40<4:00:37, 63.60s/it] 87%|████████▋ | 1500/1726 [26:08:42<3:57:16, 62.99s/it]
87%|████████▋ | 1501/1726 [26:09:44<3:54:58, 62.66s/it] 87%|████████▋ | 1502/1726 [26:10:47<3:54:41, 62.87s/it] 87%|████████▋ | 1503/1726 [26:11:50<3:53:40, 62.87s/it] 87%|████████▋ | 1504/1726 [26:12:50<3:49:10, 61.94s/it] {'loss': 1.1971, 'learning_rate': 1.7108174692155266e-06, 'epoch': 0.87} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3379 [2024-06-11 02:50:14,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.89 | bwd_microstep: 1141.12 | bwd_inner_microstep: 1141.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 02:50:16,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.71 | bwd_microstep: 1241.55 | bwd_inner_microstep: 1241.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3866 [2024-06-11 02:50:18,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.26 | bwd_microstep: 1659.41 | bwd_inner_microstep: 1659.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834 [2024-06-11 02:50:21,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.89 | bwd_microstep: 1556.42 | bwd_inner_microstep: 1556.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789 [2024-06-11 02:50:23,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.05 | bwd_microstep: 1454.36 | bwd_inner_microstep: 1454.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token
length: 3423 [2024-06-11 02:50:24,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.20 | bwd_microstep: 1251.14 | bwd_inner_microstep: 1251.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 02:50:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.31 | bwd_microstep: 1387.39 | bwd_inner_microstep: 1387.25 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-11 02:50:28,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.02 | bwd_microstep: 1344.82 | bwd_inner_microstep: 1344.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-11 02:50:29,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.81 | bwd_microstep: 800.37 | bwd_inner_microstep: 800.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-11 02:50:31,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.97 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-11 02:50:33,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.45 | bwd_microstep: 1253.12 | bwd_inner_microstep: 1253.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2001 [2024-06-11 02:50:34,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.03 | bwd_microstep: 831.29 | bwd_inner_microstep: 831.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 2092 [2024-06-11 02:50:35,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.62 | bwd_microstep: 1015.53 | bwd_inner_microstep: 1015.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3649 [2024-06-11 02:50:38,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.98 | bwd_microstep: 1517.15 | bwd_inner_microstep: 1517.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3514 [2024-06-11 02:50:40,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.72 | bwd_microstep: 1581.11 | bwd_inner_microstep: 1581.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2008 [2024-06-11 02:50:41,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.91 | bwd_microstep: 738.62 | bwd_inner_microstep: 738.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-11 02:50:43,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 1392.96 | bwd_inner_microstep: 1392.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3539 [2024-06-11 02:50:45,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1426.26 | bwd_inner_microstep: 1426.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898 [2024-06-11 02:50:46,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.77 | bwd_microstep: 684.09 | bwd_inner_microstep: 684.06 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 02:50:48,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.26 | bwd_microstep: 1460.22 | bwd_inner_microstep: 1460.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 02:50:50,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1396.24 | bwd_inner_microstep: 1396.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-11 02:50:51,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.50 | bwd_microstep: 1152.53 | bwd_inner_microstep: 1152.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 02:50:53,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.66 | bwd_microstep: 1358.21 | bwd_inner_microstep: 1358.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-11 02:50:55,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.95 | bwd_microstep: 1550.17 | bwd_inner_microstep: 1550.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-11 02:50:57,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.95 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3568 [2024-06-11 02:50:59,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1360.63 | bwd_inner_microstep: 
1360.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2233 [2024-06-11 02:51:01,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.59 | bwd_microstep: 963.23 | bwd_inner_microstep: 963.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2474 [2024-06-11 02:51:02,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.46 | bwd_microstep: 1124.36 | bwd_inner_microstep: 1124.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-11 02:51:04,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.89 | bwd_microstep: 1750.03 | bwd_inner_microstep: 1750.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-11 02:51:06,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.17 | bwd_microstep: 976.79 | bwd_inner_microstep: 976.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3815 [2024-06-11 02:51:08,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1355.82 | bwd_inner_microstep: 1355.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2276 [2024-06-11 02:51:14,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.23 | optimizer_step: 6.56 [2024-06-11 02:51:14,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.41 | bwd_microstep: 6089.66 | bwd_inner_microstep: 994.50 | bwd_allreduce_microstep: 5095.09 | step_microstep: 39.05 [2024-06-11 02:51:14,682] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15185.17 | bwd: 45764.10 | bwd_inner: 40667.99 | bwd_allreduce: 5095.37 | step: 40.69 {'loss': 1.182, 'learning_rate': 1.6956605061901377e-06, 'epoch': 0.87} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 02:51:16,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.10 | bwd_microstep: 1466.87 | bwd_inner_microstep: 1466.72 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3946 [2024-06-11 02:51:18,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.88 | bwd_microstep: 1594.22 | bwd_inner_microstep: 1594.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 02:51:20,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.73 | bwd_microstep: 1341.10 | bwd_inner_microstep: 1341.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-11 02:51:23,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.66 | bwd_microstep: 1650.02 | bwd_inner_microstep: 1649.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 02:51:24,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1239.70 | bwd_inner_microstep: 1239.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2205 [2024-06-11 02:51:25,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.66 | bwd_microstep: 859.13 | bwd_inner_microstep: 859.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per 
sample: 2.5, dynamic token length: 1868 [2024-06-11 02:51:26,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.80 | bwd_microstep: 676.34 | bwd_inner_microstep: 676.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584 [2024-06-11 02:51:28,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.59 | bwd_microstep: 1299.92 | bwd_inner_microstep: 1299.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 02:51:30,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1247.27 | bwd_inner_microstep: 1247.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-11 02:51:32,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1251.07 | bwd_inner_microstep: 1251.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1954 [2024-06-11 02:51:33,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.51 | bwd_microstep: 732.18 | bwd_inner_microstep: 732.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2101 [2024-06-11 02:51:34,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.71 | bwd_microstep: 822.01 | bwd_inner_microstep: 821.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 02:51:36,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1376.83 | bwd_inner_microstep: 1376.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-11 02:51:37,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1242.73 | bwd_inner_microstep: 1242.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3641 [2024-06-11 02:51:40,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.62 | bwd_microstep: 1542.43 | bwd_inner_microstep: 1542.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911 [2024-06-11 02:51:41,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.50 | bwd_microstep: 686.33 | bwd_inner_microstep: 686.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3672 [2024-06-11 02:51:42,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1232.05 | bwd_inner_microstep: 1232.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-11 02:51:44,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.82 | bwd_microstep: 1394.93 | bwd_inner_microstep: 1394.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-11 02:51:46,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.89 | bwd_microstep: 1611.73 | bwd_inner_microstep: 1611.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-11 02:51:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.44 | bwd_microstep: 1556.54 | bwd_inner_microstep: 1556.51 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-11 02:51:50,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1392.81 | bwd_inner_microstep: 1392.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2034 [2024-06-11 02:51:51,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.09 | bwd_microstep: 715.35 | bwd_inner_microstep: 715.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3597 [2024-06-11 02:51:54,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.22 | bwd_microstep: 1643.65 | bwd_inner_microstep: 1643.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3443 [2024-06-11 02:51:56,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.99 | bwd_microstep: 1355.64 | bwd_inner_microstep: 1355.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 02:51:58,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.24 | bwd_microstep: 1555.27 | bwd_inner_microstep: 1555.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3751 [2024-06-11 02:52:00,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.62 | bwd_microstep: 1378.56 | bwd_inner_microstep: 1378.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3806 [2024-06-11 02:52:02,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.64 | bwd_microstep: 1483.34 | 
bwd_inner_microstep: 1483.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3721 [2024-06-11 02:52:04,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.08 | bwd_microstep: 1334.29 | bwd_inner_microstep: 1334.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765 [2024-06-11 02:52:06,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.00 | bwd_microstep: 1636.45 | bwd_inner_microstep: 1636.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2274 [2024-06-11 02:52:07,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.42 | bwd_microstep: 814.59 | bwd_inner_microstep: 814.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3803 [2024-06-11 02:52:09,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.22 | bwd_microstep: 1668.75 | bwd_inner_microstep: 1668.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3568 [2024-06-11 02:52:15,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.08 | optimizer_step: 6.60 [2024-06-11 02:52:15,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.46 | bwd_microstep: 4827.08 | bwd_inner_microstep: 2015.20 | bwd_allreduce_microstep: 2811.82 | step_microstep: 37.64 [2024-06-11 02:52:15,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15493.48 | bwd: 44629.20 | bwd_inner: 41816.37 | bwd_allreduce: 2812.11 | step: 39.21 {'loss': 1.1976, 'learning_rate': 1.6805680109532962e-06, 'epoch': 0.87} dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3466 [2024-06-11 02:52:17,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.32 | bwd_microstep: 1365.78 | bwd_inner_microstep: 1365.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-11 02:52:18,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.61 | bwd_microstep: 1150.48 | bwd_inner_microstep: 1150.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 02:52:20,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.93 | bwd_microstep: 1381.96 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2643 [2024-06-11 02:52:22,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.54 | bwd_microstep: 1150.79 | bwd_inner_microstep: 1150.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-11 02:52:23,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 790.78 | bwd_inner_microstep: 790.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3788 [2024-06-11 02:52:25,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1451.41 | bwd_inner_microstep: 1451.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1983 [2024-06-11 02:52:26,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.97 | bwd_microstep: 797.40 | bwd_inner_microstep: 797.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-11 02:52:28,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1249.80 | bwd_inner_microstep: 1249.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 02:52:29,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.67 | bwd_microstep: 1389.28 | bwd_inner_microstep: 1389.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444 [2024-06-11 02:52:31,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.48 | bwd_microstep: 1158.76 | bwd_inner_microstep: 1158.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507 [2024-06-11 02:52:33,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.38 | bwd_microstep: 1317.49 | bwd_inner_microstep: 1317.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-11 02:52:35,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.50 | bwd_microstep: 1373.64 | bwd_inner_microstep: 1373.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3410 [2024-06-11 02:52:37,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.49 | bwd_microstep: 1438.75 | bwd_inner_microstep: 1438.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655 [2024-06-11 02:52:39,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.17 | bwd_microstep: 1612.90 | bwd_inner_microstep: 1612.87 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2744 [2024-06-11 02:52:41,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.67 | bwd_microstep: 1076.13 | bwd_inner_microstep: 1076.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3513 [2024-06-11 02:52:43,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.88 | bwd_microstep: 1448.51 | bwd_inner_microstep: 1448.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3515 [2024-06-11 02:52:44,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.42 | bwd_microstep: 1252.00 | bwd_inner_microstep: 1251.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3518 [2024-06-11 02:52:46,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.27 | bwd_microstep: 1440.06 | bwd_inner_microstep: 1440.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3855 [2024-06-11 02:52:48,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.47 | bwd_microstep: 1563.46 | bwd_inner_microstep: 1563.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3552 [2024-06-11 02:52:50,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.31 | bwd_microstep: 1233.91 | bwd_inner_microstep: 1233.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-11 02:52:52,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.93 | bwd_microstep: 1390.13 | bwd_inner_microstep: 
1390.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3500 [2024-06-11 02:52:54,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.91 | bwd_microstep: 1319.38 | bwd_inner_microstep: 1319.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-11 02:52:56,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.89 | bwd_microstep: 1461.30 | bwd_inner_microstep: 1461.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956 [2024-06-11 02:52:57,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.01 | bwd_microstep: 702.96 | bwd_inner_microstep: 702.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3444 [2024-06-11 02:52:59,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.09 | bwd_microstep: 1299.92 | bwd_inner_microstep: 1299.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-11 02:53:01,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1496.55 | bwd_inner_microstep: 1496.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2070 [2024-06-11 02:53:02,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.51 | bwd_microstep: 850.24 | bwd_inner_microstep: 850.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805 [2024-06-11 02:53:04,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.02 | 
bwd_microstep: 1550.76 | bwd_inner_microstep: 1550.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734 [2024-06-11 02:53:06,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.98 | bwd_microstep: 1431.15 | bwd_inner_microstep: 1431.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3689 [2024-06-11 02:53:08,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.56 | bwd_microstep: 1521.89 | bwd_inner_microstep: 1521.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3602 [2024-06-11 02:53:10,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778 [2024-06-11 02:53:17,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.10 | optimizer_step: 6.61 [2024-06-11 02:53:17,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.09 | bwd_microstep: 6562.55 | bwd_inner_microstep: 1874.84 | bwd_allreduce_microstep: 4687.65 | step_microstep: 37.97 [2024-06-11 02:53:17,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15682.12 | bwd: 46738.45 | bwd_inner: 42049.89 | bwd_allreduce: 4687.88 | step: 39.54 {'loss': 1.1316, 'learning_rate': 1.6655400366606867e-06, 'epoch': 0.87} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3470 [2024-06-11 02:53:20,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.82 | bwd_microstep: 1562.46 | bwd_inner_microstep: 1562.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3903 [2024-06-11 02:53:22,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.34 | bwd_microstep: 1680.78 | bwd_inner_microstep: 1680.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3849 [2024-06-11 02:53:24,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.23 | bwd_microstep: 1658.94 | bwd_inner_microstep: 1658.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 02:53:26,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.97 | bwd_microstep: 1380.08 | bwd_inner_microstep: 1380.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 02:53:28,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.39 | bwd_microstep: 1246.49 | bwd_inner_microstep: 1246.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-11 02:53:30,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.01 | bwd_microstep: 1481.86 | bwd_inner_microstep: 1481.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 02:53:32,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.57 | bwd_microstep: 1283.43 | bwd_inner_microstep: 1283.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 02:53:33,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.04 | bwd_microstep: 1282.74 | bwd_inner_microstep: 1282.71 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 02:53:35,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1246.65 | bwd_inner_microstep: 1246.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649 [2024-06-11 02:53:37,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.91 | bwd_microstep: 1423.03 | bwd_inner_microstep: 1423.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-11 02:53:39,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.46 | bwd_microstep: 1285.65 | bwd_inner_microstep: 1285.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510 [2024-06-11 02:53:41,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.27 | bwd_microstep: 1321.02 | bwd_inner_microstep: 1320.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-11 02:53:43,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.20 | bwd_microstep: 1486.27 | bwd_inner_microstep: 1486.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-11 02:53:45,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.04 | bwd_microstep: 1348.02 | bwd_inner_microstep: 1347.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3703 [2024-06-11 02:53:47,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.69 | bwd_microstep: 1654.71 | bwd_inner_microstep: 
1654.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 02:53:49,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.70 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-11 02:53:51,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1512.34 | bwd_inner_microstep: 1512.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 02:53:52,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1252.43 | bwd_inner_microstep: 1252.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 02:53:54,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1349.73 | bwd_inner_microstep: 1349.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1925 [2024-06-11 02:53:55,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.87 | bwd_microstep: 726.75 | bwd_inner_microstep: 726.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-11 02:53:57,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.12 | bwd_microstep: 1524.58 | bwd_inner_microstep: 1524.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-11 02:53:59,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.14 | 
bwd_microstep: 1295.90 | bwd_inner_microstep: 1295.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 02:54:01,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.68 | bwd_microstep: 1298.24 | bwd_inner_microstep: 1298.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479 [2024-06-11 02:54:03,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.61 | bwd_microstep: 1314.95 | bwd_inner_microstep: 1314.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3439 [2024-06-11 02:54:05,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1299.02 | bwd_inner_microstep: 1298.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3563 [2024-06-11 02:54:06,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1264.63 | bwd_inner_microstep: 1264.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3834 [2024-06-11 02:54:09,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.96 | bwd_microstep: 1753.53 | bwd_inner_microstep: 1753.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2227 [2024-06-11 02:54:10,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.69 | bwd_microstep: 863.08 | bwd_inner_microstep: 863.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3835 [2024-06-11 02:54:12,807] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 620.28 | bwd_microstep: 1690.44 | bwd_inner_microstep: 1690.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3801 [2024-06-11 02:54:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.40 | bwd_microstep: 1750.64 | bwd_inner_microstep: 1750.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-11 02:54:17,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.25 | bwd_microstep: 1406.80 | bwd_inner_microstep: 1406.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2015 [2024-06-11 02:54:19,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.02 | optimizer_step: 6.63 [2024-06-11 02:54:19,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.45 | bwd_microstep: 1898.40 | bwd_inner_microstep: 991.03 | bwd_allreduce_microstep: 907.32 | step_microstep: 37.32 [2024-06-11 02:54:19,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16381.50 | bwd: 44826.70 | bwd_inner: 43918.49 | bwd_allreduce: 907.55 | step: 38.74 {'loss': 1.1546, 'learning_rate': 1.6505766362407571e-06, 'epoch': 0.87} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-11 02:54:21,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1470.43 | bwd_inner_microstep: 1470.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 02:54:23,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1380.69 | bwd_inner_microstep: 1380.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2384 [2024-06-11 02:54:24,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.56 | bwd_microstep: 997.74 | bwd_inner_microstep: 997.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823 [2024-06-11 02:54:26,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.30 | bwd_microstep: 1483.56 | bwd_inner_microstep: 1483.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3803 [2024-06-11 02:54:29,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.67 | bwd_microstep: 1650.68 | bwd_inner_microstep: 1650.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4179 [2024-06-11 02:54:31,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.49 | bwd_microstep: 1483.53 | bwd_inner_microstep: 1483.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754 [2024-06-11 02:54:33,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.18 | bwd_microstep: 1638.27 | bwd_inner_microstep: 1638.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-11 02:54:35,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.62 | bwd_microstep: 1542.49 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178 [2024-06-11 02:54:36,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.22 | bwd_microstep: 950.75 | bwd_inner_microstep: 950.72 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 02:54:38,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.64 | bwd_microstep: 1252.56 | bwd_inner_microstep: 1252.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-11 02:54:40,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.36 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410 [2024-06-11 02:54:42,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.76 | bwd_microstep: 1278.43 | bwd_inner_microstep: 1278.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3665 [2024-06-11 02:54:44,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1403.95 | bwd_inner_microstep: 1403.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-11 02:54:46,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1342.86 | bwd_inner_microstep: 1342.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2930 [2024-06-11 02:54:47,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.23 | bwd_microstep: 1240.05 | bwd_inner_microstep: 1240.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-11 02:54:48,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.97 | bwd_microstep: 
792.42 | bwd_inner_microstep: 792.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 02:54:50,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.31 | bwd_microstep: 1252.22 | bwd_inner_microstep: 1252.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2012 [2024-06-11 02:54:51,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.03 | bwd_microstep: 835.41 | bwd_inner_microstep: 835.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3461 [2024-06-11 02:54:53,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.51 | bwd_microstep: 1325.73 | bwd_inner_microstep: 1325.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3538 [2024-06-11 02:54:55,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.77 | bwd_microstep: 1356.28 | bwd_inner_microstep: 1356.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-11 02:54:57,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.80 | bwd_microstep: 1660.20 | bwd_inner_microstep: 1660.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-11 02:54:59,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.43 | bwd_microstep: 1433.68 | bwd_inner_microstep: 1433.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692 [2024-06-11 02:55:01,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 535.40 | bwd_microstep: 1432.08 | bwd_inner_microstep: 1432.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3812 [2024-06-11 02:55:03,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.03 | bwd_microstep: 1402.53 | bwd_inner_microstep: 1402.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-11 02:55:05,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.80 | bwd_microstep: 1657.56 | bwd_inner_microstep: 1657.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-11 02:55:07,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1392.83 | bwd_inner_microstep: 1392.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570 [2024-06-11 02:55:09,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.43 | bwd_microstep: 1528.28 | bwd_inner_microstep: 1528.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3694 [2024-06-11 02:55:11,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.85 | bwd_microstep: 1325.77 | bwd_inner_microstep: 1325.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-11 02:55:13,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.67 | bwd_microstep: 1405.82 | bwd_inner_microstep: 1405.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3750 [2024-06-11 02:55:16,233] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.81 | bwd_microstep: 1840.94 | bwd_inner_microstep: 1840.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-11 02:55:18,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1493.56 | bwd_inner_microstep: 1493.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-11 02:55:21,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.01 | optimizer_step: 6.62 [2024-06-11 02:55:21,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.18 | bwd_microstep: 3050.97 | bwd_inner_microstep: 1665.67 | bwd_allreduce_microstep: 1385.25 | step_microstep: 37.52 [2024-06-11 02:55:21,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16498.64 | bwd: 45692.02 | bwd_inner: 44305.84 | bwd_allreduce: 1385.48 | step: 39.02 87%|████████▋ | 1504/1726 [26:12:50<3:49:10, 61.94s/it] 87%|████████▋ | 1505/1726 [26:13:51<3:47:24, 61.74s/it] 87%|████████▋ | 1506/1726 [26:14:51<3:44:57, 61.35s/it] 87%|████████▋ | 1507/1726 [26:15:54<3:45:28, 61.77s/it] 87%|████████▋ | 1508/1726 [26:16:56<3:44:11, 61.70s/it] 87%|████████▋ | 1509/1726 [26:17:58<3:44:03, {'loss': 1.2004, 'learning_rate': 1.6356778623945223e-06, 'epoch': 0.87} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 02:55:23,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.45 | bwd_microstep: 1372.21 | bwd_inner_microstep: 1372.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4205 [2024-06-11 02:55:26,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.77 | bwd_microstep: 1751.27 | bwd_inner_microstep: 1751.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-11 02:55:28,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.30 | bwd_microstep: 1646.40 | bwd_inner_microstep: 1646.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3778 [2024-06-11 02:55:30,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.64 | bwd_microstep: 1639.99 | bwd_inner_microstep: 1639.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1964 [2024-06-11 02:55:31,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.06 | bwd_microstep: 794.06 | bwd_inner_microstep: 794.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477 [2024-06-11 02:55:33,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.14 | bwd_microstep: 1214.33 | bwd_inner_microstep: 1214.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 02:55:35,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.65 | bwd_microstep: 1384.84 | bwd_inner_microstep: 1384.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3486 [2024-06-11 02:55:37,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1332.33 | bwd_inner_microstep: 1332.30 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 02:55:39,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.01 | bwd_microstep: 1388.16 | bwd_inner_microstep: 1388.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474 [2024-06-11 02:55:41,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.21 | bwd_microstep: 1310.82 | bwd_inner_microstep: 1310.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3675 [2024-06-11 02:55:43,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.20 | bwd_microstep: 1615.18 | bwd_inner_microstep: 1615.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500 [2024-06-11 02:55:45,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.26 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-11 02:55:47,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1485.86 | bwd_inner_microstep: 1485.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3431 [2024-06-11 02:55:49,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1397.49 | bwd_inner_microstep: 1397.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643 [2024-06-11 02:55:51,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.13 | bwd_microstep: 
1709.34 | bwd_inner_microstep: 1709.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3391 [2024-06-11 02:55:53,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.75 | bwd_microstep: 1337.50 | bwd_inner_microstep: 1337.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-11 02:55:55,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1513.87 | bwd_inner_microstep: 1513.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593 [2024-06-11 02:55:57,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1510.72 | bwd_inner_microstep: 1510.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3630 [2024-06-11 02:55:59,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.98 | bwd_microstep: 1646.35 | bwd_inner_microstep: 1646.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2287 [2024-06-11 02:56:01,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 880.72 | bwd_inner_microstep: 880.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3516 [2024-06-11 02:56:03,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.29 | bwd_microstep: 1581.42 | bwd_inner_microstep: 1581.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893 [2024-06-11 02:56:04,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 272.44 | bwd_microstep: 714.97 | bwd_inner_microstep: 714.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-11 02:56:06,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.74 | bwd_microstep: 1410.35 | bwd_inner_microstep: 1410.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3613 [2024-06-11 02:56:08,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1569.47 | bwd_inner_microstep: 1569.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523 [2024-06-11 02:56:10,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.27 | bwd_microstep: 1489.88 | bwd_inner_microstep: 1489.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 02:56:12,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1379.14 | bwd_inner_microstep: 1379.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-11 02:56:14,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1255.23 | bwd_inner_microstep: 1255.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2179 [2024-06-11 02:56:15,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.07 | bwd_microstep: 954.72 | bwd_inner_microstep: 954.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-11 02:56:17,259] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1389.43 | bwd_inner_microstep: 1389.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-11 02:56:19,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.03 | bwd_microstep: 1537.93 | bwd_inner_microstep: 1537.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 02:56:21,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.39 | bwd_microstep: 1554.48 | bwd_inner_microstep: 1554.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3772 [2024-06-11 02:56:23,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.14 | optimizer_step: 6.62 [2024-06-11 02:56:23,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.66 | bwd_microstep: 1365.39 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 87.08 | step_microstep: 37.57 [2024-06-11 02:56:23,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16581.09 | bwd: 44542.11 | bwd_inner: 44454.13 | bwd_allreduce: 87.31 | step: 39.06 {'loss': 1.2242, 'learning_rate': 1.620843767595388e-06, 'epoch': 0.87} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1864 [2024-06-11 02:56:24,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.27 | bwd_microstep: 670.20 | bwd_inner_microstep: 670.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-11 02:56:25,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.27 | bwd_microstep: 788.23 | bwd_inner_microstep: 788.21 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862 [2024-06-11 02:56:27,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.85 | bwd_microstep: 1564.63 | bwd_inner_microstep: 1564.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1871 [2024-06-11 02:56:28,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.45 | bwd_microstep: 743.15 | bwd_inner_microstep: 743.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3564 [2024-06-11 02:56:30,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.67 | bwd_microstep: 1350.39 | bwd_inner_microstep: 1350.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 02:56:32,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.37 | bwd_microstep: 1386.37 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-11 02:56:34,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.16 | bwd_microstep: 1486.26 | bwd_inner_microstep: 1486.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 02:56:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.65 | bwd_microstep: 1285.62 | bwd_inner_microstep: 1285.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-11 02:56:38,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 
1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1963 [2024-06-11 02:56:39,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.66 | bwd_microstep: 734.22 | bwd_inner_microstep: 734.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2008 [2024-06-11 02:56:40,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.60 | bwd_microstep: 741.89 | bwd_inner_microstep: 741.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3925 [2024-06-11 02:56:42,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.51 | bwd_microstep: 1430.26 | bwd_inner_microstep: 1430.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 02:56:43,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.33 | bwd_microstep: 1281.17 | bwd_inner_microstep: 1281.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3626 [2024-06-11 02:56:46,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.65 | bwd_microstep: 1538.87 | bwd_inner_microstep: 1538.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3671 [2024-06-11 02:56:47,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.33 | bwd_microstep: 1328.59 | bwd_inner_microstep: 1328.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3656 [2024-06-11 02:56:50,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 606.70 | bwd_microstep: 1656.84 | bwd_inner_microstep: 1656.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 02:56:51,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.70 | bwd_microstep: 1255.46 | bwd_inner_microstep: 1255.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 02:56:53,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.36 | bwd_microstep: 1260.82 | bwd_inner_microstep: 1260.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2105 [2024-06-11 02:56:55,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.70 | bwd_microstep: 965.37 | bwd_inner_microstep: 965.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3526 [2024-06-11 02:56:57,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.87 | bwd_microstep: 1634.34 | bwd_inner_microstep: 1634.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1893 [2024-06-11 02:56:58,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.01 | bwd_microstep: 776.06 | bwd_inner_microstep: 776.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3819 [2024-06-11 02:57:00,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.33 | bwd_microstep: 1617.11 | bwd_inner_microstep: 1617.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-11 02:57:02,510] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1394.72 | bwd_inner_microstep: 1394.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1896 [2024-06-11 02:57:03,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 810.92 | bwd_inner_microstep: 810.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-11 02:57:05,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.44 | bwd_microstep: 1548.77 | bwd_inner_microstep: 1548.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3446 [2024-06-11 02:57:07,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.67 | bwd_microstep: 1286.79 | bwd_inner_microstep: 1286.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587 [2024-06-11 02:57:09,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.56 | bwd_microstep: 1617.26 | bwd_inner_microstep: 1617.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 02:57:12,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.12 | bwd_microstep: 1650.22 | bwd_inner_microstep: 1650.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 899 [2024-06-11 02:57:12,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.03 | bwd_microstep: 371.53 | bwd_inner_microstep: 371.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3555 
[2024-06-11 02:57:14,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.24 | bwd_microstep: 1203.52 | bwd_inner_microstep: 1203.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-11 02:57:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.49 | bwd_microstep: 1399.31 | bwd_inner_microstep: 1399.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2007 [2024-06-11 02:57:25,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.18 | optimizer_step: 6.62 [2024-06-11 02:57:25,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.35 | bwd_microstep: 8521.41 | bwd_inner_microstep: 801.20 | bwd_allreduce_microstep: 7720.15 | step_microstep: 38.20 [2024-06-11 02:57:25,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14583.87 | bwd: 46695.97 | bwd_inner: 38974.87 | bwd_allreduce: 7720.39 | step: 39.67 {'loss': 1.1542, 'learning_rate': 1.606074404088962e-06, 'epoch': 0.88} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3389 [2024-06-11 02:57:26,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.59 | bwd_microstep: 1295.19 | bwd_inner_microstep: 1295.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3943 [2024-06-11 02:57:28,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.08 | bwd_microstep: 1590.12 | bwd_inner_microstep: 1590.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844 [2024-06-11 02:57:31,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.32 | bwd_microstep: 1555.09 | 
bwd_inner_microstep: 1555.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2884 [2024-06-11 02:57:32,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.60 | bwd_microstep: 1182.50 | bwd_inner_microstep: 1182.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3784 [2024-06-11 02:57:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.28 | bwd_microstep: 1645.30 | bwd_inner_microstep: 1645.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 02:57:36,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-11 02:57:38,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1293.82 | bwd_inner_microstep: 1293.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 02:57:40,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.46 | bwd_microstep: 1384.96 | bwd_inner_microstep: 1384.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3484 [2024-06-11 02:57:42,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.63 | bwd_microstep: 1311.24 | bwd_inner_microstep: 1311.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-11 02:57:44,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 500.96 | bwd_microstep: 1337.96 | bwd_inner_microstep: 1337.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3708 [2024-06-11 02:57:46,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.71 | bwd_microstep: 1718.89 | bwd_inner_microstep: 1718.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3420 [2024-06-11 02:57:48,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1357.56 | bwd_inner_microstep: 1357.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3119 [2024-06-11 02:57:50,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.63 | bwd_microstep: 1247.54 | bwd_inner_microstep: 1247.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-11 02:57:52,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.99 | bwd_microstep: 1623.08 | bwd_inner_microstep: 1623.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-11 02:57:54,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1345.69 | bwd_inner_microstep: 1345.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494 [2024-06-11 02:57:56,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1189.87 | bwd_inner_microstep: 1189.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3473 [2024-06-11 02:57:57,667] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.44 | bwd_microstep: 1182.81 | bwd_inner_microstep: 1182.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565 [2024-06-11 02:57:59,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.21 | bwd_microstep: 1333.15 | bwd_inner_microstep: 1333.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 02:58:01,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.61 | bwd_microstep: 1251.29 | bwd_inner_microstep: 1251.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-11 02:58:02,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.96 | bwd_microstep: 696.07 | bwd_inner_microstep: 696.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452 [2024-06-11 02:58:03,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.46 | bwd_microstep: 1188.93 | bwd_inner_microstep: 1188.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-11 02:58:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.35 | bwd_microstep: 1488.33 | bwd_inner_microstep: 1488.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-11 02:58:07,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.53 | bwd_microstep: 1408.74 | bwd_inner_microstep: 1408.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3815 [2024-06-11 02:58:10,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.33 | bwd_microstep: 1656.63 | bwd_inner_microstep: 1656.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-11 02:58:12,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.42 | bwd_microstep: 1559.20 | bwd_inner_microstep: 1559.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477 [2024-06-11 02:58:13,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.08 | bwd_microstep: 1184.51 | bwd_inner_microstep: 1184.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-11 02:58:15,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1300.93 | bwd_inner_microstep: 1300.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574 [2024-06-11 02:58:17,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.09 | bwd_microstep: 1304.24 | bwd_inner_microstep: 1304.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3567 [2024-06-11 02:58:19,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.37 | bwd_microstep: 1431.25 | bwd_inner_microstep: 1431.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3803 [2024-06-11 02:58:21,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.46 | bwd_microstep: 1480.95 | bwd_inner_microstep: 1480.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3439 [2024-06-11 02:58:23,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1350.36 | bwd_inner_microstep: 1350.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592 [2024-06-11 02:58:25,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.01 | optimizer_step: 6.61 [2024-06-11 02:58:25,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.77 | bwd_microstep: 1862.42 | bwd_inner_microstep: 1756.09 | bwd_allreduce_microstep: 106.29 | step_microstep: 37.34 [2024-06-11 02:58:25,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16490.11 | bwd: 44141.17 | bwd_inner: 44033.99 | bwd_allreduce: 106.51 | step: 38.81 {'loss': 1.1846, 'learning_rate': 1.5913698238928632e-06, 'epoch': 0.88} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 02:58:27,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.72 | bwd_microstep: 1375.71 | bwd_inner_microstep: 1375.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-11 02:58:29,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1405.46 | bwd_inner_microstep: 1405.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2432 [2024-06-11 02:58:31,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.44 | bwd_microstep: 1006.39 | bwd_inner_microstep: 1006.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3537 [2024-06-11 02:58:33,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
547.12 | bwd_microstep: 1456.41 | bwd_inner_microstep: 1456.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3402 [2024-06-11 02:58:34,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.05 | bwd_microstep: 1181.34 | bwd_inner_microstep: 1181.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1957 [2024-06-11 02:58:35,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 701.59 | bwd_inner_microstep: 701.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3714 [2024-06-11 02:58:37,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.86 | bwd_microstep: 1530.87 | bwd_inner_microstep: 1530.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 02:58:39,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.44 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 02:58:41,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1247.52 | bwd_inner_microstep: 1247.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490 [2024-06-11 02:58:43,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1347.77 | bwd_inner_microstep: 1347.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 02:58:45,359] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 1389.57 | bwd_inner_microstep: 1389.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 02:58:47,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.37 | bwd_microstep: 1282.65 | bwd_inner_microstep: 1282.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963 [2024-06-11 02:58:48,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.12 | bwd_microstep: 889.45 | bwd_inner_microstep: 889.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-11 02:58:50,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.45 | bwd_microstep: 1485.55 | bwd_inner_microstep: 1485.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3634 [2024-06-11 02:58:52,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.73 | bwd_microstep: 1463.10 | bwd_inner_microstep: 1463.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518 [2024-06-11 02:58:54,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.26 | bwd_microstep: 1193.66 | bwd_inner_microstep: 1193.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 02:58:55,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1284.18 | bwd_inner_microstep: 1284.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1988 [2024-06-11 
02:58:57,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.66 | bwd_microstep: 833.93 | bwd_inner_microstep: 833.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-11 02:58:59,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.69 | bwd_microstep: 1611.77 | bwd_inner_microstep: 1611.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836 [2024-06-11 02:59:01,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.80 | bwd_microstep: 1454.92 | bwd_inner_microstep: 1454.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-11 02:59:03,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.46 | bwd_microstep: 1355.28 | bwd_inner_microstep: 1355.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 02:59:05,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.45 | bwd_microstep: 1563.56 | bwd_inner_microstep: 1563.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545 [2024-06-11 02:59:07,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.85 | bwd_microstep: 1358.40 | bwd_inner_microstep: 1358.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3807 [2024-06-11 02:59:09,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.56 | bwd_microstep: 1686.38 | bwd_inner_microstep: 1686.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3510 [2024-06-11 02:59:11,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.83 | bwd_microstep: 1383.11 | bwd_inner_microstep: 1383.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3551 [2024-06-11 02:59:13,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1295.82 | bwd_inner_microstep: 1295.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2144 [2024-06-11 02:59:14,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.56 | bwd_microstep: 833.62 | bwd_inner_microstep: 833.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782 [2024-06-11 02:59:16,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.67 | bwd_microstep: 1642.53 | bwd_inner_microstep: 1642.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-11 02:59:18,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.40 | bwd_microstep: 1603.89 | bwd_inner_microstep: 1603.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722 [2024-06-11 02:59:21,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.67 | bwd_microstep: 1627.72 | bwd_inner_microstep: 1627.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-11 02:59:23,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.72 | bwd_microstep: 1417.09 | bwd_inner_microstep: 1417.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 2275 [2024-06-11 02:59:30,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.08 | optimizer_step: 6.60 [2024-06-11 02:59:30,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.43 | bwd_microstep: 7294.98 | bwd_inner_microstep: 1418.06 | bwd_allreduce_microstep: 5876.87 | step_microstep: 37.81 [2024-06-11 02:59:30,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15853.38 | bwd: 48587.21 | bwd_inner: 42709.43 | bwd_allreduce: 5877.10 | step: 39.28 {'loss': 1.2078, 'learning_rate': 1.5767300787965512e-06, 'epoch': 0.88} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-11 02:59:32,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1465.37 | bwd_inner_microstep: 1465.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4348 [2024-06-11 02:59:35,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.19 | bwd_microstep: 1696.23 | bwd_inner_microstep: 1696.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3865 [2024-06-11 02:59:37,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.98 | bwd_microstep: 1519.97 | bwd_inner_microstep: 1519.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 02:59:39,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.86 | bwd_microstep: 1396.09 | bwd_inner_microstep: 1396.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2693 [2024-06-11 02:59:40,561] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 388.11 | bwd_microstep: 1027.82 | bwd_inner_microstep: 1027.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-11 02:59:42,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.79 | bwd_microstep: 1246.29 | bwd_inner_microstep: 1246.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 02:59:44,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.68 | bwd_microstep: 1244.78 | bwd_inner_microstep: 1244.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3745 [2024-06-11 02:59:45,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.38 | bwd_microstep: 1337.10 | bwd_inner_microstep: 1337.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3743 [2024-06-11 02:59:48,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.12 | bwd_microstep: 1625.29 | bwd_inner_microstep: 1625.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404 [2024-06-11 02:59:49,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.23 | bwd_microstep: 1309.39 | bwd_inner_microstep: 1309.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-11 02:59:51,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1341.24 | bwd_inner_microstep: 1341.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3408 [2024-06-11 02:59:53,691] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.68 | bwd_microstep: 1404.92 | bwd_inner_microstep: 1404.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497 [2024-06-11 02:59:55,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.57 | bwd_microstep: 1547.19 | bwd_inner_microstep: 1547.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-11 02:59:57,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.99 | bwd_microstep: 1558.99 | bwd_inner_microstep: 1558.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3389 [2024-06-11 02:59:59,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.16 | bwd_microstep: 1301.97 | bwd_inner_microstep: 1301.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-11 03:00:01,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.21 | bwd_microstep: 1342.60 | bwd_inner_microstep: 1342.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-11 03:00:03,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.27 | bwd_microstep: 1291.26 | bwd_inner_microstep: 1291.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-11 03:00:05,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1490.37 | bwd_inner_microstep: 1490.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3481 [2024-06-11 03:00:07,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1377.57 | bwd_inner_microstep: 1377.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605 [2024-06-11 03:00:09,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1536.52 | bwd_inner_microstep: 1536.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-11 03:00:11,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.57 | bwd_microstep: 1511.90 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2000 [2024-06-11 03:00:12,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.94 | bwd_microstep: 705.64 | bwd_inner_microstep: 705.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3695 [2024-06-11 03:00:14,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.17 | bwd_microstep: 1332.56 | bwd_inner_microstep: 1332.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788 [2024-06-11 03:00:16,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.75 | bwd_microstep: 1653.36 | bwd_inner_microstep: 1653.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-11 03:00:18,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.92 | bwd_microstep: 1402.19 | bwd_inner_microstep: 1402.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3438 [2024-06-11 03:00:20,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1251.36 | bwd_inner_microstep: 1251.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2677 [2024-06-11 03:00:22,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.49 | bwd_microstep: 1218.81 | bwd_inner_microstep: 1218.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-11 03:00:24,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.72 | bwd_microstep: 1506.82 | bwd_inner_microstep: 1506.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3560 [2024-06-11 03:00:26,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.71 | bwd_microstep: 1539.06 | bwd_inner_microstep: 1539.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 03:00:28,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.46 | bwd_microstep: 1653.11 | bwd_inner_microstep: 1653.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-11 03:00:30,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1496.21 | bwd_inner_microstep: 1496.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-11 03:00:32,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.05 | optimizer_step: 6.63 [2024-06-11 03:00:32,621] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1508.73 | bwd_inner_microstep: 1443.23 | bwd_allreduce_microstep: 65.45 | step_microstep: 37.53 [2024-06-11 03:00:32,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16702.82 | bwd: 44840.73 | bwd_inner: 44774.38 | bwd_allreduce: 65.68 | step: 39.06 87%|████████▋ | 1509/1726 [26:17:58<3:44:03, 61.95s/it] 87%|████████▋ | 1510/1726 [26:19:00<3:42:29, 61.80s/it] 88%|████████▊ | 1511/1726 [26:20:01<3:41:14, 61.74s/it] 88%|████████▊ | 1512/1726 [26:21:02<3:39:22, 61.51s/it] 88%|████████▊ | 1513/1726 [26:22:07<3:41:49, 62.49s/it] 88%|████████▊ | 1514/1726 [26 {'loss': 1.1792, 'learning_rate': 1.5621552203611234e-06, 'epoch': 0.88} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 03:00:34,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.59 | bwd_microstep: 1240.12 | bwd_inner_microstep: 1240.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3509 [2024-06-11 03:00:36,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.86 | bwd_microstep: 1318.16 | bwd_inner_microstep: 1318.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446 [2024-06-11 03:00:37,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.25 | bwd_microstep: 1185.05 | bwd_inner_microstep: 1185.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3858 [2024-06-11 03:00:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 613.65 | bwd_microstep: 1663.05 | bwd_inner_microstep: 1663.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-11 03:00:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.65 | bwd_microstep: 1641.56 | bwd_inner_microstep: 1641.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 03:00:44,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.62 | bwd_microstep: 1387.57 | bwd_inner_microstep: 1387.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-11 03:00:46,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.80 | bwd_microstep: 1446.25 | bwd_inner_microstep: 1446.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2144 [2024-06-11 03:00:47,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.05 | bwd_microstep: 926.85 | bwd_inner_microstep: 926.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3679 [2024-06-11 03:00:49,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.89 | bwd_microstep: 1425.14 | bwd_inner_microstep: 1425.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 03:00:51,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1283.54 | bwd_inner_microstep: 1283.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1947 [2024-06-11 03:00:52,375] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.87 | bwd_microstep: 789.91 | bwd_inner_microstep: 789.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2145 [2024-06-11 03:00:53,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.21 | bwd_microstep: 945.09 | bwd_inner_microstep: 945.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3469 [2024-06-11 03:00:55,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.20 | bwd_microstep: 1425.72 | bwd_inner_microstep: 1425.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-11 03:00:57,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.05 | bwd_microstep: 1374.11 | bwd_inner_microstep: 1374.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-11 03:00:58,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.42 | bwd_microstep: 791.79 | bwd_inner_microstep: 791.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3641 [2024-06-11 03:01:00,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.44 | bwd_microstep: 1709.62 | bwd_inner_microstep: 1709.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3625 [2024-06-11 03:01:02,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.68 | bwd_microstep: 1441.25 | bwd_inner_microstep: 1441.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 
[2024-06-11 03:01:05,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1499.75 | bwd_inner_microstep: 1499.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-11 03:01:06,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.60 | bwd_microstep: 1164.91 | bwd_inner_microstep: 1164.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-11 03:01:07,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.89 | bwd_microstep: 805.53 | bwd_inner_microstep: 805.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1965 [2024-06-11 03:01:08,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.64 | bwd_microstep: 796.10 | bwd_inner_microstep: 796.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668 [2024-06-11 03:01:10,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.15 | bwd_microstep: 1428.56 | bwd_inner_microstep: 1428.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 03:01:13,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.22 | bwd_microstep: 1656.60 | bwd_inner_microstep: 1656.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3540 [2024-06-11 03:01:15,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.24 | bwd_microstep: 1496.26 | bwd_inner_microstep: 1496.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3720 [2024-06-11 03:01:17,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1368.51 | bwd_inner_microstep: 1368.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3605 [2024-06-11 03:01:18,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.02 | bwd_microstep: 1341.24 | bwd_inner_microstep: 1341.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3468 [2024-06-11 03:01:20,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.01 | bwd_microstep: 1242.55 | bwd_inner_microstep: 1242.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3817 [2024-06-11 03:01:22,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.84 | bwd_microstep: 1604.54 | bwd_inner_microstep: 1604.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-11 03:01:25,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.79 | bwd_microstep: 1646.88 | bwd_inner_microstep: 1646.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3030 [2024-06-11 03:01:26,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.39 | bwd_microstep: 1070.61 | bwd_inner_microstep: 1070.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 03:01:28,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.21 | bwd_microstep: 1447.60 | bwd_inner_microstep: 1447.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 03:01:35,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-11 03:01:35,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.35 | bwd_microstep: 5854.40 | bwd_inner_microstep: 1541.38 | bwd_allreduce_microstep: 4312.97 | step_microstep: 37.79 [2024-06-11 03:01:35,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15694.78 | bwd: 46418.85 | bwd_inner: 42104.98 | bwd_allreduce: 4313.20 | step: 39.28 {'loss': 1.1518, 'learning_rate': 1.5476452999191626e-06, 'epoch': 0.88} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957 [2024-06-11 03:01:36,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.62 | bwd_microstep: 886.20 | bwd_inner_microstep: 886.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-11 03:01:38,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1299.32 | bwd_inner_microstep: 1299.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3908 [2024-06-11 03:01:40,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.94 | bwd_microstep: 1588.43 | bwd_inner_microstep: 1588.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3464 [2024-06-11 03:01:42,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.33 | bwd_microstep: 1306.65 | bwd_inner_microstep: 1306.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 03:01:44,003] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 519.81 | bwd_microstep: 1377.94 | bwd_inner_microstep: 1377.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1939 [2024-06-11 03:01:45,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.70 | bwd_microstep: 759.07 | bwd_inner_microstep: 759.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 03:01:46,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.24 | bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-11 03:01:47,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.09 | bwd_microstep: 791.10 | bwd_inner_microstep: 791.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-11 03:01:49,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.23 | bwd_microstep: 1279.05 | bwd_inner_microstep: 1279.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2245 [2024-06-11 03:01:50,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.78 | bwd_microstep: 901.40 | bwd_inner_microstep: 901.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504 [2024-06-11 03:01:52,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.77 | bwd_microstep: 1442.78 | bwd_inner_microstep: 1442.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 4023 [2024-06-11 03:01:55,513] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 682.59 | bwd_microstep: 1875.48 | bwd_inner_microstep: 1875.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 03:01:57,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1283.80 | bwd_inner_microstep: 1283.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2392 [2024-06-11 03:01:58,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.01 | bwd_microstep: 1028.98 | bwd_inner_microstep: 1028.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3694 [2024-06-11 03:02:00,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.96 | bwd_microstep: 1548.04 | bwd_inner_microstep: 1548.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 03:02:02,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1288.17 | bwd_inner_microstep: 1288.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3514 [2024-06-11 03:02:04,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.46 | bwd_microstep: 1448.80 | bwd_inner_microstep: 1448.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2093 [2024-06-11 03:02:05,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.07 | bwd_microstep: 916.82 | bwd_inner_microstep: 916.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token 
length: 3506 [2024-06-11 03:02:07,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.32 | bwd_microstep: 1443.34 | bwd_inner_microstep: 1443.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3521 [2024-06-11 03:02:09,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.19 | bwd_microstep: 1516.84 | bwd_inner_microstep: 1516.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-11 03:02:11,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.08 | bwd_microstep: 788.85 | bwd_inner_microstep: 788.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673 [2024-06-11 03:02:13,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1429.47 | bwd_inner_microstep: 1429.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2098 [2024-06-11 03:02:14,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.48 | bwd_microstep: 886.31 | bwd_inner_microstep: 886.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1940 [2024-06-11 03:02:15,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.21 | bwd_microstep: 696.75 | bwd_inner_microstep: 696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 03:02:17,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1385.28 | bwd_inner_microstep: 1385.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3528 [2024-06-11 03:02:19,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.18 | bwd_microstep: 1492.93 | bwd_inner_microstep: 1492.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-11 03:02:20,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.84 | bwd_microstep: 1279.19 | bwd_inner_microstep: 1279.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-11 03:02:23,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.08 | bwd_microstep: 1608.42 | bwd_inner_microstep: 1608.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 03:02:24,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.24 | bwd_microstep: 1285.12 | bwd_inner_microstep: 1285.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-11 03:02:27,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.54 | bwd_microstep: 1552.37 | bwd_inner_microstep: 1552.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3458 [2024-06-11 03:02:28,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.72 | bwd_microstep: 1310.54 | bwd_inner_microstep: 1310.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769 [2024-06-11 03:02:36,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 03:02:36,691] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.00 | bwd_microstep: 7171.55 | bwd_inner_microstep: 1862.77 | bwd_allreduce_microstep: 5308.73 | step_microstep: 37.73 [2024-06-11 03:02:36,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15154.61 | bwd: 46152.26 | bwd_inner: 40842.62 | bwd_allreduce: 5308.95 | step: 39.18 {'loss': 1.1708, 'learning_rate': 1.5332003685745279e-06, 'epoch': 0.88} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 03:02:38,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1366.93 | bwd_inner_microstep: 1366.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3955 [2024-06-11 03:02:40,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.53 | bwd_microstep: 1591.57 | bwd_inner_microstep: 1591.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 03:02:42,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.54 | bwd_microstep: 1292.02 | bwd_inner_microstep: 1291.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 03:02:44,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.09 | bwd_microstep: 1240.98 | bwd_inner_microstep: 1240.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 03:02:46,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1383.61 | bwd_inner_microstep: 1383.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-11 03:02:47,937] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1250.45 | bwd_inner_microstep: 1250.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422 [2024-06-11 03:02:49,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.23 | bwd_microstep: 1184.11 | bwd_inner_microstep: 1184.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578 [2024-06-11 03:02:51,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.88 | bwd_microstep: 1302.39 | bwd_inner_microstep: 1302.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065 [2024-06-11 03:02:52,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.80 | bwd_microstep: 817.81 | bwd_inner_microstep: 817.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-11 03:02:54,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.58 | bwd_microstep: 1529.62 | bwd_inner_microstep: 1529.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2128 [2024-06-11 03:02:55,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.46 | bwd_microstep: 927.34 | bwd_inner_microstep: 927.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3439 [2024-06-11 03:02:57,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.21 | bwd_microstep: 1311.02 | bwd_inner_microstep: 1310.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 
[2024-06-11 03:02:59,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.64 | bwd_microstep: 1469.26 | bwd_inner_microstep: 1469.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-11 03:03:01,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3619 [2024-06-11 03:03:03,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.46 | bwd_microstep: 1432.11 | bwd_inner_microstep: 1432.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2875 [2024-06-11 03:03:05,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.57 | bwd_microstep: 1116.73 | bwd_inner_microstep: 1116.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1990 [2024-06-11 03:03:06,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.75 | bwd_microstep: 861.77 | bwd_inner_microstep: 861.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3531 [2024-06-11 03:03:08,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.59 | bwd_microstep: 1455.07 | bwd_inner_microstep: 1455.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1972 [2024-06-11 03:03:09,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.19 | bwd_microstep: 828.04 | bwd_inner_microstep: 828.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per 
sample: 7.75, dynamic token length: 3511 [2024-06-11 03:03:11,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.48 | bwd_microstep: 1438.52 | bwd_inner_microstep: 1438.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674 [2024-06-11 03:03:13,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.24 | bwd_microstep: 1424.38 | bwd_inner_microstep: 1424.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3685 [2024-06-11 03:03:15,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.40 | bwd_microstep: 1431.57 | bwd_inner_microstep: 1431.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3470 [2024-06-11 03:03:17,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.83 | bwd_microstep: 1247.43 | bwd_inner_microstep: 1247.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-11 03:03:18,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.39 | bwd_microstep: 1411.37 | bwd_inner_microstep: 1411.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-11 03:03:20,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1300.62 | bwd_inner_microstep: 1300.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 03:03:22,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.42 | bwd_microstep: 1546.73 | bwd_inner_microstep: 1546.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457 [2024-06-11 03:03:24,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.06 | bwd_microstep: 1433.25 | bwd_inner_microstep: 1433.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3592 [2024-06-11 03:03:26,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.45 | bwd_microstep: 1462.80 | bwd_inner_microstep: 1462.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-11 03:03:28,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.01 | bwd_microstep: 1501.99 | bwd_inner_microstep: 1501.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3569 [2024-06-11 03:03:31,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1528.10 | bwd_inner_microstep: 1528.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-11 03:03:33,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.25 | bwd_microstep: 1402.42 | bwd_inner_microstep: 1402.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-11 03:03:38,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.61 | optimizer_gradients: 4.27 | optimizer_step: 6.60 [2024-06-11 03:03:38,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 4579.89 | bwd_inner_microstep: 1585.05 | bwd_allreduce_microstep: 2994.75 | step_microstep: 39.81 [2024-06-11 03:03:38,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15837.07 
| bwd: 45315.01 | bwd_inner: 42319.31 | bwd_allreduce: 2995.00 | step: 41.26 {'loss': 1.1462, 'learning_rate': 1.518820477202203e-06, 'epoch': 0.88} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 03:03:40,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1347.76 | bwd_inner_microstep: 1347.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2380 [2024-06-11 03:03:41,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.57 | bwd_microstep: 947.59 | bwd_inner_microstep: 947.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-11 03:03:43,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.65 | bwd_microstep: 1651.48 | bwd_inner_microstep: 1651.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 03:03:45,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1380.28 | bwd_inner_microstep: 1380.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2244 [2024-06-11 03:03:46,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.00 | bwd_microstep: 899.46 | bwd_inner_microstep: 899.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279 [2024-06-11 03:03:48,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.94 | bwd_microstep: 906.58 | bwd_inner_microstep: 906.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-11 03:03:49,905] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.51 | bwd_microstep: 1350.57 | bwd_inner_microstep: 1350.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-11 03:03:51,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.48 | bwd_microstep: 1399.73 | bwd_inner_microstep: 1399.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1974 [2024-06-11 03:03:52,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.82 | bwd_microstep: 750.08 | bwd_inner_microstep: 750.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1944 [2024-06-11 03:03:54,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.43 | bwd_microstep: 840.30 | bwd_inner_microstep: 840.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 03:03:55,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.59 | bwd_microstep: 1393.93 | bwd_inner_microstep: 1393.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3431 [2024-06-11 03:03:57,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.86 | bwd_microstep: 1370.62 | bwd_inner_microstep: 1370.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2226 [2024-06-11 03:03:59,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.50 | bwd_microstep: 992.60 | bwd_inner_microstep: 992.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3399 [2024-06-11 03:04:01,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.85 | bwd_microstep: 1434.56 | bwd_inner_microstep: 1434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3419 [2024-06-11 03:04:03,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 1439.33 | bwd_inner_microstep: 1439.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655 [2024-06-11 03:04:05,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.49 | bwd_microstep: 1617.39 | bwd_inner_microstep: 1617.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3004 [2024-06-11 03:04:07,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.74 | bwd_microstep: 1298.81 | bwd_inner_microstep: 1298.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3624 [2024-06-11 03:04:09,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1508.94 | bwd_inner_microstep: 1508.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-11 03:04:11,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.61 | bwd_microstep: 1612.18 | bwd_inner_microstep: 1612.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-11 03:04:13,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.33 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
24, images per sample: 6.0, dynamic token length: 3464 [2024-06-11 03:04:15,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1310.08 | bwd_inner_microstep: 1310.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3461 [2024-06-11 03:04:17,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.02 | bwd_microstep: 1213.75 | bwd_inner_microstep: 1213.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-11 03:04:18,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.66 | bwd_microstep: 1292.93 | bwd_inner_microstep: 1292.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 03:04:20,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.10 | bwd_microstep: 1352.55 | bwd_inner_microstep: 1352.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2062 [2024-06-11 03:04:21,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.40 | bwd_microstep: 894.83 | bwd_inner_microstep: 894.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-11 03:04:23,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.89 | bwd_microstep: 1393.26 | bwd_inner_microstep: 1393.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-11 03:04:25,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.24 | bwd_microstep: 1339.07 | bwd_inner_microstep: 1339.05 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039 [2024-06-11 03:04:26,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.21 | bwd_microstep: 903.94 | bwd_inner_microstep: 903.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3811 [2024-06-11 03:04:28,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.31 | bwd_microstep: 1388.46 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-11 03:04:30,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.79 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3434 [2024-06-11 03:04:32,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.94 | bwd_microstep: 1373.78 | bwd_inner_microstep: 1373.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-11 03:04:39,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-11 03:04:39,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 6574.05 | bwd_inner_microstep: 1684.88 | bwd_allreduce_microstep: 4889.10 | step_microstep: 38.59 [2024-06-11 03:04:39,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15344.94 | bwd: 46040.24 | bwd_inner: 41150.21 | bwd_allreduce: 4889.34 | step: 40.07 {'loss': 1.1446, 'learning_rate': 1.504505676448076e-06, 'epoch': 0.88} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-11 03:04:41,732] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.65 | bwd_microstep: 1331.47 | bwd_inner_microstep: 1331.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851 [2024-06-11 03:04:43,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.17 | bwd_microstep: 1454.58 | bwd_inner_microstep: 1454.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 03:04:45,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.60 | bwd_microstep: 1378.10 | bwd_inner_microstep: 1378.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 03:04:47,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.38 | bwd_microstep: 1341.11 | bwd_inner_microstep: 1341.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 03:04:49,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.19 | bwd_microstep: 1549.82 | bwd_inner_microstep: 1549.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3471 [2024-06-11 03:04:51,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.47 | bwd_microstep: 1214.19 | bwd_inner_microstep: 1214.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 03:04:53,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1278.90 | bwd_inner_microstep: 1278.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3461 [2024-06-11 03:04:54,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.56 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 03:04:56,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.53 | bwd_microstep: 1281.46 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 03:04:58,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.96 | bwd_microstep: 1351.44 | bwd_inner_microstep: 1351.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-11 03:05:00,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.04 | bwd_microstep: 1261.07 | bwd_inner_microstep: 1261.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-11 03:05:02,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.32 | bwd_microstep: 1440.63 | bwd_inner_microstep: 1440.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469 [2024-06-11 03:05:04,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.47 | bwd_microstep: 1568.53 | bwd_inner_microstep: 1568.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2046 [2024-06-11 03:05:05,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.29 | bwd_microstep: 838.04 | bwd_inner_microstep: 838.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3501 [2024-06-11 03:05:07,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.33 | bwd_microstep: 1386.46 | bwd_inner_microstep: 1386.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 03:05:09,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1400.88 | bwd_inner_microstep: 1400.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945 [2024-06-11 03:05:10,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.13 | bwd_microstep: 696.27 | bwd_inner_microstep: 696.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1952 [2024-06-11 03:05:11,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.83 | bwd_microstep: 837.50 | bwd_inner_microstep: 837.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3518 [2024-06-11 03:05:13,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1488.41 | bwd_inner_microstep: 1488.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3660 [2024-06-11 03:05:15,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.35 | bwd_microstep: 1451.25 | bwd_inner_microstep: 1451.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615 [2024-06-11 03:05:15,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.61 | bwd_microstep: 262.89 | bwd_inner_microstep: 262.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 03:05:17,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.91 | bwd_microstep: 1373.36 | bwd_inner_microstep: 1373.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 03:05:19,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.27 | bwd_microstep: 1285.39 | bwd_inner_microstep: 1285.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2004 [2024-06-11 03:05:20,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.64 | bwd_microstep: 800.70 | bwd_inner_microstep: 800.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-11 03:05:22,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.51 | bwd_microstep: 1559.94 | bwd_inner_microstep: 1559.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3690 [2024-06-11 03:05:24,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.94 | bwd_microstep: 1363.09 | bwd_inner_microstep: 1363.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2053 [2024-06-11 03:05:26,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 369.29 | bwd_microstep: 1008.67 | bwd_inner_microstep: 1008.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2075 [2024-06-11 03:05:27,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.84 | bwd_microstep: 945.62 | bwd_inner_microstep: 945.59 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-11 03:05:28,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.49 | bwd_microstep: 792.08 | bwd_inner_microstep: 792.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3698 [2024-06-11 03:05:30,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.45 | bwd_microstep: 1424.06 | bwd_inner_microstep: 1424.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3806 [2024-06-11 03:05:32,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.11 | bwd_microstep: 1755.64 | bwd_inner_microstep: 1755.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3576 [2024-06-11 03:05:42,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-11 03:05:42,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.66 | bwd_microstep: 9138.22 | bwd_inner_microstep: 1455.96 | bwd_allreduce_microstep: 7682.19 | step_microstep: 38.71 [2024-06-11 03:05:42,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14902.84 | bwd: 47538.10 | bwd_inner: 39854.98 | bwd_allreduce: 7682.43 | step: 40.26 88%|████████▊ | 1514/1726 [26:23:09<3:40:08, 62.31s/it] 88%|████████▊ | 1515/1726 [26:24:11<3:39:15, 62.35s/it] 88%|████████▊ | 1516/1726 [26:25:13<3:37:27, 62.13s/it] 88%|████████▊ | 1517/1726 [26:26:14<3:35:44, 61.94s/it] 88%|████████▊ | 1518/1726 
[26:27:16<3:34:28, 61.87s/it] {'loss': 1.2086, 'learning_rate': 1.4902560167288105e-06, 'epoch': 0.88} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3457 [2024-06-11 03:05:44,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1488.85 | bwd_inner_microstep: 1488.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3872 [2024-06-11 03:05:46,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.10 | bwd_microstep: 1657.64 | bwd_inner_microstep: 1657.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-11 03:05:48,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.99 | bwd_microstep: 1387.44 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-11 03:05:50,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.55 | bwd_microstep: 1284.95 | bwd_inner_microstep: 1284.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 03:05:52,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.91 | bwd_microstep: 1248.98 | bwd_inner_microstep: 1248.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716 [2024-06-11 03:05:54,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.16 | bwd_microstep: 1526.71 | bwd_inner_microstep: 1526.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478 
[2024-06-11 03:05:56,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.43 | bwd_microstep: 1247.20 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3427 [2024-06-11 03:05:58,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.01 | bwd_microstep: 1277.06 | bwd_inner_microstep: 1277.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-11 03:05:59,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1276.87 | bwd_inner_microstep: 1276.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3680 [2024-06-11 03:06:01,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.21 | bwd_microstep: 1444.63 | bwd_inner_microstep: 1444.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425 [2024-06-11 03:06:03,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.41 | bwd_microstep: 1402.11 | bwd_inner_microstep: 1402.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-11 03:06:05,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.35 | bwd_microstep: 1342.83 | bwd_inner_microstep: 1342.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-11 03:06:07,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.70 | bwd_microstep: 1479.98 | bwd_inner_microstep: 1479.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per 
sample: 5.0, dynamic token length: 3446 [2024-06-11 03:06:09,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.14 | bwd_microstep: 1215.09 | bwd_inner_microstep: 1215.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 03:06:11,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.77 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-11 03:06:12,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1274.39 | bwd_inner_microstep: 1274.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3629 [2024-06-11 03:06:14,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.29 | bwd_microstep: 1339.95 | bwd_inner_microstep: 1339.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-11 03:06:16,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.65 | bwd_microstep: 1286.50 | bwd_inner_microstep: 1286.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 03:06:18,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.54 | bwd_microstep: 1394.22 | bwd_inner_microstep: 1394.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3827 [2024-06-11 03:06:20,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1479.84 | bwd_inner_microstep: 1479.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-11 03:06:22,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.76 | bwd_microstep: 1493.97 | bwd_inner_microstep: 1493.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-11 03:06:24,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.02 | bwd_microstep: 1389.01 | bwd_inner_microstep: 1388.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-11 03:06:26,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1296.38 | bwd_inner_microstep: 1296.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-11 03:06:28,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.14 | bwd_microstep: 1445.45 | bwd_inner_microstep: 1445.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-11 03:06:30,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.24 | bwd_microstep: 1388.37 | bwd_inner_microstep: 1388.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3629 [2024-06-11 03:06:32,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.94 | bwd_microstep: 1391.67 | bwd_inner_microstep: 1391.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2060 [2024-06-11 03:06:33,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.02 | bwd_microstep: 813.36 | bwd_inner_microstep: 813.34 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3708 [2024-06-11 03:06:35,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.26 | bwd_microstep: 1590.38 | bwd_inner_microstep: 1590.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3563 [2024-06-11 03:06:37,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.46 | bwd_microstep: 1586.74 | bwd_inner_microstep: 1586.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3525 [2024-06-11 03:06:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.77 | bwd_microstep: 1659.46 | bwd_inner_microstep: 1659.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3707 [2024-06-11 03:06:41,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1457.54 | bwd_inner_microstep: 1457.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 03:06:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.04 | optimizer_step: 6.63 [2024-06-11 03:06:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1449.70 | bwd_inner_microstep: 1442.04 | bwd_allreduce_microstep: 7.61 | step_microstep: 37.45 [2024-06-11 03:06:43,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16608.62 | bwd: 44302.04 | bwd_inner: 44293.54 | bwd_allreduce: 7.83 | step: 38.92 {'loss': 1.1973, 'learning_rate': 1.4760715482316301e-06, 'epoch': 0.88} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-11 
03:06:45,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1485.83 | bwd_inner_microstep: 1485.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4151 [2024-06-11 03:06:48,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.16 | bwd_microstep: 1570.24 | bwd_inner_microstep: 1570.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3903 [2024-06-11 03:06:50,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.40 | bwd_microstep: 1484.67 | bwd_inner_microstep: 1484.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862 [2024-06-11 03:06:52,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.59 | bwd_microstep: 1562.36 | bwd_inner_microstep: 1562.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3822 [2024-06-11 03:06:54,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.04 | bwd_microstep: 1290.89 | bwd_inner_microstep: 1290.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492 [2024-06-11 03:06:55,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.74 | bwd_microstep: 1186.91 | bwd_inner_microstep: 1186.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 03:06:57,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.43 | bwd_microstep: 1389.38 | bwd_inner_microstep: 1389.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2243 [2024-06-11 03:06:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.19 | bwd_microstep: 965.59 | bwd_inner_microstep: 965.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 03:07:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.11 | bwd_microstep: 1390.42 | bwd_inner_microstep: 1390.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-11 03:07:02,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.57 | bwd_microstep: 1391.85 | bwd_inner_microstep: 1391.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3607 [2024-06-11 03:07:04,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1438.34 | bwd_inner_microstep: 1438.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3491 [2024-06-11 03:07:06,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1248.61 | bwd_inner_microstep: 1248.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-11 03:07:08,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.12 | bwd_microstep: 1277.76 | bwd_inner_microstep: 1277.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2939 [2024-06-11 03:07:10,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.71 | bwd_microstep: 1284.82 | bwd_inner_microstep: 1284.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 03:07:11,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.22 | bwd_microstep: 1348.05 | bwd_inner_microstep: 1348.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3654 [2024-06-11 03:07:14,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.90 | bwd_microstep: 1519.48 | bwd_inner_microstep: 1519.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3682 [2024-06-11 03:07:16,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.61 | bwd_microstep: 1521.42 | bwd_inner_microstep: 1521.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998 [2024-06-11 03:07:17,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.22 | bwd_microstep: 898.08 | bwd_inner_microstep: 898.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-11 03:07:19,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.58 | bwd_microstep: 1311.35 | bwd_inner_microstep: 1311.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-11 03:07:20,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.19 | bwd_microstep: 788.19 | bwd_inner_microstep: 788.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-11 03:07:22,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.53 | bwd_microstep: 1290.70 | bwd_inner_microstep: 1290.67 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-11 03:07:24,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1403.52 | bwd_inner_microstep: 1403.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3714 [2024-06-11 03:07:25,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1333.75 | bwd_inner_microstep: 1333.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 03:07:28,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1558.19 | bwd_inner_microstep: 1558.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-11 03:07:29,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1400.80 | bwd_inner_microstep: 1400.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 03:07:31,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.35 | bwd_microstep: 1283.71 | bwd_inner_microstep: 1283.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3566 [2024-06-11 03:07:33,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.92 | bwd_microstep: 1527.60 | bwd_inner_microstep: 1527.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3557 [2024-06-11 03:07:36,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 1588.03 | bwd_inner_microstep: 
1588.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-11 03:07:38,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.95 | bwd_microstep: 1652.60 | bwd_inner_microstep: 1652.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-11 03:07:40,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.68 | bwd_microstep: 1630.20 | bwd_inner_microstep: 1630.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814 [2024-06-11 03:07:42,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.89 | bwd_microstep: 1753.30 | bwd_inner_microstep: 1753.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2810 [2024-06-11 03:07:46,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.07 | optimizer_step: 6.59 [2024-06-11 03:07:46,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 430.49 | bwd_microstep: 3150.16 | bwd_inner_microstep: 1301.22 | bwd_allreduce_microstep: 1848.89 | step_microstep: 37.64 [2024-06-11 03:07:46,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16449.05 | bwd: 45926.83 | bwd_inner: 44077.04 | bwd_allreduce: 1849.11 | step: 39.11 {'loss': 1.1417, 'learning_rate': 1.4619523209141573e-06, 'epoch': 0.88} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-11 03:07:48,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.02 | bwd_microstep: 1472.81 | bwd_inner_microstep: 1472.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
4669 [2024-06-11 03:07:51,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.99 | bwd_microstep: 1777.70 | bwd_inner_microstep: 1777.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3856 [2024-06-11 03:07:53,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.50 | bwd_microstep: 1463.79 | bwd_inner_microstep: 1463.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-11 03:07:55,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.27 | bwd_microstep: 1447.32 | bwd_inner_microstep: 1447.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-11 03:07:56,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.29 | bwd_microstep: 1342.06 | bwd_inner_microstep: 1342.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-11 03:07:58,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.81 | bwd_microstep: 1278.85 | bwd_inner_microstep: 1278.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 03:08:00,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1244.23 | bwd_inner_microstep: 1244.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-11 03:08:02,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1386.99 | bwd_inner_microstep: 1386.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3754 [2024-06-11 03:08:04,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.36 | bwd_microstep: 1638.99 | bwd_inner_microstep: 1638.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-11 03:08:06,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.18 | bwd_microstep: 1246.26 | bwd_inner_microstep: 1246.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 03:08:08,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.43 | bwd_microstep: 1381.06 | bwd_inner_microstep: 1381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2151 [2024-06-11 03:08:09,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.27 | bwd_microstep: 880.17 | bwd_inner_microstep: 880.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3536 [2024-06-11 03:08:11,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.08 | bwd_microstep: 1197.55 | bwd_inner_microstep: 1197.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952 [2024-06-11 03:08:12,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.79 | bwd_microstep: 887.41 | bwd_inner_microstep: 887.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836 [2024-06-11 03:08:14,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.86 | bwd_microstep: 1752.53 | bwd_inner_microstep: 1752.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 03:08:16,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.77 | bwd_microstep: 1547.91 | bwd_inner_microstep: 1547.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2108 [2024-06-11 03:08:18,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.88 | bwd_microstep: 918.82 | bwd_inner_microstep: 918.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-11 03:08:20,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.40 | bwd_microstep: 1355.76 | bwd_inner_microstep: 1355.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-11 03:08:21,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.68 | bwd_microstep: 1291.83 | bwd_inner_microstep: 1291.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-11 03:08:23,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1248.40 | bwd_inner_microstep: 1248.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 03:08:25,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.11 | bwd_microstep: 1553.33 | bwd_inner_microstep: 1553.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 03:08:27,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1554.70 | bwd_inner_microstep: 1554.68 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-11 03:08:29,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.11 | bwd_microstep: 1389.15 | bwd_inner_microstep: 1389.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-11 03:08:31,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.56 | bwd_microstep: 1555.05 | bwd_inner_microstep: 1555.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145 [2024-06-11 03:08:33,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.91 | bwd_microstep: 850.28 | bwd_inner_microstep: 850.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3618 [2024-06-11 03:08:35,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.65 | bwd_microstep: 1372.05 | bwd_inner_microstep: 1372.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3564 [2024-06-11 03:08:37,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.57 | bwd_microstep: 1459.00 | bwd_inner_microstep: 1458.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3783 [2024-06-11 03:08:39,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.50 | bwd_microstep: 1446.79 | bwd_inner_microstep: 1446.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-11 03:08:40,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.45 | bwd_microstep: 
1348.78 | bwd_inner_microstep: 1348.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3400 [2024-06-11 03:08:42,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.37 | bwd_microstep: 1435.96 | bwd_inner_microstep: 1435.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3455 [2024-06-11 03:08:44,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.14 | bwd_microstep: 1503.45 | bwd_inner_microstep: 1503.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3589 [2024-06-11 03:08:48,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.05 | optimizer_step: 6.60 [2024-06-11 03:08:48,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.66 | bwd_microstep: 3255.68 | bwd_inner_microstep: 1731.54 | bwd_allreduce_microstep: 1524.09 | step_microstep: 37.56 [2024-06-11 03:08:48,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16379.00 | bwd: 45484.68 | bwd_inner: 43959.67 | bwd_allreduce: 1524.32 | step: 39.04 {'loss': 1.1692, 'learning_rate': 1.4478983845042493e-06, 'epoch': 0.88} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 03:08:50,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.50 | bwd_microstep: 1370.19 | bwd_inner_microstep: 1370.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 03:08:52,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1389.39 | bwd_inner_microstep: 1389.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per 
sample: 8.0, dynamic token length: 3826 [2024-06-11 03:08:54,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.53 | bwd_microstep: 1511.28 | bwd_inner_microstep: 1511.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-11 03:08:56,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.25 | bwd_microstep: 1445.61 | bwd_inner_microstep: 1445.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795 [2024-06-11 03:08:58,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.90 | bwd_microstep: 1644.74 | bwd_inner_microstep: 1644.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405 [2024-06-11 03:09:00,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.84 | bwd_microstep: 1175.44 | bwd_inner_microstep: 1175.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-11 03:09:01,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.19 | bwd_microstep: 791.28 | bwd_inner_microstep: 791.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3495 [2024-06-11 03:09:03,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.51 | bwd_microstep: 1247.82 | bwd_inner_microstep: 1247.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-11 03:09:05,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.40 | bwd_microstep: 1501.67 | bwd_inner_microstep: 1501.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421 [2024-06-11 03:09:07,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.66 | bwd_microstep: 1182.94 | bwd_inner_microstep: 1182.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-11 03:09:08,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.17 | bwd_microstep: 1288.71 | bwd_inner_microstep: 1288.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3690 [2024-06-11 03:09:11,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.48 | bwd_microstep: 1613.57 | bwd_inner_microstep: 1613.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-11 03:09:13,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.41 | bwd_microstep: 1486.68 | bwd_inner_microstep: 1486.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3676 [2024-06-11 03:09:15,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1620.42 | bwd_inner_microstep: 1620.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721 [2024-06-11 03:09:17,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.48 | bwd_microstep: 1627.23 | bwd_inner_microstep: 1627.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3522 [2024-06-11 03:09:19,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.71 | bwd_microstep: 1487.71 | bwd_inner_microstep: 1487.68 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1981 [2024-06-11 03:09:20,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.50 | bwd_microstep: 856.14 | bwd_inner_microstep: 856.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3518 [2024-06-11 03:09:22,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1415.11 | bwd_inner_microstep: 1415.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3633 [2024-06-11 03:09:24,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.75 | bwd_microstep: 1407.58 | bwd_inner_microstep: 1407.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3629 [2024-06-11 03:09:26,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1415.41 | bwd_inner_microstep: 1415.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687 [2024-06-11 03:09:28,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1431.05 | bwd_inner_microstep: 1431.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 03:09:30,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1380.42 | bwd_inner_microstep: 1380.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-11 03:09:32,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 
1431.57 | bwd_inner_microstep: 1431.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841 [2024-06-11 03:09:34,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.02 | bwd_microstep: 1465.74 | bwd_inner_microstep: 1465.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640 [2024-06-11 03:09:36,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.20 | bwd_microstep: 1615.99 | bwd_inner_microstep: 1615.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-11 03:09:38,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.35 | bwd_microstep: 1553.91 | bwd_inner_microstep: 1553.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-11 03:09:40,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.81 | bwd_microstep: 1345.02 | bwd_inner_microstep: 1344.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3605 [2024-06-11 03:09:42,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1533.91 | bwd_inner_microstep: 1533.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2401 [2024-06-11 03:09:44,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.11 | bwd_microstep: 1034.25 | bwd_inner_microstep: 1034.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764 [2024-06-11 03:09:46,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 570.23 | bwd_microstep: 1536.32 | bwd_inner_microstep: 1536.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 03:09:48,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.80 | bwd_microstep: 1649.91 | bwd_inner_microstep: 1649.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3587 [2024-06-11 03:09:51,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.98 | optimizer_gradients: 4.02 | optimizer_step: 6.60 [2024-06-11 03:09:51,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 2328.28 | bwd_inner_microstep: 1666.03 | bwd_allreduce_microstep: 662.20 | step_microstep: 37.61 [2024-06-11 03:09:51,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16777.80 | bwd: 45785.31 | bwd_inner: 45122.21 | bwd_allreduce: 662.43 | step: 39.05 {'loss': 1.2033, 'learning_rate': 1.4339097884997787e-06, 'epoch': 0.88} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401 [2024-06-11 03:09:53,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.47 | bwd_microstep: 1358.69 | bwd_inner_microstep: 1358.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-11 03:09:55,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.67 | bwd_microstep: 1240.41 | bwd_inner_microstep: 1240.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 03:09:57,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.04 | bwd_microstep: 1282.40 | bwd_inner_microstep: 1282.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465 [2024-06-11 03:09:58,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.13 | bwd_microstep: 1337.83 | bwd_inner_microstep: 1337.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-11 03:10:00,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1247.91 | bwd_inner_microstep: 1247.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 03:10:02,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.54 | bwd_microstep: 1280.89 | bwd_inner_microstep: 1280.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-11 03:10:04,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.48 | bwd_microstep: 1292.20 | bwd_inner_microstep: 1292.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-11 03:10:05,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.89 | bwd_microstep: 790.01 | bwd_inner_microstep: 789.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-11 03:10:06,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.98 | bwd_microstep: 1149.89 | bwd_inner_microstep: 1149.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1921 [2024-06-11 03:10:08,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.78 | bwd_microstep: 789.59 | bwd_inner_microstep: 789.56 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3701 [2024-06-11 03:10:09,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1290.36 | bwd_inner_microstep: 1290.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2479 [2024-06-11 03:10:11,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.77 | bwd_microstep: 980.70 | bwd_inner_microstep: 980.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390 [2024-06-11 03:10:12,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.06 | bwd_microstep: 1240.25 | bwd_inner_microstep: 1240.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-11 03:10:14,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.37 | bwd_microstep: 1353.20 | bwd_inner_microstep: 1353.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-11 03:10:16,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1389.94 | bwd_inner_microstep: 1389.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2120 [2024-06-11 03:10:18,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.84 | bwd_microstep: 1022.87 | bwd_inner_microstep: 1022.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3599 [2024-06-11 03:10:20,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.51 | bwd_microstep: 1554.46 | 
bwd_inner_microstep: 1554.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-11 03:10:21,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.21 | bwd_microstep: 1159.33 | bwd_inner_microstep: 1159.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 03:10:23,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1560.47 | bwd_inner_microstep: 1560.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2104 [2024-06-11 03:10:25,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.74 | bwd_microstep: 821.65 | bwd_inner_microstep: 821.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-11 03:10:27,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.10 | bwd_microstep: 1451.01 | bwd_inner_microstep: 1450.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 950 [2024-06-11 03:10:27,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.22 | bwd_microstep: 379.32 | bwd_inner_microstep: 379.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2289 [2024-06-11 03:10:28,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.74 | bwd_microstep: 910.96 | bwd_inner_microstep: 910.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-11 03:10:31,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
608.14 | bwd_microstep: 1658.72 | bwd_inner_microstep: 1658.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-11 03:10:33,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.22 | bwd_microstep: 1654.05 | bwd_inner_microstep: 1654.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3561 [2024-06-11 03:10:35,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.51 | bwd_microstep: 1266.38 | bwd_inner_microstep: 1266.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3438 [2024-06-11 03:10:37,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.52 | bwd_microstep: 1378.17 | bwd_inner_microstep: 1378.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-11 03:10:39,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.25 | bwd_microstep: 1394.26 | bwd_inner_microstep: 1394.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796 [2024-06-11 03:10:41,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1452.59 | bwd_inner_microstep: 1452.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280 [2024-06-11 03:10:42,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.40 | bwd_microstep: 877.04 | bwd_inner_microstep: 877.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3575 [2024-06-11 03:10:44,618] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.49 | bwd_microstep: 1693.70 | bwd_inner_microstep: 1693.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2118 [2024-06-11 03:10:51,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.09 | optimizer_step: 6.58 [2024-06-11 03:10:51,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.11 | bwd_microstep: 6830.19 | bwd_inner_microstep: 1056.32 | bwd_allreduce_microstep: 5773.82 | step_microstep: 37.94 [2024-06-11 03:10:51,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14735.32 | bwd: 45089.47 | bwd_inner: 39314.75 | bwd_allreduce: 5774.05 | step: 39.43 88%|████████▊ | 1519/1726 [26:28:19<3:34:22, 62.14s/it] 88%|████████▊ | 1520/1726 [26:29:20<3:32:25, 61.87s/it] 88%|████████▊ | 1521/1726 [26:30:23<3:32:14, 62.12s/it] 88%|████████▊ | 1522/1726 [26:31:25<3:31:17, 62.14s/it] 88%|████████▊ | 1523/1726 [26:32:28<3:31:01, 62.37s/it] {'loss': 1.1447, 'learning_rate': 1.419986582168522e-06, 'epoch': 0.88} dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3474 [2024-06-11 03:10:53,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.46 | bwd_microstep: 1495.22 | bwd_inner_microstep: 1495.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-11 03:10:55,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.06 | bwd_microstep: 1487.70 | bwd_inner_microstep: 1487.67 | bwd_allreduce_microstep: 0.01 |
step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3492 [2024-06-11 03:10:57,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.32 | bwd_microstep: 1345.64 | bwd_inner_microstep: 1345.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3883 [2024-06-11 03:10:59,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1444.85 | bwd_inner_microstep: 1444.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3792 [2024-06-11 03:11:02,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.44 | bwd_microstep: 1647.11 | bwd_inner_microstep: 1647.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080 [2024-06-11 03:11:03,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.17 | bwd_microstep: 820.08 | bwd_inner_microstep: 820.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3524 [2024-06-11 03:11:05,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.64 | bwd_microstep: 1352.42 | bwd_inner_microstep: 1352.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3430 [2024-06-11 03:11:06,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.43 | bwd_microstep: 1152.94 | bwd_inner_microstep: 1152.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3414 [2024-06-11 03:11:08,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.09 | bwd_microstep: 1213.29 | bwd_inner_microstep: 
1213.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 03:11:10,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.19 | bwd_microstep: 1403.36 | bwd_inner_microstep: 1403.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3776 [2024-06-11 03:11:12,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.68 | bwd_microstep: 1437.28 | bwd_inner_microstep: 1437.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3529 [2024-06-11 03:11:14,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.87 | bwd_microstep: 1291.27 | bwd_inner_microstep: 1291.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511 [2024-06-11 03:11:16,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.91 | bwd_microstep: 1417.43 | bwd_inner_microstep: 1417.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 03:11:17,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.06 | bwd_microstep: 1381.79 | bwd_inner_microstep: 1381.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 03:11:19,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.19 | bwd_microstep: 1378.30 | bwd_inner_microstep: 1378.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3653 [2024-06-11 03:11:22,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.59 | 
bwd_microstep: 1584.12 | bwd_inner_microstep: 1584.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297 [2024-06-11 03:11:23,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.01 | bwd_microstep: 878.06 | bwd_inner_microstep: 878.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530 [2024-06-11 03:11:25,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.49 | bwd_microstep: 1590.94 | bwd_inner_microstep: 1590.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2391 [2024-06-11 03:11:26,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.04 | bwd_microstep: 935.28 | bwd_inner_microstep: 935.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3627 [2024-06-11 03:11:28,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | bwd_microstep: 1507.88 | bwd_inner_microstep: 1507.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2088 [2024-06-11 03:11:30,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.13 | bwd_microstep: 1014.30 | bwd_inner_microstep: 1014.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2933 [2024-06-11 03:11:31,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.28 | bwd_microstep: 1187.89 | bwd_inner_microstep: 1187.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2139 [2024-06-11 03:11:33,276] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 377.68 | bwd_microstep: 1025.19 | bwd_inner_microstep: 1025.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3615 [2024-06-11 03:11:35,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.44 | bwd_microstep: 1340.63 | bwd_inner_microstep: 1340.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-11 03:11:37,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.56 | bwd_microstep: 1635.13 | bwd_inner_microstep: 1635.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 03:11:39,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.40 | bwd_microstep: 1279.89 | bwd_inner_microstep: 1279.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 03:11:41,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.07 | bwd_microstep: 1352.77 | bwd_inner_microstep: 1352.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429 [2024-06-11 03:11:42,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.46 | bwd_microstep: 1153.31 | bwd_inner_microstep: 1153.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3628 [2024-06-11 03:11:44,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.55 | bwd_microstep: 1461.75 | bwd_inner_microstep: 1461.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497 [2024-06-11 03:11:46,710] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.88 | bwd_microstep: 1509.32 | bwd_inner_microstep: 1509.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3641 [2024-06-11 03:11:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.94 | bwd_microstep: 1312.45 | bwd_inner_microstep: 1312.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3852 [2024-06-11 03:11:52,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.23 | optimizer_step: 6.61 [2024-06-11 03:11:52,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.89 | bwd_microstep: 3610.59 | bwd_inner_microstep: 1770.20 | bwd_allreduce_microstep: 1840.34 | step_microstep: 37.77 [2024-06-11 03:11:52,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15951.30 | bwd: 44648.22 | bwd_inner: 42806.98 | bwd_allreduce: 1840.56 | step: 39.19 {'loss': 1.1345, 'learning_rate': 1.406128814547929e-06, 'epoch': 0.88} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-11 03:11:54,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.48 | bwd_microstep: 1341.31 | bwd_inner_microstep: 1341.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3985 [2024-06-11 03:11:56,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.31 | bwd_microstep: 1507.71 | bwd_inner_microstep: 1507.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3913 [2024-06-11 03:11:58,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.97 | bwd_microstep: 1636.04 | bwd_inner_microstep: 1636.01 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4247 [2024-06-11 03:12:01,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 651.64 | bwd_microstep: 1761.81 | bwd_inner_microstep: 1761.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1870 [2024-06-11 03:12:02,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.36 | bwd_microstep: 706.92 | bwd_inner_microstep: 706.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-11 03:12:03,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.91 | bwd_microstep: 1148.22 | bwd_inner_microstep: 1148.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 03:12:05,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1399.46 | bwd_inner_microstep: 1399.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1948 [2024-06-11 03:12:06,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.21 | bwd_microstep: 728.55 | bwd_inner_microstep: 728.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 03:12:08,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1282.71 | bwd_inner_microstep: 1282.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-11 03:12:09,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.62 | bwd_microstep: 792.14 
| bwd_inner_microstep: 792.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688 [2024-06-11 03:12:12,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.72 | bwd_microstep: 1624.03 | bwd_inner_microstep: 1624.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1877 [2024-06-11 03:12:12,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.21 | bwd_microstep: 679.04 | bwd_inner_microstep: 679.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-11 03:12:14,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1351.91 | bwd_inner_microstep: 1351.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3659 [2024-06-11 03:12:16,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.76 | bwd_microstep: 1441.63 | bwd_inner_microstep: 1441.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-11 03:12:18,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.63 | bwd_microstep: 1485.28 | bwd_inner_microstep: 1485.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 03:12:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.40 | bwd_microstep: 1481.11 | bwd_inner_microstep: 1481.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2050 [2024-06-11 03:12:22,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 349.40 | bwd_microstep: 939.44 | bwd_inner_microstep: 939.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2399 [2024-06-11 03:12:23,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.59 | bwd_microstep: 1057.62 | bwd_inner_microstep: 1057.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 03:12:25,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.99 | bwd_microstep: 1381.21 | bwd_inner_microstep: 1381.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009 [2024-06-11 03:12:26,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.03 | bwd_microstep: 739.71 | bwd_inner_microstep: 739.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3459 [2024-06-11 03:12:28,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.68 | bwd_microstep: 1211.52 | bwd_inner_microstep: 1211.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3710 [2024-06-11 03:12:30,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.19 | bwd_microstep: 1333.66 | bwd_inner_microstep: 1333.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-11 03:12:32,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.14 | bwd_microstep: 1549.72 | bwd_inner_microstep: 1549.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-11 03:12:34,015] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.72 | bwd_microstep: 1252.28 | bwd_inner_microstep: 1252.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 03:12:35,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.35 | bwd_microstep: 1391.32 | bwd_inner_microstep: 1391.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3674 [2024-06-11 03:12:37,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.71 | bwd_microstep: 1421.47 | bwd_inner_microstep: 1421.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 03:12:39,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.29 | bwd_microstep: 1450.18 | bwd_inner_microstep: 1450.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3748 [2024-06-11 03:12:41,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.32 | bwd_microstep: 1340.30 | bwd_inner_microstep: 1340.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2057 [2024-06-11 03:12:42,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.66 | bwd_microstep: 850.00 | bwd_inner_microstep: 849.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3819 [2024-06-11 03:12:45,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.20 | bwd_microstep: 1817.20 | bwd_inner_microstep: 1817.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 
3762 [2024-06-11 03:12:47,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.39 | bwd_microstep: 1844.67 | bwd_inner_microstep: 1844.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-11 03:12:51,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.08 | optimizer_step: 6.64 [2024-06-11 03:12:51,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 3146.28 | bwd_inner_microstep: 1568.78 | bwd_allreduce_microstep: 1577.45 | step_microstep: 37.77 [2024-06-11 03:12:51,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15485.47 | bwd: 43094.44 | bwd_inner: 41516.09 | bwd_allreduce: 1577.68 | step: 39.19 {'loss': 1.2023, 'learning_rate': 1.3923365344450002e-06, 'epoch': 0.88} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3457 [2024-06-11 03:12:53,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.75 | bwd_microstep: 1233.70 | bwd_inner_microstep: 1233.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4761 [2024-06-11 03:12:55,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.56 | bwd_microstep: 1779.07 | bwd_inner_microstep: 1779.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2015 [2024-06-11 03:12:56,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.37 | bwd_microstep: 801.00 | bwd_inner_microstep: 800.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3765 [2024-06-11 03:12:59,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.20 | bwd_microstep: 1534.92 | 
bwd_inner_microstep: 1534.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-11 03:13:01,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.45 | bwd_microstep: 1549.09 | bwd_inner_microstep: 1549.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 03:13:02,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1248.66 | bwd_inner_microstep: 1248.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4022 [2024-06-11 03:13:05,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.23 | bwd_microstep: 1710.64 | bwd_inner_microstep: 1710.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523 [2024-06-11 03:13:07,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.66 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-11 03:13:09,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.70 | bwd_microstep: 1293.83 | bwd_inner_microstep: 1293.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-11 03:13:10,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.34 | bwd_microstep: 1278.01 | bwd_inner_microstep: 1277.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3911 [2024-06-11 03:13:13,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 569.09 | bwd_microstep: 1525.73 | bwd_inner_microstep: 1525.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3480 [2024-06-11 03:13:14,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.50 | bwd_microstep: 1311.73 | bwd_inner_microstep: 1311.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-11 03:13:16,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.48 | bwd_microstep: 1379.26 | bwd_inner_microstep: 1379.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-11 03:13:18,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1345.75 | bwd_inner_microstep: 1345.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3410 [2024-06-11 03:13:20,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.80 | bwd_microstep: 1366.67 | bwd_inner_microstep: 1366.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1900 [2024-06-11 03:13:21,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.07 | bwd_microstep: 713.03 | bwd_inner_microstep: 713.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-11 03:13:23,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.64 | bwd_microstep: 1354.36 | bwd_inner_microstep: 1354.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-11 03:13:24,971] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.61 | bwd_microstep: 1157.46 | bwd_inner_microstep: 1157.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-11 03:13:27,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.33 | bwd_microstep: 1523.08 | bwd_inner_microstep: 1523.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3689 [2024-06-11 03:13:28,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.76 | bwd_microstep: 1326.69 | bwd_inner_microstep: 1326.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 03:13:31,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.05 | bwd_microstep: 1554.30 | bwd_inner_microstep: 1554.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-11 03:13:32,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1399.77 | bwd_inner_microstep: 1399.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3622 [2024-06-11 03:13:34,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.37 | bwd_microstep: 1246.02 | bwd_inner_microstep: 1246.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3641 [2024-06-11 03:13:36,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.56 | bwd_microstep: 1411.84 | bwd_inner_microstep: 1411.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3532 [2024-06-11 03:13:38,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.21 | bwd_microstep: 1323.66 | bwd_inner_microstep: 1323.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-11 03:13:40,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.24 | bwd_microstep: 1349.12 | bwd_inner_microstep: 1349.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719 [2024-06-11 03:13:42,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.20 | bwd_microstep: 1625.10 | bwd_inner_microstep: 1625.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3597 [2024-06-11 03:13:44,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.98 | bwd_microstep: 1665.11 | bwd_inner_microstep: 1665.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3608 [2024-06-11 03:13:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.23 | bwd_microstep: 1575.10 | bwd_inner_microstep: 1575.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-11 03:13:49,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1489.76 | bwd_inner_microstep: 1489.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3586 [2024-06-11 03:13:51,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.73 | bwd_microstep: 1670.49 | bwd_inner_microstep: 1670.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, 
images per sample: 10.75, dynamic token length: 3616 [2024-06-11 03:13:56,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.09 | optimizer_step: 6.58 [2024-06-11 03:13:56,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.03 | bwd_microstep: 4510.89 | bwd_inner_microstep: 1874.37 | bwd_allreduce_microstep: 2636.46 | step_microstep: 37.85 [2024-06-11 03:13:56,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16805.32 | bwd: 47742.74 | bwd_inner: 45105.37 | bwd_allreduce: 2636.70 | step: 39.40 {'loss': 1.1702, 'learning_rate': 1.3786097904360563e-06, 'epoch': 0.88} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-11 03:13:58,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.69 | bwd_microstep: 1331.20 | bwd_inner_microstep: 1331.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3948 [2024-06-11 03:14:00,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.89 | bwd_microstep: 1497.64 | bwd_inner_microstep: 1497.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 03:14:02,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1275.39 | bwd_inner_microstep: 1275.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3871 [2024-06-11 03:14:04,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.71 | bwd_microstep: 1660.86 | bwd_inner_microstep: 1660.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 03:14:06,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 486.46 | bwd_microstep: 1273.47 | bwd_inner_microstep: 1273.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-11 03:14:08,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1247.45 | bwd_inner_microstep: 1247.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-11 03:14:09,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1388.51 | bwd_inner_microstep: 1388.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-11 03:14:11,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1413.76 | bwd_inner_microstep: 1413.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 03:14:13,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.49 | bwd_microstep: 1386.76 | bwd_inner_microstep: 1386.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1977 [2024-06-11 03:14:14,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.83 | bwd_microstep: 704.86 | bwd_inner_microstep: 704.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3412 [2024-06-11 03:14:16,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.25 | bwd_microstep: 1181.63 | bwd_inner_microstep: 1181.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3918 [2024-06-11 03:14:18,897] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.94 | bwd_microstep: 1788.54 | bwd_inner_microstep: 1788.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3670 [2024-06-11 03:14:21,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.75 | bwd_microstep: 1715.87 | bwd_inner_microstep: 1715.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3397 [2024-06-11 03:14:23,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.43 | bwd_microstep: 1432.91 | bwd_inner_microstep: 1432.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3603 [2024-06-11 03:14:25,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.98 | bwd_microstep: 1428.15 | bwd_inner_microstep: 1428.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3626 [2024-06-11 03:14:27,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.04 | bwd_microstep: 1701.46 | bwd_inner_microstep: 1701.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3507 [2024-06-11 03:14:29,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.83 | bwd_microstep: 1542.94 | bwd_inner_microstep: 1542.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-11 03:14:30,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.04 | bwd_microstep: 793.67 | bwd_inner_microstep: 793.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
2590 [2024-06-11 03:14:32,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 401.72 | bwd_microstep: 1072.12 | bwd_inner_microstep: 1072.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-11 03:14:34,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.68 | bwd_microstep: 1491.73 | bwd_inner_microstep: 1491.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3525 [2024-06-11 03:14:36,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.80 | bwd_microstep: 1356.81 | bwd_inner_microstep: 1356.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3453 [2024-06-11 03:14:37,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.12 | bwd_microstep: 1159.39 | bwd_inner_microstep: 1159.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 03:14:39,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.78 | bwd_microstep: 1373.51 | bwd_inner_microstep: 1373.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-11 03:14:41,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.97 | bwd_microstep: 1395.55 | bwd_inner_microstep: 1395.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 03:14:43,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1280.29 | bwd_inner_microstep: 1280.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3601 [2024-06-11 03:14:45,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.69 | bwd_microstep: 1605.34 | bwd_inner_microstep: 1605.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-11 03:14:46,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.83 | bwd_microstep: 973.13 | bwd_inner_microstep: 973.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 03:14:49,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.37 | bwd_microstep: 1557.90 | bwd_inner_microstep: 1557.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3598 [2024-06-11 03:14:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.96 | bwd_microstep: 1449.59 | bwd_inner_microstep: 1449.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-11 03:14:53,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.31 | bwd_microstep: 1643.88 | bwd_inner_microstep: 1643.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3790 [2024-06-11 03:14:55,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.64 | bwd_microstep: 1639.75 | bwd_inner_microstep: 1639.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3819 [2024-06-11 03:14:57,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.05 | optimizer_step: 6.63 [2024-06-11 03:14:57,949] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.38 | bwd_microstep: 1695.37 | bwd_inner_microstep: 1687.68 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.60 [2024-06-11 03:14:57,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16580.00 | bwd: 44459.47 | bwd_inner: 44450.88 | bwd_allreduce: 7.88 | step: 39.11 {'loss': 1.1656, 'learning_rate': 1.3649486308666314e-06, 'epoch': 0.89} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3394 [2024-06-11 03:14:59,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.82 | bwd_microstep: 1147.32 | bwd_inner_microstep: 1147.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 03:15:01,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1385.08 | bwd_inner_microstep: 1385.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4108 [2024-06-11 03:15:03,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.49 | bwd_microstep: 1733.44 | bwd_inner_microstep: 1733.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3466 [2024-06-11 03:15:05,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.24 | bwd_microstep: 1210.27 | bwd_inner_microstep: 1210.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502 [2024-06-11 03:15:07,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.04 | bwd_microstep: 1320.62 | bwd_inner_microstep: 1320.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1944 [2024-06-11 03:15:08,389] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.88 | bwd_microstep: 729.56 | bwd_inner_microstep: 729.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 03:15:10,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1387.37 | bwd_inner_microstep: 1387.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 03:15:12,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.86 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3657 [2024-06-11 03:15:14,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.80 | bwd_microstep: 1424.76 | bwd_inner_microstep: 1424.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 03:15:15,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.10 | bwd_microstep: 1387.55 | bwd_inner_microstep: 1387.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-11 03:15:17,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1251.16 | bwd_inner_microstep: 1251.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2521 [2024-06-11 03:15:19,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.07 | bwd_microstep: 935.01 | bwd_inner_microstep: 934.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969 
[2024-06-11 03:15:20,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.79 | bwd_microstep: 797.04 | bwd_inner_microstep: 797.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-11 03:15:22,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.44 | bwd_microstep: 1513.68 | bwd_inner_microstep: 1513.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2150 [2024-06-11 03:15:23,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.16 | bwd_microstep: 1045.25 | bwd_inner_microstep: 1045.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 03:15:25,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1246.44 | bwd_inner_microstep: 1246.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667 [2024-06-11 03:15:27,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.60 | bwd_microstep: 1419.24 | bwd_inner_microstep: 1419.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3626 [2024-06-11 03:15:29,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.99 | bwd_microstep: 1612.49 | bwd_inner_microstep: 1612.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531 [2024-06-11 03:15:31,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.31 | bwd_microstep: 1586.24 | bwd_inner_microstep: 1586.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2580 [2024-06-11 03:15:33,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.11 | bwd_microstep: 1046.81 | bwd_inner_microstep: 1046.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2432 [2024-06-11 03:15:34,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.70 | bwd_microstep: 879.04 | bwd_inner_microstep: 879.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-11 03:15:36,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1379.34 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2178 [2024-06-11 03:15:37,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.33 | bwd_microstep: 957.16 | bwd_inner_microstep: 957.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3691 [2024-06-11 03:15:39,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1532.06 | bwd_inner_microstep: 1532.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663 [2024-06-11 03:15:41,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.61 | bwd_microstep: 1419.89 | bwd_inner_microstep: 1419.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3567 [2024-06-11 03:15:43,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.08 | bwd_microstep: 1530.43 | bwd_inner_microstep: 1530.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826 [2024-06-11 03:15:45,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.73 | bwd_microstep: 1491.57 | bwd_inner_microstep: 1491.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919 [2024-06-11 03:15:46,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.00 | bwd_microstep: 718.61 | bwd_inner_microstep: 718.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3573 [2024-06-11 03:15:49,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.12 | bwd_microstep: 1698.44 | bwd_inner_microstep: 1698.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3584 [2024-06-11 03:15:51,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.58 | bwd_microstep: 1501.28 | bwd_inner_microstep: 1501.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-11 03:15:52,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.23 | bwd_microstep: 793.68 | bwd_inner_microstep: 793.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-11 03:16:00,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.21 | optimizer_step: 6.58 [2024-06-11 03:16:00,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 7814.44 | bwd_inner_microstep: 1640.50 | bwd_allreduce_microstep: 6173.88 | step_microstep: 37.88 [2024-06-11 03:16:00,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15308.65 | 
bwd: 47178.40 | bwd_inner: 41003.60 | bwd_allreduce: 6174.11 | step: 39.37 88%|████████▊ | 1524/1726 [26:33:28<3:27:43, 61.70s/it] 88%|████████▊ | 1525/1726 [26:34:29<3:25:55, 61.47s/it] 88%|████████▊ | 1526/1726 [26:35:28<3:22:20, 60.70s/it] 88%|████████▊ | 1527/1726 [26:36:33<3:25:29, 61.96s/it] 89%|████████▊ | 1528/1726 [26:37:34<3:23:53, 61.79s/it] {'loss': 1.1391, 'learning_rate': 1.3513531038512517e-06, 'epoch': 0.89} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-11 03:16:02,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.32 | bwd_microstep: 1472.87 | bwd_inner_microstep: 1472.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 03:16:04,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.62 | bwd_microstep: 1274.89 | bwd_inner_microstep: 1274.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 03:16:06,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.80 | bwd_microstep: 1546.11 | bwd_inner_microstep: 1546.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 03:16:08,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.43 | bwd_microstep: 1245.00 | bwd_inner_microstep: 1244.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-11 03:16:10,471] [INFO]
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.25 | bwd_microstep: 1474.06 | bwd_inner_microstep: 1474.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-11 03:16:12,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.37 | bwd_microstep: 1344.24 | bwd_inner_microstep: 1344.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785 [2024-06-11 03:16:14,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.85 | bwd_microstep: 1639.71 | bwd_inner_microstep: 1639.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-11 03:16:15,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.63 | bwd_microstep: 792.04 | bwd_inner_microstep: 792.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-11 03:16:17,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.82 | bwd_microstep: 1387.75 | bwd_inner_microstep: 1387.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3733 [2024-06-11 03:16:19,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1561.79 | bwd_inner_microstep: 1561.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3675 [2024-06-11 03:16:22,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.93 | bwd_microstep: 1821.72 | bwd_inner_microstep: 1821.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
2326 [2024-06-11 03:16:23,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.43 | bwd_microstep: 919.52 | bwd_inner_microstep: 919.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508 [2024-06-11 03:16:25,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1574.01 | bwd_inner_microstep: 1573.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-11 03:16:27,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1482.15 | bwd_inner_microstep: 1482.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-11 03:16:29,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.01 | bwd_microstep: 1484.49 | bwd_inner_microstep: 1484.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-11 03:16:30,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.33 | bwd_microstep: 798.37 | bwd_inner_microstep: 798.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 03:16:32,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.39 | bwd_microstep: 1392.45 | bwd_inner_microstep: 1392.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3467 [2024-06-11 03:16:34,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.22 | bwd_microstep: 1184.54 | bwd_inner_microstep: 1184.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3543 [2024-06-11 03:16:36,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1497.77 | bwd_inner_microstep: 1497.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 03:16:38,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.98 | bwd_microstep: 1379.33 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-11 03:16:40,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.85 | bwd_microstep: 1409.47 | bwd_inner_microstep: 1409.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3695 [2024-06-11 03:16:42,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.32 | bwd_microstep: 1360.82 | bwd_inner_microstep: 1360.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3609 [2024-06-11 03:16:44,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.62 | bwd_microstep: 1341.54 | bwd_inner_microstep: 1341.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-11 03:16:46,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.45 | bwd_microstep: 1538.47 | bwd_inner_microstep: 1538.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 03:16:48,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.10 | bwd_microstep: 1553.47 | bwd_inner_microstep: 1553.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3721 [2024-06-11 03:16:50,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 1382.24 | bwd_inner_microstep: 1382.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2246 [2024-06-11 03:16:51,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.71 | bwd_microstep: 1001.57 | bwd_inner_microstep: 1001.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-11 03:16:52,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 809.31 | bwd_inner_microstep: 809.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3802 [2024-06-11 03:16:54,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.48 | bwd_microstep: 1549.14 | bwd_inner_microstep: 1549.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2039 [2024-06-11 03:16:56,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.37 | bwd_microstep: 903.81 | bwd_inner_microstep: 903.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390 [2024-06-11 03:16:57,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.30 | bwd_microstep: 1336.80 | bwd_inner_microstep: 1336.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581 [2024-06-11 03:17:23,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.34 | optimizer_step: 6.59 [2024-06-11 
03:17:23,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.45 | bwd_microstep: 25065.52 | bwd_inner_microstep: 1801.84 | bwd_allreduce_microstep: 23263.57 | step_microstep: 40.43 [2024-06-11 03:17:23,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16060.36 | bwd: 66525.00 | bwd_inner: 43260.47 | bwd_allreduce: 23263.84 | step: 41.89 {'loss': 1.2178, 'learning_rate': 1.3378232572732985e-06, 'epoch': 0.89} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3529 [2024-06-11 03:17:25,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.75 | bwd_microstep: 1436.06 | bwd_inner_microstep: 1435.97 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.14 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3964 [2024-06-11 03:17:28,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.65 | bwd_microstep: 1688.43 | bwd_inner_microstep: 1688.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2341 [2024-06-11 03:17:29,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.94 | bwd_microstep: 885.65 | bwd_inner_microstep: 885.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-11 03:17:31,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.61 | bwd_microstep: 1444.57 | bwd_inner_microstep: 1444.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2079 [2024-06-11 03:17:32,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.85 | bwd_microstep: 818.34 | bwd_inner_microstep: 818.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 
03:17:34,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.92 | bwd_microstep: 1352.69 | bwd_inner_microstep: 1352.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 03:17:35,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.26 | bwd_microstep: 1245.13 | bwd_inner_microstep: 1245.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3435 [2024-06-11 03:17:37,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.14 | bwd_microstep: 1154.28 | bwd_inner_microstep: 1154.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3710 [2024-06-11 03:17:39,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.75 | bwd_microstep: 1555.31 | bwd_inner_microstep: 1555.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-11 03:17:41,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.99 | bwd_microstep: 1344.64 | bwd_inner_microstep: 1344.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3411 [2024-06-11 03:17:43,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1342.12 | bwd_inner_microstep: 1342.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2107 [2024-06-11 03:17:44,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.36 | bwd_microstep: 1019.52 | bwd_inner_microstep: 1019.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3398 [2024-06-11 03:17:46,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1247.04 | bwd_inner_microstep: 1247.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-11 03:17:48,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1411.06 | bwd_inner_microstep: 1411.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3626 [2024-06-11 03:17:50,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.83 | bwd_microstep: 1365.58 | bwd_inner_microstep: 1365.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3695 [2024-06-11 03:17:52,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.32 | bwd_microstep: 1426.97 | bwd_inner_microstep: 1426.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526 [2024-06-11 03:17:54,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.68 | bwd_microstep: 1199.42 | bwd_inner_microstep: 1199.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3533 [2024-06-11 03:17:55,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1327.44 | bwd_inner_microstep: 1327.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-11 03:17:56,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.31 | bwd_microstep: 799.35 | bwd_inner_microstep: 799.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-11 03:17:59,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.10 | bwd_microstep: 1560.16 | bwd_inner_microstep: 1560.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3795 [2024-06-11 03:18:01,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1656.11 | bwd_inner_microstep: 1656.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704 [2024-06-11 03:18:03,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1427.16 | bwd_inner_microstep: 1427.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3739 [2024-06-11 03:18:05,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1368.13 | bwd_inner_microstep: 1368.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-11 03:18:06,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.23 | bwd_microstep: 977.67 | bwd_inner_microstep: 977.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 893 [2024-06-11 03:18:07,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 142.69 | bwd_microstep: 369.62 | bwd_inner_microstep: 369.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276 [2024-06-11 03:18:08,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.56 | bwd_microstep: 978.11 | bwd_inner_microstep: 978.09 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3431 [2024-06-11 03:18:10,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.08 | bwd_microstep: 1406.77 | bwd_inner_microstep: 1406.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569 [2024-06-11 03:18:12,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.15 | bwd_microstep: 1432.49 | bwd_inner_microstep: 1432.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955 [2024-06-11 03:18:13,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.58 | bwd_microstep: 701.56 | bwd_inner_microstep: 701.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3578 [2024-06-11 03:18:15,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.24 | bwd_microstep: 1542.92 | bwd_inner_microstep: 1542.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 03:18:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.95 | bwd_microstep: 1254.16 | bwd_inner_microstep: 1254.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3573 [2024-06-11 03:18:23,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.18 | optimizer_step: 6.62 [2024-06-11 03:18:23,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.54 | bwd_microstep: 5983.39 | bwd_inner_microstep: 1762.37 | bwd_allreduce_microstep: 4220.96 | step_microstep: 39.70 [2024-06-11 03:18:23,887] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd: 15111.84 | bwd: 44721.88 | bwd_inner: 40499.93 | bwd_allreduce: 4221.25 | step: 41.28 {'loss': 1.1358, 'learning_rate': 1.3243591387848164e-06, 'epoch': 0.89} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-11 03:18:25,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.14 | bwd_microstep: 1367.30 | bwd_inner_microstep: 1367.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-11 03:18:27,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.95 | bwd_microstep: 1281.89 | bwd_inner_microstep: 1281.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467 [2024-06-11 03:18:29,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.09 | bwd_microstep: 1436.69 | bwd_inner_microstep: 1436.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 03:18:31,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.93 | bwd_microstep: 1382.08 | bwd_inner_microstep: 1382.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407 [2024-06-11 03:18:33,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.22 | bwd_microstep: 1211.82 | bwd_inner_microstep: 1211.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3472 [2024-06-11 03:18:35,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.17 | bwd_microstep: 1342.87 | bwd_inner_microstep: 1342.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3481 
[2024-06-11 03:18:36,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1330.93 | bwd_inner_microstep: 1330.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1867 [2024-06-11 03:18:37,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.60 | bwd_microstep: 678.03 | bwd_inner_microstep: 678.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-11 03:18:39,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.69 | bwd_microstep: 1255.00 | bwd_inner_microstep: 1254.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 03:18:41,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.17 | bwd_microstep: 1347.39 | bwd_inner_microstep: 1347.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586 [2024-06-11 03:18:43,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.10 | bwd_microstep: 1607.21 | bwd_inner_microstep: 1607.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-11 03:18:45,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.72 | bwd_microstep: 1342.55 | bwd_inner_microstep: 1342.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2967 [2024-06-11 03:18:47,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 446.26 | bwd_microstep: 1198.57 | bwd_inner_microstep: 1198.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3506 [2024-06-11 03:18:49,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.11 | bwd_microstep: 1409.89 | bwd_inner_microstep: 1409.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 03:18:50,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | bwd_microstep: 1288.66 | bwd_inner_microstep: 1288.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 03:18:52,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.86 | bwd_microstep: 1387.08 | bwd_inner_microstep: 1387.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-11 03:18:54,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1292.10 | bwd_inner_microstep: 1292.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122 [2024-06-11 03:18:55,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.92 | bwd_microstep: 929.78 | bwd_inner_microstep: 929.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 03:18:57,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.12 | bwd_microstep: 1393.04 | bwd_inner_microstep: 1393.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-11 03:18:59,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.59 | bwd_microstep: 1258.17 | bwd_inner_microstep: 1258.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-11 03:19:01,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.11 | bwd_microstep: 1287.75 | bwd_inner_microstep: 1287.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-11 03:19:03,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.81 | bwd_microstep: 1506.78 | bwd_inner_microstep: 1506.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-11 03:19:05,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.52 | bwd_microstep: 1406.11 | bwd_inner_microstep: 1406.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270 [2024-06-11 03:19:06,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.19 | bwd_microstep: 972.60 | bwd_inner_microstep: 972.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276 [2024-06-11 03:19:07,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.28 | bwd_microstep: 973.73 | bwd_inner_microstep: 973.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3556 [2024-06-11 03:19:10,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.26 | bwd_microstep: 1526.21 | bwd_inner_microstep: 1526.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 03:19:11,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1378.97 | bwd_inner_microstep: 1378.95 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2021 [2024-06-11 03:19:13,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 898.50 | bwd_inner_microstep: 898.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-11 03:19:15,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.57 | bwd_microstep: 1451.08 | bwd_inner_microstep: 1451.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1858 [2024-06-11 03:19:16,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.87 | bwd_microstep: 708.24 | bwd_inner_microstep: 708.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-11 03:19:18,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.74 | bwd_microstep: 1406.95 | bwd_inner_microstep: 1406.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-11 03:19:25,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-11 03:19:25,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 6521.19 | bwd_inner_microstep: 1643.28 | bwd_allreduce_microstep: 4877.86 | step_microstep: 40.24 [2024-06-11 03:19:25,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15267.29 | bwd: 45779.19 | bwd_inner: 40900.37 | bwd_allreduce: 4878.10 | step: 41.82 {'loss': 1.1787, 'learning_rate': 1.3109607958063641e-06, 'epoch': 0.89} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 
03:19:27,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.98 | bwd_microstep: 1371.85 | bwd_inner_microstep: 1371.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3976 [2024-06-11 03:19:29,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.40 | bwd_microstep: 1603.48 | bwd_inner_microstep: 1603.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3959 [2024-06-11 03:19:31,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.00 | bwd_microstep: 1695.83 | bwd_inner_microstep: 1695.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3893 [2024-06-11 03:19:33,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.48 | bwd_microstep: 1583.64 | bwd_inner_microstep: 1583.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 03:19:35,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.69 | bwd_microstep: 1376.26 | bwd_inner_microstep: 1376.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3762 [2024-06-11 03:19:37,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.72 | bwd_microstep: 1539.09 | bwd_inner_microstep: 1539.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1882 [2024-06-11 03:19:38,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.84 | bwd_microstep: 680.16 | bwd_inner_microstep: 680.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3401 [2024-06-11 03:19:40,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | bwd_microstep: 1245.05 | bwd_inner_microstep: 1245.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3737 [2024-06-11 03:19:42,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.44 | bwd_microstep: 1534.41 | bwd_inner_microstep: 1534.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-11 03:19:43,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.91 | bwd_microstep: 796.09 | bwd_inner_microstep: 796.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3400 [2024-06-11 03:19:45,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.19 | bwd_microstep: 1290.86 | bwd_inner_microstep: 1290.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3904 [2024-06-11 03:19:47,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.72 | bwd_microstep: 1444.19 | bwd_inner_microstep: 1444.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3714 [2024-06-11 03:19:50,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 665.09 | bwd_microstep: 1831.24 | bwd_inner_microstep: 1831.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3382 [2024-06-11 03:19:51,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.15 | bwd_microstep: 1144.01 | bwd_inner_microstep: 1143.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 03:19:53,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.47 | bwd_microstep: 1484.09 | bwd_inner_microstep: 1484.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537 [2024-06-11 03:19:55,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.26 | bwd_microstep: 1325.85 | bwd_inner_microstep: 1325.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 03:19:57,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.91 | bwd_microstep: 1283.44 | bwd_inner_microstep: 1283.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 03:19:59,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.70 | bwd_microstep: 1388.47 | bwd_inner_microstep: 1388.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-11 03:20:01,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1386.70 | bwd_inner_microstep: 1386.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 03:20:02,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.63 | bwd_microstep: 1279.73 | bwd_inner_microstep: 1279.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-11 03:20:04,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1285.67 | bwd_inner_microstep: 1285.65 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3817 [2024-06-11 03:20:06,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.72 | bwd_microstep: 1259.76 | bwd_inner_microstep: 1259.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 03:20:08,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1353.02 | bwd_inner_microstep: 1352.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 03:20:10,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.90 | bwd_inner_microstep: 1283.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-11 03:20:12,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.79 | bwd_microstep: 1495.84 | bwd_inner_microstep: 1495.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 03:20:14,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.55 | bwd_microstep: 1557.38 | bwd_inner_microstep: 1557.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2131 [2024-06-11 03:20:15,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.72 | bwd_microstep: 929.84 | bwd_inner_microstep: 929.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3819 [2024-06-11 03:20:17,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.14 | bwd_microstep: 1500.44 | 
bwd_inner_microstep: 1500.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-11 03:20:19,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.37 | bwd_microstep: 1553.27 | bwd_inner_microstep: 1553.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2241 [2024-06-11 03:20:21,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.44 | bwd_microstep: 928.06 | bwd_inner_microstep: 928.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3594 [2024-06-11 03:20:23,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.12 | bwd_microstep: 1671.49 | bwd_inner_microstep: 1671.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2031 [2024-06-11 03:20:29,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.09 | optimizer_step: 6.61 [2024-06-11 03:20:29,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.06 | bwd_microstep: 5583.73 | bwd_inner_microstep: 1035.78 | bwd_allreduce_microstep: 4547.90 | step_microstep: 37.88 [2024-06-11 03:20:29,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16080.83 | bwd: 47686.87 | bwd_inner: 43138.03 | bwd_allreduce: 4548.14 | step: 39.43 {'loss': 1.1558, 'learning_rate': 1.297628275526832e-06, 'epoch': 0.89} 89%|████████▊ | 1529/1726 [26:38:37<3:23:52, 62.10s/it] 89%|████████▊ | 1530/1726 [26:40:00<3:43:16, 68.35s/it] 89%|████████▊ | 1531/1726 [26:41:00<3:34:09, 65.90s/it]
89%|████████▉ | 1532/1726 [26:42:02<3:28:41, 64.55s/it] 89%|████████▉ | 1533/1726 [26:43:06<3:27:12, 64.42s/it] dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-11 03:20:31,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.51 | bwd_microstep: 1438.63 | bwd_inner_microstep: 1438.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2403 [2024-06-11 03:20:32,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.01 | bwd_microstep: 998.78 | bwd_inner_microstep: 998.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867 [2024-06-11 03:20:34,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1559.78 | bwd_inner_microstep: 1559.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-11 03:20:36,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.11 | bwd_microstep: 1481.79 | bwd_inner_microstep: 1481.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 03:20:38,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.78 | bwd_microstep: 1377.82 | bwd_inner_microstep: 1377.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-11 03:20:39,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.98 | bwd_microstep: 803.68 | bwd_inner_microstep: 803.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic 
token length: 3405 [2024-06-11 03:20:41,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.25 | bwd_microstep: 1150.11 | bwd_inner_microstep: 1150.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-11 03:20:43,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.55 | bwd_microstep: 1353.35 | bwd_inner_microstep: 1353.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 03:20:45,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.59 | bwd_microstep: 1284.82 | bwd_inner_microstep: 1284.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2955 [2024-06-11 03:20:46,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.59 | bwd_microstep: 1167.96 | bwd_inner_microstep: 1167.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410 [2024-06-11 03:20:48,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.73 | bwd_microstep: 1152.56 | bwd_inner_microstep: 1152.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-11 03:20:50,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1351.58 | bwd_inner_microstep: 1351.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3482 [2024-06-11 03:20:52,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1335.87 | bwd_inner_microstep: 1335.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-11 03:20:54,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.73 | bwd_microstep: 1441.48 | bwd_inner_microstep: 1441.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 03:20:55,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.03 | bwd_microstep: 1291.09 | bwd_inner_microstep: 1291.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3105 [2024-06-11 03:20:57,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.78 | bwd_microstep: 1147.36 | bwd_inner_microstep: 1147.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 03:20:59,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.69 | bwd_microstep: 1285.77 | bwd_inner_microstep: 1285.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-11 03:21:01,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.37 | bwd_microstep: 1507.85 | bwd_inner_microstep: 1507.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3450 [2024-06-11 03:21:02,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.29 | bwd_microstep: 1159.37 | bwd_inner_microstep: 1159.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-11 03:21:04,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1375.71 | bwd_inner_microstep: 1375.69 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3461 [2024-06-11 03:21:06,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.69 | bwd_microstep: 1279.41 | bwd_inner_microstep: 1279.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-11 03:21:08,827] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.65 | bwd_microstep: 1557.61 | bwd_inner_microstep: 1557.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 03:21:10,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.88 | bwd_microstep: 1256.43 | bwd_inner_microstep: 1256.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-11 03:21:12,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.83 | bwd_microstep: 1487.21 | bwd_inner_microstep: 1487.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 03:21:14,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.64 | bwd_microstep: 1557.95 | bwd_inner_microstep: 1557.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3808 [2024-06-11 03:21:16,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.85 | bwd_microstep: 1358.41 | bwd_inner_microstep: 1358.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2205 [2024-06-11 03:21:18,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.04 | bwd_microstep: 990.08 | bwd_inner_microstep: 
990.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3728 [2024-06-11 03:21:19,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1367.60 | bwd_inner_microstep: 1367.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3593 [2024-06-11 03:21:21,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.54 | bwd_microstep: 1369.05 | bwd_inner_microstep: 1369.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2899 [2024-06-11 03:21:23,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.18 | bwd_microstep: 1278.69 | bwd_inner_microstep: 1278.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3600 [2024-06-11 03:21:25,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.86 | bwd_microstep: 1739.53 | bwd_inner_microstep: 1739.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3619 [2024-06-11 03:21:31,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.13 | optimizer_step: 6.62 [2024-06-11 03:21:31,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.10 | bwd_microstep: 4740.25 | bwd_inner_microstep: 1777.50 | bwd_allreduce_microstep: 2962.69 | step_microstep: 39.10 [2024-06-11 03:21:31,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15926.13 | bwd: 45647.62 | bwd_inner: 42684.02 | bwd_allreduce: 2962.92 | step: 40.70 {'loss': 1.2138, 'learning_rate': 1.2843616249032874e-06, 'epoch': 0.89} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
5199 [2024-06-11 03:21:34,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 760.52 | bwd_microstep: 2014.23 | bwd_inner_microstep: 2014.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3976 [2024-06-11 03:21:36,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.32 | bwd_microstep: 1402.89 | bwd_inner_microstep: 1402.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3896 [2024-06-11 03:21:38,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.56 | bwd_microstep: 1611.95 | bwd_inner_microstep: 1611.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-11 03:21:40,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.18 | bwd_microstep: 1649.15 | bwd_inner_microstep: 1649.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 03:21:42,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.99 | bwd_microstep: 1281.80 | bwd_inner_microstep: 1281.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3481 [2024-06-11 03:21:44,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.05 | bwd_microstep: 1410.94 | bwd_inner_microstep: 1410.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 03:21:46,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.53 | bwd_microstep: 1283.74 | bwd_inner_microstep: 1283.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3676 [2024-06-11 03:21:48,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.93 | bwd_microstep: 1525.18 | bwd_inner_microstep: 1525.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3451 [2024-06-11 03:21:49,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.86 | bwd_microstep: 1257.22 | bwd_inner_microstep: 1257.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3641 [2024-06-11 03:21:51,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.85 | bwd_microstep: 1350.91 | bwd_inner_microstep: 1350.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1975 [2024-06-11 03:21:52,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.74 | bwd_microstep: 829.91 | bwd_inner_microstep: 829.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3496 [2024-06-11 03:21:54,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1346.48 | bwd_inner_microstep: 1346.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-11 03:21:56,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609 [2024-06-11 03:21:58,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.32 | bwd_microstep: 1609.58 | bwd_inner_microstep: 1609.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3407 [2024-06-11 03:22:00,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.52 | bwd_microstep: 1214.26 | bwd_inner_microstep: 1214.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3526 [2024-06-11 03:22:02,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1439.84 | bwd_inner_microstep: 1439.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1991 [2024-06-11 03:22:03,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.14 | bwd_microstep: 801.83 | bwd_inner_microstep: 801.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2007 [2024-06-11 03:22:04,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.28 | bwd_microstep: 803.48 | bwd_inner_microstep: 803.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876 [2024-06-11 03:22:06,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.73 | bwd_microstep: 1585.06 | bwd_inner_microstep: 1585.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-11 03:22:09,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 1522.17 | bwd_inner_microstep: 1522.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3820 [2024-06-11 03:22:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.98 | bwd_microstep: 1355.20 | bwd_inner_microstep: 1355.18 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-11 03:22:13,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.32 | bwd_microstep: 1560.01 | bwd_inner_microstep: 1559.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-11 03:22:14,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1282.88 | bwd_inner_microstep: 1282.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-11 03:22:16,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1414.59 | bwd_inner_microstep: 1414.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3605 [2024-06-11 03:22:18,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1506.03 | bwd_inner_microstep: 1506.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3456 [2024-06-11 03:22:20,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.94 | bwd_microstep: 1545.90 | bwd_inner_microstep: 1545.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2028 [2024-06-11 03:22:22,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.05 | bwd_microstep: 836.50 | bwd_inner_microstep: 836.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3644 [2024-06-11 03:22:24,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.36 | bwd_microstep: 
1575.74 | bwd_inner_microstep: 1575.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-11 03:22:26,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.38 | bwd_microstep: 1589.68 | bwd_inner_microstep: 1589.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3808 [2024-06-11 03:22:28,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.54 | bwd_microstep: 1384.67 | bwd_inner_microstep: 1384.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3594 [2024-06-11 03:22:30,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.38 | bwd_microstep: 1675.91 | bwd_inner_microstep: 1675.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3809 [2024-06-11 03:22:33,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.09 | optimizer_step: 6.63 [2024-06-11 03:22:33,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.98 | bwd_microstep: 1790.67 | bwd_inner_microstep: 1782.61 | bwd_allreduce_microstep: 8.00 | step_microstep: 38.48 [2024-06-11 03:22:33,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16707.26 | bwd: 44802.40 | bwd_inner: 44793.49 | bwd_allreduce: 8.23 | step: 40.07 {'loss': 1.1549, 'learning_rate': 1.2711608906608098e-06, 'epoch': 0.89} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-11 03:22:35,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.24 | bwd_microstep: 1477.97 | bwd_inner_microstep: 1477.77 | bwd_allreduce_microstep: 0.10 | step_microstep: 0.18 dynamic ViT batch size: 10, images per sample: 
2.5, dynamic token length: 1876 [2024-06-11 03:22:36,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.55 | bwd_microstep: 680.30 | bwd_inner_microstep: 680.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4299 [2024-06-11 03:22:38,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.34 | bwd_microstep: 1546.22 | bwd_inner_microstep: 1546.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3850 [2024-06-11 03:22:40,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.86 | bwd_microstep: 1660.63 | bwd_inner_microstep: 1660.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3480 [2024-06-11 03:22:42,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.59 | bwd_microstep: 1186.21 | bwd_inner_microstep: 1186.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 03:22:43,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.66 | bwd_microstep: 1248.12 | bwd_inner_microstep: 1248.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751 [2024-06-11 03:22:46,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.28 | bwd_microstep: 1539.38 | bwd_inner_microstep: 1539.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2376 [2024-06-11 03:22:47,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.12 | bwd_microstep: 1028.59 | bwd_inner_microstep: 1028.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 32, images per sample: 8.0, dynamic token length: 3405 [2024-06-11 03:22:49,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.04 | bwd_microstep: 1402.40 | bwd_inner_microstep: 1402.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-11 03:22:51,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.35 | bwd_microstep: 1348.46 | bwd_inner_microstep: 1348.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 03:22:53,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1389.59 | bwd_inner_microstep: 1389.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3917 [2024-06-11 03:22:55,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.48 | bwd_microstep: 1603.26 | bwd_inner_microstep: 1603.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3637 [2024-06-11 03:22:57,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.58 | bwd_microstep: 1607.15 | bwd_inner_microstep: 1607.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3415 [2024-06-11 03:22:59,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.96 | bwd_microstep: 1376.08 | bwd_inner_microstep: 1376.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3675 [2024-06-11 03:23:01,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.28 | bwd_microstep: 1726.09 | bwd_inner_microstep: 1726.06 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 03:23:03,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.73 | bwd_microstep: 1391.80 | bwd_inner_microstep: 1391.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3531 [2024-06-11 03:23:06,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.35 | bwd_microstep: 1592.29 | bwd_inner_microstep: 1592.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-11 03:23:07,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.51 | bwd_microstep: 788.77 | bwd_inner_microstep: 788.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 03:23:09,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | bwd_microstep: 1371.60 | bwd_inner_microstep: 1371.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3501 [2024-06-11 03:23:11,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.44 | bwd_microstep: 1630.46 | bwd_inner_microstep: 1630.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 03:23:13,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.92 | bwd_microstep: 1380.75 | bwd_inner_microstep: 1380.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-11 03:23:15,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.84 | bwd_microstep: 
1484.48 | bwd_inner_microstep: 1484.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3534 [2024-06-11 03:23:17,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.31 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3460 [2024-06-11 03:23:19,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.80 | bwd_microstep: 1325.16 | bwd_inner_microstep: 1325.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-11 03:23:21,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.93 | bwd_microstep: 1647.05 | bwd_inner_microstep: 1647.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-11 03:23:23,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.88 | bwd_microstep: 1518.26 | bwd_inner_microstep: 1518.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3810 [2024-06-11 03:23:25,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.50 | bwd_microstep: 1514.17 | bwd_inner_microstep: 1514.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-11 03:23:27,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1453.27 | bwd_inner_microstep: 1453.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3724 [2024-06-11 03:23:29,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 539.70 | bwd_microstep: 1440.04 | bwd_inner_microstep: 1440.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-11 03:23:31,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1495.79 | bwd_inner_microstep: 1495.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3558 [2024-06-11 03:23:33,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.35 | bwd_microstep: 1402.84 | bwd_inner_microstep: 1402.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-11 03:23:35,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.02 | optimizer_step: 6.62 [2024-06-11 03:23:35,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.01 | bwd_microstep: 1294.50 | bwd_inner_microstep: 1286.21 | bwd_allreduce_microstep: 8.24 | step_microstep: 37.58 [2024-06-11 03:23:35,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16841.03 | bwd: 45109.74 | bwd_inner: 45100.46 | bwd_allreduce: 8.55 | step: 39.24 {'loss': 1.16, 'learning_rate': 1.2580261192923126e-06, 'epoch': 0.89} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1887 [2024-06-11 03:23:36,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.57 | bwd_microstep: 837.37 | bwd_inner_microstep: 837.24 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3976 [2024-06-11 03:23:38,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.84 | bwd_microstep: 1604.83 | bwd_inner_microstep: 1604.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 03:23:40,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1247.78 | bwd_inner_microstep: 1247.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-11 03:23:41,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.16 | bwd_microstep: 695.01 | bwd_inner_microstep: 694.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787 [2024-06-11 03:23:43,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.05 | bwd_microstep: 1648.31 | bwd_inner_microstep: 1648.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 03:23:45,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1280.64 | bwd_inner_microstep: 1280.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 03:23:47,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.95 | bwd_microstep: 1282.16 | bwd_inner_microstep: 1282.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705 [2024-06-11 03:23:49,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1525.57 | bwd_inner_microstep: 1525.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-11 03:23:51,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.84 | bwd_microstep: 1289.87 | bwd_inner_microstep: 1289.84 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413 [2024-06-11 03:23:53,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.49 | bwd_microstep: 1278.76 | bwd_inner_microstep: 1278.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 03:23:54,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.79 | bwd_microstep: 1373.47 | bwd_inner_microstep: 1373.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3504 [2024-06-11 03:23:57,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.35 | bwd_microstep: 1576.06 | bwd_inner_microstep: 1576.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3508 [2024-06-11 03:23:59,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.06 | bwd_microstep: 1410.04 | bwd_inner_microstep: 1410.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3510 [2024-06-11 03:24:01,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.11 | bwd_microstep: 1555.76 | bwd_inner_microstep: 1555.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-11 03:24:03,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.32 | bwd_microstep: 1490.76 | bwd_inner_microstep: 1490.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3661 [2024-06-11 03:24:05,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1385.24 | bwd_inner_microstep: 
1385.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 03:24:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.69 | bwd_microstep: 1385.98 | bwd_inner_microstep: 1385.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3528 [2024-06-11 03:24:08,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.19 | bwd_microstep: 1325.63 | bwd_inner_microstep: 1325.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2010 [2024-06-11 03:24:09,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.45 | bwd_microstep: 741.95 | bwd_inner_microstep: 741.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-11 03:24:11,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.79 | bwd_microstep: 1461.31 | bwd_inner_microstep: 1461.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-11 03:24:14,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1510.71 | bwd_inner_microstep: 1510.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 03:24:15,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.60 | bwd_microstep: 1386.77 | bwd_inner_microstep: 1386.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-11 03:24:17,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | 
bwd_microstep: 1389.78 | bwd_inner_microstep: 1389.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 03:24:19,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.89 | bwd_microstep: 1284.73 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1940 [2024-06-11 03:24:20,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.99 | bwd_microstep: 767.31 | bwd_inner_microstep: 767.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 03:24:22,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.90 | bwd_microstep: 1554.82 | bwd_inner_microstep: 1554.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815 [2024-06-11 03:24:25,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.31 | bwd_microstep: 1580.31 | bwd_inner_microstep: 1580.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-11 03:24:27,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.65 | bwd_microstep: 1603.49 | bwd_inner_microstep: 1603.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3776 [2024-06-11 03:24:29,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.50 | bwd_microstep: 1592.29 | bwd_inner_microstep: 1592.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2698 [2024-06-11 03:24:31,137] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 456.49 | bwd_microstep: 1229.01 | bwd_inner_microstep: 1228.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-11 03:24:33,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.60 | bwd_microstep: 1594.41 | bwd_inner_microstep: 1594.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3006 [2024-06-11 03:24:38,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-11 03:24:38,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.15 | bwd_microstep: 4795.08 | bwd_inner_microstep: 1401.91 | bwd_allreduce_microstep: 3393.08 | step_microstep: 40.55 [2024-06-11 03:24:38,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16131.20 | bwd: 46685.25 | bwd_inner: 43291.11 | bwd_allreduce: 3393.39 | step: 42.22 {'loss': 1.1929, 'learning_rate': 1.244957357058394e-06, 'epoch': 0.89} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 03:24:40,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.68 | bwd_microstep: 1375.13 | bwd_inner_microstep: 1375.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3947 [2024-06-11 03:24:42,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.96 | bwd_microstep: 1455.71 | bwd_inner_microstep: 1455.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 03:24:44,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.38 | bwd_microstep: 1238.93 | bwd_inner_microstep: 1238.91 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-11 03:24:46,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.13 | bwd_microstep: 1542.84 | bwd_inner_microstep: 1542.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3766 [2024-06-11 03:24:48,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.11 | bwd_microstep: 1640.38 | bwd_inner_microstep: 1640.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3499 [2024-06-11 03:24:50,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.27 | bwd_microstep: 1190.79 | bwd_inner_microstep: 1190.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 03:24:52,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.79 | bwd_microstep: 1251.29 | bwd_inner_microstep: 1251.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-11 03:24:53,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.71 | bwd_microstep: 1289.06 | bwd_inner_microstep: 1289.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3693 [2024-06-11 03:24:55,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1388.57 | bwd_inner_microstep: 1388.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3553 [2024-06-11 03:24:57,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1427.40 | bwd_inner_microstep: 
1427.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-11 03:24:59,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.08 | bwd_microstep: 1515.26 | bwd_inner_microstep: 1515.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-11 03:25:01,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.44 | bwd_microstep: 1379.99 | bwd_inner_microstep: 1379.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3500 [2024-06-11 03:25:03,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.39 | bwd_microstep: 1584.80 | bwd_inner_microstep: 1584.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3497 [2024-06-11 03:25:06,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.07 | bwd_microstep: 1583.33 | bwd_inner_microstep: 1583.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3512 [2024-06-11 03:25:08,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.84 | bwd_microstep: 1413.06 | bwd_inner_microstep: 1413.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1982 [2024-06-11 03:25:09,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.30 | bwd_microstep: 800.88 | bwd_inner_microstep: 800.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843 [2024-06-11 03:25:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.49 | 
bwd_microstep: 1663.11 | bwd_inner_microstep: 1663.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3835 [2024-06-11 03:25:13,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.74 | bwd_microstep: 1465.43 | bwd_inner_microstep: 1465.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2125 [2024-06-11 03:25:14,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.40 | bwd_microstep: 767.69 | bwd_inner_microstep: 767.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-11 03:25:16,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.06 | bwd_microstep: 1655.68 | bwd_inner_microstep: 1655.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-11 03:25:18,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.21 | bwd_microstep: 1160.93 | bwd_inner_microstep: 1160.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2110 [2024-06-11 03:25:19,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.31 | bwd_microstep: 857.71 | bwd_inner_microstep: 857.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3722 [2024-06-11 03:25:21,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1242.41 | bwd_inner_microstep: 1242.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3608 [2024-06-11 03:25:23,308] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 534.87 | bwd_microstep: 1440.00 | bwd_inner_microstep: 1439.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3549 [2024-06-11 03:25:25,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.51 | bwd_microstep: 1236.76 | bwd_inner_microstep: 1236.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-11 03:25:26,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.82 | bwd_microstep: 1399.32 | bwd_inner_microstep: 1399.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3812 [2024-06-11 03:25:29,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.80 | bwd_microstep: 1500.96 | bwd_inner_microstep: 1500.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3599 [2024-06-11 03:25:31,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.98 | bwd_microstep: 1599.76 | bwd_inner_microstep: 1599.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3751 [2024-06-11 03:25:33,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1535.45 | bwd_inner_microstep: 1535.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 03:25:35,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.58 | bwd_microstep: 1649.14 | bwd_inner_microstep: 1649.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570 [2024-06-11 03:25:37,818] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.21 | bwd_microstep: 1600.03 | bwd_inner_microstep: 1600.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2198 [2024-06-11 03:25:40,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.10 | optimizer_step: 6.60 [2024-06-11 03:25:40,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.12 | bwd_microstep: 2577.27 | bwd_inner_microstep: 1089.83 | bwd_allreduce_microstep: 1487.38 | step_microstep: 38.63 [2024-06-11 03:25:40,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16374.46 | bwd: 45429.09 | bwd_inner: 43940.80 | bwd_allreduce: 1487.61 | step: 40.37 {'loss': 1.1995, 'learning_rate': 1.2319546499871616e-06, 'epoch': 0.89} 89%|████████▉ | 1534/1726 [26:44:08<3:23:44, 63.67s/it] 89%|████████▉ | 1535/1726 [26:45:09<3:20:56, 63.12s/it] 89%|████████▉ | 1536/1726 [26:46:12<3:19:06, 62.88s/it] 89%|████████▉ | 1537/1726 [26:47:15<3:18:20, 62.97s/it] 89%|████████▉ | 1538/1726 [26:48:17<3:16:32, 62.73s/it] dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-11 03:25:41,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.39 | bwd_microstep: 781.81 | bwd_inner_microstep: 781.74 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.08 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3984 [2024-06-11 03:25:44,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.36 | bwd_microstep: 1801.59 | bwd_inner_microstep: 1801.56 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3907 [2024-06-11 03:25:46,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.25 | bwd_microstep: 1655.41 | bwd_inner_microstep: 1655.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-11 03:25:48,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1351.18 | bwd_inner_microstep: 1351.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842 [2024-06-11 03:25:50,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.40 | bwd_microstep: 1560.50 | bwd_inner_microstep: 1560.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-11 03:25:52,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.09 | bwd_microstep: 1299.09 | bwd_inner_microstep: 1299.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473 [2024-06-11 03:25:54,176] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.03 | bwd_microstep: 1214.49 | bwd_inner_microstep: 1214.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 03:25:56,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1384.89 | bwd_inner_microstep: 1384.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603 [2024-06-11 03:25:57,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.30 | bwd_microstep: 1310.91 | bwd_inner_microstep: 
1310.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3489 [2024-06-11 03:25:59,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.73 | bwd_microstep: 1187.23 | bwd_inner_microstep: 1187.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 03:26:01,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.95 | bwd_microstep: 1286.38 | bwd_inner_microstep: 1286.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2117 [2024-06-11 03:26:02,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.87 | bwd_microstep: 734.41 | bwd_inner_microstep: 734.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-11 03:26:04,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.85 | bwd_microstep: 1349.31 | bwd_inner_microstep: 1349.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-11 03:26:05,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.97 | bwd_microstep: 1245.25 | bwd_inner_microstep: 1245.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3384 [2024-06-11 03:26:07,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.67 | bwd_microstep: 1366.34 | bwd_inner_microstep: 1366.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 03:26:09,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | 
bwd_microstep: 1341.40 | bwd_inner_microstep: 1341.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3593 [2024-06-11 03:26:11,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.58 | bwd_microstep: 1311.96 | bwd_inner_microstep: 1311.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3639 [2024-06-11 03:26:13,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.50 | bwd_microstep: 1563.40 | bwd_inner_microstep: 1563.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3835 [2024-06-11 03:26:15,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.94 | bwd_microstep: 1585.28 | bwd_inner_microstep: 1585.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 03:26:17,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1399.58 | bwd_inner_microstep: 1399.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-11 03:26:19,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.96 | bwd_microstep: 1460.97 | bwd_inner_microstep: 1460.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-11 03:26:21,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.00 | bwd_microstep: 1290.54 | bwd_inner_microstep: 1290.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064 [2024-06-11 03:26:22,844] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 341.38 | bwd_microstep: 914.99 | bwd_inner_microstep: 914.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-11 03:26:24,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.59 | bwd_microstep: 1295.74 | bwd_inner_microstep: 1295.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3555 [2024-06-11 03:26:26,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.15 | bwd_microstep: 1497.67 | bwd_inner_microstep: 1497.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3818 [2024-06-11 03:26:28,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.91 | bwd_microstep: 1586.30 | bwd_inner_microstep: 1586.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3556 [2024-06-11 03:26:30,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1458.59 | bwd_inner_microstep: 1458.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3652 [2024-06-11 03:26:32,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1386.90 | bwd_inner_microstep: 1386.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3613 [2024-06-11 03:26:34,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.06 | bwd_microstep: 1543.07 | bwd_inner_microstep: 1543.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2185 [2024-06-11 03:26:36,274] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.93 | bwd_microstep: 956.10 | bwd_inner_microstep: 956.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3752 [2024-06-11 03:26:38,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.46 | bwd_microstep: 1437.39 | bwd_inner_microstep: 1437.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-11 03:26:44,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.31 | optimizer_step: 6.60 [2024-06-11 03:26:44,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 5166.98 | bwd_inner_microstep: 1621.65 | bwd_allreduce_microstep: 3545.22 | step_microstep: 40.45 [2024-06-11 03:26:44,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16122.43 | bwd: 46725.68 | bwd_inner: 43179.41 | bwd_allreduce: 3545.51 | step: 42.17 {'loss': 1.1667, 'learning_rate': 1.2190180438740895e-06, 'epoch': 0.89} dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3483 [2024-06-11 03:26:46,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.63 | bwd_microstep: 1525.19 | bwd_inner_microstep: 1524.99 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.19 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4108 [2024-06-11 03:26:48,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.30 | bwd_microstep: 1627.87 | bwd_inner_microstep: 1627.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3883 [2024-06-11 03:26:50,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1383.18 | bwd_inner_microstep: 1383.15 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 03:26:52,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.21 | bwd_microstep: 1392.90 | bwd_inner_microstep: 1392.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 03:26:54,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1488.37 | bwd_inner_microstep: 1488.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3783 [2024-06-11 03:26:56,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.96 | bwd_microstep: 1546.97 | bwd_inner_microstep: 1546.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1991 [2024-06-11 03:26:57,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.38 | bwd_microstep: 835.08 | bwd_inner_microstep: 835.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1888 [2024-06-11 03:26:58,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.60 | bwd_microstep: 681.29 | bwd_inner_microstep: 681.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 03:27:00,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.51 | bwd_microstep: 1485.80 | bwd_inner_microstep: 1485.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-11 03:27:02,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.52 | bwd_microstep: 1505.33 
| bwd_inner_microstep: 1505.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2006 [2024-06-11 03:27:03,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.69 | bwd_microstep: 774.82 | bwd_inner_microstep: 774.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-11 03:27:05,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.35 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2485 [2024-06-11 03:27:07,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.08 | bwd_microstep: 1143.91 | bwd_inner_microstep: 1143.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-11 03:27:08,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.05 | bwd_microstep: 1346.20 | bwd_inner_microstep: 1346.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3397 [2024-06-11 03:27:10,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.93 | bwd_microstep: 1306.27 | bwd_inner_microstep: 1306.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587 [2024-06-11 03:27:12,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1338.32 | bwd_inner_microstep: 1338.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1997 [2024-06-11 03:27:13,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 334.23 | bwd_microstep: 897.76 | bwd_inner_microstep: 897.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4024 [2024-06-11 03:27:15,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.64 | bwd_microstep: 1418.92 | bwd_inner_microstep: 1418.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2125 [2024-06-11 03:27:17,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.44 | bwd_microstep: 959.74 | bwd_inner_microstep: 959.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-11 03:27:18,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.52 | bwd_microstep: 1163.44 | bwd_inner_microstep: 1163.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 03:27:20,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1282.36 | bwd_inner_microstep: 1282.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3626 [2024-06-11 03:27:22,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.45 | bwd_microstep: 1219.54 | bwd_inner_microstep: 1219.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-11 03:27:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.82 | bwd_microstep: 1163.11 | bwd_inner_microstep: 1163.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-11 03:27:25,633] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1302.25 | bwd_inner_microstep: 1302.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-11 03:27:27,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1563.45 | bwd_inner_microstep: 1563.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3834 [2024-06-11 03:27:29,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.26 | bwd_microstep: 1490.04 | bwd_inner_microstep: 1490.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483 [2024-06-11 03:27:31,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.95 | bwd_microstep: 1217.70 | bwd_inner_microstep: 1217.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 03:27:33,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.04 | bwd_microstep: 1557.07 | bwd_inner_microstep: 1557.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-11 03:27:35,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1527.05 | bwd_inner_microstep: 1527.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2131 [2024-06-11 03:27:36,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.28 | bwd_microstep: 836.25 | bwd_inner_microstep: 836.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3462 [2024-06-11 03:27:38,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.44 | bwd_microstep: 1467.63 | bwd_inner_microstep: 1467.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779 [2024-06-11 03:27:45,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.26 | optimizer_step: 6.61 [2024-06-11 03:27:45,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 6341.49 | bwd_inner_microstep: 1870.93 | bwd_allreduce_microstep: 4470.50 | step_microstep: 39.55 [2024-06-11 03:27:45,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15472.25 | bwd: 46076.08 | bwd_inner: 41604.51 | bwd_allreduce: 4470.82 | step: 41.38 {'loss': 1.1129, 'learning_rate': 1.2061475842818337e-06, 'epoch': 0.89} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1911 [2024-06-11 03:27:47,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.73 | bwd_microstep: 865.58 | bwd_inner_microstep: 865.47 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2913 [2024-06-11 03:27:48,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.13 | bwd_microstep: 1085.85 | bwd_inner_microstep: 1085.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3848 [2024-06-11 03:27:50,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.35 | bwd_microstep: 1490.89 | bwd_inner_microstep: 1490.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4116 [2024-06-11 03:27:53,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.76 | bwd_microstep: 1734.38 | 
bwd_inner_microstep: 1734.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3762 [2024-06-11 03:27:55,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.00 | bwd_microstep: 1566.26 | bwd_inner_microstep: 1566.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 03:27:56,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1247.67 | bwd_inner_microstep: 1247.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 03:27:58,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.43 | bwd_microstep: 1276.81 | bwd_inner_microstep: 1276.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705 [2024-06-11 03:28:00,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.79 | bwd_microstep: 1624.10 | bwd_inner_microstep: 1624.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3419 [2024-06-11 03:28:02,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.48 | bwd_microstep: 1296.62 | bwd_inner_microstep: 1296.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2178 [2024-06-11 03:28:03,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.80 | bwd_microstep: 885.68 | bwd_inner_microstep: 885.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-11 03:28:05,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 503.92 | bwd_microstep: 1354.20 | bwd_inner_microstep: 1354.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 03:28:07,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.39 | bwd_microstep: 1382.48 | bwd_inner_microstep: 1382.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005 [2024-06-11 03:28:08,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.62 | bwd_microstep: 830.41 | bwd_inner_microstep: 830.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-11 03:28:10,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.08 | bwd_microstep: 1488.56 | bwd_inner_microstep: 1488.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3988 [2024-06-11 03:28:13,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.04 | bwd_microstep: 1700.83 | bwd_inner_microstep: 1700.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-11 03:28:15,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1482.84 | bwd_inner_microstep: 1482.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3416 [2024-06-11 03:28:17,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.25 | bwd_microstep: 1471.58 | bwd_inner_microstep: 1471.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-11 03:28:19,448] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1511.16 | bwd_inner_microstep: 1511.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3469 [2024-06-11 03:28:21,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.11 | bwd_microstep: 1428.05 | bwd_inner_microstep: 1428.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-11 03:28:23,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.88 | bwd_microstep: 1416.82 | bwd_inner_microstep: 1416.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-11 03:28:25,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.19 | bwd_microstep: 1390.73 | bwd_inner_microstep: 1390.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3713 [2024-06-11 03:28:27,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.49 | bwd_microstep: 1461.65 | bwd_inner_microstep: 1461.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-11 03:28:28,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.84 | bwd_microstep: 802.25 | bwd_inner_microstep: 802.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 03:28:30,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.67 | bwd_microstep: 1554.71 | bwd_inner_microstep: 1554.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3562 [2024-06-11 03:28:32,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1402.29 | bwd_inner_microstep: 1402.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-11 03:28:33,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.34 | bwd_microstep: 972.86 | bwd_inner_microstep: 972.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470 [2024-06-11 03:28:35,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.08 | bwd_microstep: 1216.29 | bwd_inner_microstep: 1216.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-11 03:28:37,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.35 | bwd_microstep: 1493.88 | bwd_inner_microstep: 1493.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2352 [2024-06-11 03:28:38,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.49 | bwd_microstep: 892.46 | bwd_inner_microstep: 892.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3559 [2024-06-11 03:28:40,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.36 | bwd_microstep: 1233.07 | bwd_inner_microstep: 1233.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1990 [2024-06-11 03:28:41,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.50 | bwd_microstep: 736.60 | bwd_inner_microstep: 736.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3736 [2024-06-11 03:28:48,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.32 | optimizer_step: 6.60 [2024-06-11 03:28:48,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.19 | bwd_microstep: 5831.18 | bwd_inner_microstep: 1619.94 | bwd_allreduce_microstep: 4211.19 | step_microstep: 39.31 [2024-06-11 03:28:48,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15628.90 | bwd: 46128.75 | bwd_inner: 41916.56 | bwd_allreduce: 4211.47 | step: 40.83 {'loss': 1.176, 'learning_rate': 1.1933433165400854e-06, 'epoch': 0.89} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 03:28:49,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.58 | bwd_microstep: 1334.86 | bwd_inner_microstep: 1334.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 03:28:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1381.62 | bwd_inner_microstep: 1381.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 03:28:53,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.67 | bwd_microstep: 1548.17 | bwd_inner_microstep: 1548.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-11 03:28:56,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.50 | bwd_microstep: 1550.22 | bwd_inner_microstep: 1550.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 03:28:57,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.09 
| bwd_microstep: 1283.25 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3728 [2024-06-11 03:28:59,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1437.48 | bwd_inner_microstep: 1437.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 03:29:01,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.99 | bwd_microstep: 1388.71 | bwd_inner_microstep: 1388.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-11 03:29:03,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.28 | bwd_microstep: 1289.70 | bwd_inner_microstep: 1289.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-11 03:29:05,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.05 | bwd_microstep: 1533.26 | bwd_inner_microstep: 1533.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3408 [2024-06-11 03:29:07,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.34 | bwd_microstep: 1325.17 | bwd_inner_microstep: 1325.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3689 [2024-06-11 03:29:09,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.07 | bwd_microstep: 1615.89 | bwd_inner_microstep: 1615.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456 [2024-06-11 03:29:11,677] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 536.07 | bwd_microstep: 1447.53 | bwd_inner_microstep: 1447.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3511 [2024-06-11 03:29:13,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.87 | bwd_microstep: 1585.70 | bwd_inner_microstep: 1585.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2087 [2024-06-11 03:29:15,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.28 | bwd_microstep: 1014.80 | bwd_inner_microstep: 1014.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3651 [2024-06-11 03:29:17,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.74 | bwd_microstep: 1382.37 | bwd_inner_microstep: 1382.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3606 [2024-06-11 03:29:18,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1311.24 | bwd_inner_microstep: 1311.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-11 03:29:21,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.17 | bwd_microstep: 1609.22 | bwd_inner_microstep: 1609.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3623 [2024-06-11 03:29:23,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.39 | bwd_microstep: 1612.64 | bwd_inner_microstep: 1612.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-11 
03:29:25,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.72 | bwd_microstep: 1257.34 | bwd_inner_microstep: 1257.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557 [2024-06-11 03:29:27,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.73 | bwd_microstep: 1428.82 | bwd_inner_microstep: 1428.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2435 [2024-06-11 03:29:28,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.91 | bwd_microstep: 854.72 | bwd_inner_microstep: 854.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3527 [2024-06-11 03:29:30,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.40 | bwd_microstep: 1229.73 | bwd_inner_microstep: 1229.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3725 [2024-06-11 03:29:32,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 1442.91 | bwd_inner_microstep: 1442.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3808 [2024-06-11 03:29:33,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.65 | bwd_microstep: 1291.08 | bwd_inner_microstep: 1291.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3709 [2024-06-11 03:29:35,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.07 | bwd_microstep: 1436.86 | bwd_inner_microstep: 1436.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3555 [2024-06-11 03:29:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.56 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3544 [2024-06-11 03:29:39,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.97 | bwd_microstep: 1297.65 | bwd_inner_microstep: 1297.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-11 03:29:41,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.05 | bwd_microstep: 1475.61 | bwd_inner_microstep: 1475.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-11 03:29:43,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.43 | bwd_microstep: 1357.10 | bwd_inner_microstep: 1357.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 2957 [2024-06-11 03:29:45,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.77 | bwd_microstep: 1362.45 | bwd_inner_microstep: 1362.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3608 [2024-06-11 03:29:47,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1572.28 | bwd_inner_microstep: 1572.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2953 [2024-06-11 03:29:49,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.05 | optimizer_step: 6.62 [2024-06-11 03:29:49,383] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 476.02 | bwd_microstep: 1383.49 | bwd_inner_microstep: 1322.79 | bwd_allreduce_microstep: 60.65 | step_microstep: 38.32 [2024-06-11 03:29:49,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16579.07 | bwd: 44446.84 | bwd_inner: 44385.29 | bwd_allreduce: 60.88 | step: 39.83 {'loss': 1.145, 'learning_rate': 1.1806052857454087e-06, 'epoch': 0.89} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-11 03:29:51,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.07 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4154 [2024-06-11 03:29:53,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.89 | bwd_microstep: 1650.20 | bwd_inner_microstep: 1650.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 03:29:55,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1247.87 | bwd_inner_microstep: 1247.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-11 03:29:57,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1493.52 | bwd_inner_microstep: 1493.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.39 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-11 03:29:59,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1479.00 | bwd_inner_microstep: 1478.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2217 [2024-06-11 03:30:00,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 336.99 | bwd_microstep: 893.12 | bwd_inner_microstep: 893.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 03:30:02,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1388.02 | bwd_inner_microstep: 1388.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 03:30:04,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.99 | bwd_microstep: 1398.86 | bwd_inner_microstep: 1398.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590 [2024-06-11 03:30:06,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.24 | bwd_microstep: 1438.26 | bwd_inner_microstep: 1438.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1990 [2024-06-11 03:30:07,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 901.97 | bwd_inner_microstep: 901.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 03:30:09,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.15 | bwd_microstep: 1382.41 | bwd_inner_microstep: 1382.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3379 [2024-06-11 03:30:11,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.65 | bwd_microstep: 1241.86 | bwd_inner_microstep: 1241.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3414 [2024-06-11 03:30:12,871] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.08 | bwd_microstep: 1186.70 | bwd_inner_microstep: 1186.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-11 03:30:14,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.72 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3398 [2024-06-11 03:30:16,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.63 | bwd_microstep: 1407.71 | bwd_inner_microstep: 1407.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-11 03:30:18,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.38 | bwd_microstep: 1344.19 | bwd_inner_microstep: 1344.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3593 [2024-06-11 03:30:20,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.50 | bwd_microstep: 1212.41 | bwd_inner_microstep: 1212.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3978 [2024-06-11 03:30:22,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.12 | bwd_microstep: 1613.58 | bwd_inner_microstep: 1613.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1997 [2024-06-11 03:30:23,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.64 | bwd_microstep: 786.69 | bwd_inner_microstep: 786.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 
3676 [2024-06-11 03:30:25,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.22 | bwd_microstep: 1280.82 | bwd_inner_microstep: 1280.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-11 03:30:27,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.12 | bwd_microstep: 1280.14 | bwd_inner_microstep: 1280.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3702 [2024-06-11 03:30:29,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.63 | bwd_microstep: 1628.54 | bwd_inner_microstep: 1628.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3546 [2024-06-11 03:30:31,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.88 | bwd_microstep: 1427.13 | bwd_inner_microstep: 1427.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-11 03:30:33,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.15 | bwd_microstep: 1557.26 | bwd_inner_microstep: 1557.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3787 [2024-06-11 03:30:35,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.39 | bwd_microstep: 1354.63 | bwd_inner_microstep: 1354.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2177 [2024-06-11 03:30:36,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.87 | bwd_microstep: 888.97 | bwd_inner_microstep: 888.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images 
per sample: 7.5, dynamic token length: 3564 [2024-06-11 03:30:38,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.55 | bwd_microstep: 1428.40 | bwd_inner_microstep: 1428.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910 [2024-06-11 03:30:39,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.41 | bwd_microstep: 687.37 | bwd_inner_microstep: 687.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-11 03:30:41,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.88 | bwd_microstep: 1534.57 | bwd_inner_microstep: 1534.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3803 [2024-06-11 03:30:43,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.11 | bwd_microstep: 1500.43 | bwd_inner_microstep: 1500.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424 [2024-06-11 03:30:45,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.69 | bwd_microstep: 1545.29 | bwd_inner_microstep: 1545.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3809 [2024-06-11 03:30:49,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-11 03:30:49,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.48 | bwd_microstep: 2439.74 | bwd_inner_microstep: 2093.18 | bwd_allreduce_microstep: 346.50 | step_microstep: 38.88 [2024-06-11 03:30:49,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15974.28 | bwd: 43383.99 | bwd_inner: 
43036.59 | bwd_allreduce: 346.73 | step: 41.78 {'loss': 1.1578, 'learning_rate': 1.1679335367610855e-06, 'epoch': 0.89} 6 [26:48:17<3:16:32, 62.73s/it] 89%|████████▉ | 1539/1726 [26:49:20<3:15:56, 62.87s/it] 89%|████████▉ | 1540/1726 [26:50:22<3:13:59, 62.58s/it] 89%|████████▉ | 1541/1726 [26:51:24<3:12:30, 62.43s/it] 89%|████████▉ | 1542/1726 [26:52:26<3:10:29, 62.12s/it] 89%|████████▉ | 1543/1726 [26:53:25<3:07:14, 61.39s/it] dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-11 03:30:50,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.83 | bwd_microstep: 1274.11 | bwd_inner_microstep: 1273.92 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 03:30:52,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1394.49 | bwd_inner_microstep: 1394.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3866 [2024-06-11 03:30:54,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.90 | bwd_microstep: 1568.69 | bwd_inner_microstep: 1568.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3784 [2024-06-11 03:30:56,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.55 | bwd_microstep: 1349.23 | bwd_inner_microstep: 1349.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-11 03:30:58,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.16 | bwd_microstep: 1389.77 | bwd_inner_microstep: 1389.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3736 [2024-06-11 03:31:00,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.43 | bwd_microstep: 1533.68 | bwd_inner_microstep: 1533.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-11 03:31:01,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.83 | bwd_microstep: 797.68 | bwd_inner_microstep: 797.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920 [2024-06-11 03:31:02,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.64 | bwd_microstep: 689.84 | bwd_inner_microstep: 689.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 03:31:04,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.88 | bwd_microstep: 1291.11 | bwd_inner_microstep: 1291.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2176 [2024-06-11 03:31:06,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.36 | bwd_microstep: 1013.26 | bwd_inner_microstep: 1013.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 03:31:08,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1376.95 | bwd_inner_microstep: 1376.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2141 [2024-06-11 03:31:09,291] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.01 | bwd_microstep: 929.43 | bwd_inner_microstep: 929.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3525 [2024-06-11 03:31:11,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.83 | bwd_microstep: 1541.90 | bwd_inner_microstep: 1541.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674 [2024-06-11 03:31:13,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1620.81 | bwd_inner_microstep: 1620.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2991 [2024-06-11 03:31:15,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.17 | bwd_microstep: 1200.91 | bwd_inner_microstep: 1200.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3642 [2024-06-11 03:31:17,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.49 | bwd_microstep: 1713.24 | bwd_inner_microstep: 1713.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3649 [2024-06-11 03:31:20,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.37 | bwd_microstep: 1716.11 | bwd_inner_microstep: 1716.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-11 03:31:22,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.23 | bwd_microstep: 1476.77 | bwd_inner_microstep: 1476.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token 
length: 3826 [2024-06-11 03:31:24,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.95 | bwd_microstep: 1581.39 | bwd_inner_microstep: 1581.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-11 03:31:26,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.78 | bwd_microstep: 1557.16 | bwd_inner_microstep: 1557.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-11 03:31:28,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.42 | bwd_microstep: 1472.39 | bwd_inner_microstep: 1472.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-11 03:31:30,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.91 | bwd_microstep: 1495.20 | bwd_inner_microstep: 1495.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3532 [2024-06-11 03:31:32,789] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.51 | bwd_microstep: 1687.92 | bwd_inner_microstep: 1687.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-11 03:31:34,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.43 | bwd_microstep: 1486.83 | bwd_inner_microstep: 1486.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-11 03:31:37,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.86 | bwd_microstep: 1646.78 | bwd_inner_microstep: 1646.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
16, images per sample: 4.0, dynamic token length: 2085 [2024-06-11 03:31:38,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.44 | bwd_microstep: 820.66 | bwd_inner_microstep: 820.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-11 03:31:40,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1453.33 | bwd_inner_microstep: 1453.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-11 03:31:42,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.10 | bwd_microstep: 1452.33 | bwd_inner_microstep: 1452.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587 [2024-06-11 03:31:44,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1309.91 | bwd_inner_microstep: 1309.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3812 [2024-06-11 03:31:46,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.11 | bwd_microstep: 1487.05 | bwd_inner_microstep: 1487.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-11 03:31:48,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1409.75 | bwd_inner_microstep: 1409.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629 [2024-06-11 03:31:50,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.07 | optimizer_step: 6.64 [2024-06-11 03:31:50,977] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.91 | bwd_microstep: 2277.62 | bwd_inner_microstep: 1780.84 | bwd_allreduce_microstep: 496.74 | step_microstep: 37.70 [2024-06-11 03:31:50,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16529.10 | bwd: 45016.37 | bwd_inner: 44518.58 | bwd_allreduce: 497.04 | step: 39.26 {'loss': 1.1787, 'learning_rate': 1.155328114216947e-06, 'epoch': 0.89} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3400 [2024-06-11 03:31:52,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.59 | bwd_microstep: 1303.97 | bwd_inner_microstep: 1303.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 03:31:54,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.61 | bwd_microstep: 1383.98 | bwd_inner_microstep: 1383.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3401 [2024-06-11 03:31:56,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.89 | bwd_microstep: 1209.42 | bwd_inner_microstep: 1209.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3853 [2024-06-11 03:31:58,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.57 | bwd_microstep: 1370.06 | bwd_inner_microstep: 1370.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-11 03:31:59,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.17 | bwd_microstep: 798.06 | bwd_inner_microstep: 798.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749 [2024-06-11 03:32:01,639] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.26 | bwd_microstep: 1641.03 | bwd_inner_microstep: 1641.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-11 03:32:03,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.23 | bwd_microstep: 1286.88 | bwd_inner_microstep: 1286.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3708 [2024-06-11 03:32:05,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.24 | bwd_microstep: 1426.48 | bwd_inner_microstep: 1426.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3707 [2024-06-11 03:32:07,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.60 | bwd_microstep: 1630.64 | bwd_inner_microstep: 1630.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696 [2024-06-11 03:32:09,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.32 | bwd_microstep: 1531.32 | bwd_inner_microstep: 1531.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968 [2024-06-11 03:32:10,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.72 | bwd_microstep: 799.15 | bwd_inner_microstep: 799.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 03:32:12,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.79 | bwd_microstep: 1401.72 | bwd_inner_microstep: 1401.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 
1888 [2024-06-11 03:32:13,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.20 | bwd_microstep: 686.26 | bwd_inner_microstep: 686.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3632 [2024-06-11 03:32:15,685] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.84 | bwd_microstep: 1409.18 | bwd_inner_microstep: 1409.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 03:32:17,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.38 | bwd_microstep: 1391.75 | bwd_inner_microstep: 1391.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508 [2024-06-11 03:32:19,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.49 | bwd_microstep: 1586.89 | bwd_inner_microstep: 1586.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-11 03:32:21,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.32 | bwd_microstep: 1280.87 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3426 [2024-06-11 03:32:23,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.20 | bwd_microstep: 1371.69 | bwd_inner_microstep: 1371.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3703 [2024-06-11 03:32:25,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.86 | bwd_microstep: 1727.57 | bwd_inner_microstep: 1727.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images 
per sample: 9.5, dynamic token length: 3429 [2024-06-11 03:32:27,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.03 | bwd_microstep: 1513.23 | bwd_inner_microstep: 1513.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706 [2024-06-11 03:32:30,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.72 | bwd_microstep: 1532.33 | bwd_inner_microstep: 1532.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716 [2024-06-11 03:32:32,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.34 | bwd_microstep: 1537.03 | bwd_inner_microstep: 1537.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-11 03:32:34,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.54 | bwd_microstep: 1403.77 | bwd_inner_microstep: 1403.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2919 [2024-06-11 03:32:35,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.32 | bwd_microstep: 1130.62 | bwd_inner_microstep: 1130.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3561 [2024-06-11 03:32:37,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1465.20 | bwd_inner_microstep: 1465.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-11 03:32:39,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.33 | bwd_microstep: 1375.98 | bwd_inner_microstep: 1375.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-11 03:32:41,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1403.45 | bwd_inner_microstep: 1403.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-11 03:32:43,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1515.83 | bwd_inner_microstep: 1515.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565 [2024-06-11 03:32:45,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1301.67 | bwd_inner_microstep: 1301.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2049 [2024-06-11 03:32:46,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.47 | bwd_microstep: 843.12 | bwd_inner_microstep: 843.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3620 [2024-06-11 03:32:48,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.50 | bwd_microstep: 1516.13 | bwd_inner_microstep: 1516.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-11 03:33:10,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.64 | optimizer_step: 6.61 [2024-06-11 03:33:10,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.22 | bwd_microstep: 21402.31 | bwd_inner_microstep: 1867.99 | bwd_allreduce_microstep: 19534.22 | step_microstep: 41.51 [2024-06-11 03:33:10,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 
16231.18 | bwd: 63177.58 | bwd_inner: 43642.27 | bwd_allreduce: 19534.55 | step: 43.09 {'loss': 1.1732, 'learning_rate': 1.1427890625092265e-06, 'epoch': 0.9} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4614 [2024-06-11 03:33:13,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.84 | bwd_microstep: 1650.82 | bwd_inner_microstep: 1650.72 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3891 [2024-06-11 03:33:15,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.88 | bwd_microstep: 1578.77 | bwd_inner_microstep: 1578.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-11 03:33:17,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.89 | bwd_microstep: 1340.04 | bwd_inner_microstep: 1340.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3766 [2024-06-11 03:33:19,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.34 | bwd_microstep: 1443.29 | bwd_inner_microstep: 1443.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 03:33:20,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1372.82 | bwd_inner_microstep: 1372.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3484 [2024-06-11 03:33:22,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.23 | bwd_microstep: 1217.54 | bwd_inner_microstep: 1217.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1898 [2024-06-11 
03:33:23,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.65 | bwd_microstep: 712.48 | bwd_inner_microstep: 712.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3559 [2024-06-11 03:33:25,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1490.87 | bwd_inner_microstep: 1490.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-11 03:33:27,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1277.58 | bwd_inner_microstep: 1277.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3573 [2024-06-11 03:33:29,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.79 | bwd_microstep: 1497.08 | bwd_inner_microstep: 1497.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489 [2024-06-11 03:33:31,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1409.99 | bwd_inner_microstep: 1409.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3645 [2024-06-11 03:33:33,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.17 | bwd_microstep: 1441.51 | bwd_inner_microstep: 1441.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1971 [2024-06-11 03:33:34,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.13 | bwd_microstep: 891.19 | bwd_inner_microstep: 891.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, 
dynamic token length: 3684 [2024-06-11 03:33:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.58 | bwd_microstep: 1551.81 | bwd_inner_microstep: 1551.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-11 03:33:54,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.41 | bwd_microstep: 1242.85 | bwd_inner_microstep: 1242.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2296 [2024-06-11 03:33:55,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.67 | bwd_microstep: 1001.40 | bwd_inner_microstep: 1001.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-11 03:33:57,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.82 | bwd_microstep: 1281.05 | bwd_inner_microstep: 1281.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 1078 [2024-06-11 03:33:57,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 161.97 | bwd_microstep: 416.85 | bwd_inner_microstep: 416.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-11 03:33:59,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.37 | bwd_microstep: 1388.50 | bwd_inner_microstep: 1388.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-11 03:34:01,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.08 | bwd_microstep: 1488.42 | bwd_inner_microstep: 1488.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-11 03:34:03,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 1393.41 | bwd_inner_microstep: 1393.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-11 03:34:05,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.88 | bwd_microstep: 1387.41 | bwd_inner_microstep: 1387.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-11 03:34:07,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1346.13 | bwd_inner_microstep: 1346.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3604 [2024-06-11 03:34:09,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1537.12 | bwd_inner_microstep: 1537.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-11 03:34:10,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.29 | bwd_microstep: 975.52 | bwd_inner_microstep: 975.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3899 [2024-06-11 03:34:13,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.23 | bwd_microstep: 1522.40 | bwd_inner_microstep: 1522.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009 [2024-06-11 03:34:14,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.33 | bwd_microstep: 739.91 | bwd_inner_microstep: 739.88 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1463 [2024-06-11 03:34:14,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.44 | bwd_microstep: 541.97 | bwd_inner_microstep: 541.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817 [2024-06-11 03:34:17,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.48 | bwd_microstep: 1757.44 | bwd_inner_microstep: 1757.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3592 [2024-06-11 03:34:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.75 | bwd_microstep: 1663.90 | bwd_inner_microstep: 1663.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3582 [2024-06-11 03:34:21,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.14 | bwd_microstep: 1626.53 | bwd_inner_microstep: 1626.04 | bwd_allreduce_microstep: 0.24 | step_microstep: 0.37 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-11 03:35:17,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.27 | optimizer_step: 6.61 [2024-06-11 03:35:17,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.24 | bwd_microstep: 55470.59 | bwd_inner_microstep: 1809.39 | bwd_allreduce_microstep: 53661.12 | step_microstep: 39.50 [2024-06-11 03:35:17,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15664.74 | bwd: 95657.25 | bwd_inner: 41994.71 | bwd_allreduce: 53661.69 | step: 41.62 {'loss': 1.1604, 'learning_rate': 1.1303164258003974e-06, 'epoch': 0.9} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3459 [2024-06-11 03:35:19,817] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.33 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-11 03:35:21,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.51 | bwd_microstep: 1475.02 | bwd_inner_microstep: 1474.59 | bwd_allreduce_microstep: 0.22 | step_microstep: 0.36 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3791 [2024-06-11 03:35:23,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.73 | bwd_microstep: 1540.04 | bwd_inner_microstep: 1540.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4138 [2024-06-11 03:35:26,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.01 | bwd_microstep: 1533.92 | bwd_inner_microstep: 1533.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4080 [2024-06-11 03:35:28,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.83 | bwd_microstep: 1718.82 | bwd_inner_microstep: 1718.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 03:35:30,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | bwd_microstep: 1377.14 | bwd_inner_microstep: 1377.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-11 03:35:32,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.65 | bwd_microstep: 1627.97 | bwd_inner_microstep: 1627.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 1981 [2024-06-11 03:35:33,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.30 | bwd_microstep: 797.39 | bwd_inner_microstep: 797.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429 [2024-06-11 03:35:35,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.86 | bwd_microstep: 1278.05 | bwd_inner_microstep: 1278.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2428 [2024-06-11 03:35:36,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.82 | bwd_microstep: 1097.30 | bwd_inner_microstep: 1097.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3687 [2024-06-11 03:35:39,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.03 | bwd_microstep: 1722.75 | bwd_inner_microstep: 1722.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3692 [2024-06-11 03:35:41,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.20 | bwd_microstep: 1564.56 | bwd_inner_microstep: 1564.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2128 [2024-06-11 03:35:42,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.48 | bwd_microstep: 826.05 | bwd_inner_microstep: 826.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 03:35:44,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.21 | bwd_microstep: 1372.76 | bwd_inner_microstep: 1372.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, 
images per sample: 10.0, dynamic token length: 3496 [2024-06-11 03:35:46,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.98 | bwd_microstep: 1574.97 | bwd_inner_microstep: 1574.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3880 [2024-06-11 03:35:49,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.05 | bwd_microstep: 1686.66 | bwd_inner_microstep: 1686.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-11 03:35:50,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.24 | bwd_microstep: 1254.32 | bwd_inner_microstep: 1254.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 03:35:52,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.28 | bwd_microstep: 1558.32 | bwd_inner_microstep: 1558.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2063 [2024-06-11 03:35:54,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.87 | bwd_microstep: 845.49 | bwd_inner_microstep: 845.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3462 [2024-06-11 03:35:55,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.68 | bwd_microstep: 1310.44 | bwd_inner_microstep: 1310.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1999 [2024-06-11 03:35:57,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.92 | bwd_microstep: 800.40 | bwd_inner_microstep: 800.38 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-11 03:35:58,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.89 | bwd_microstep: 804.46 | bwd_inner_microstep: 804.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-11 03:36:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.80 | bwd_microstep: 1515.88 | bwd_inner_microstep: 1515.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456 [2024-06-11 03:36:02,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1286.98 | bwd_inner_microstep: 1286.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3597 [2024-06-11 03:36:04,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1438.66 | bwd_inner_microstep: 1438.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 03:36:05,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.51 | bwd_microstep: 1283.27 | bwd_inner_microstep: 1283.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3709 [2024-06-11 03:36:08,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.43 | bwd_microstep: 1696.83 | bwd_inner_microstep: 1696.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3558 [2024-06-11 03:36:10,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.61 | bwd_microstep: 1561.77 | bwd_inner_microstep: 
1561.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-11 03:36:12,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.35 | bwd_microstep: 1403.28 | bwd_inner_microstep: 1403.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806 [2024-06-11 03:36:14,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.60 | bwd_microstep: 1521.92 | bwd_inner_microstep: 1521.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3585 [2024-06-11 03:36:16,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.24 | bwd_microstep: 1651.83 | bwd_inner_microstep: 1651.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456 [2024-06-11 03:36:19,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.07 | optimizer_step: 6.57 [2024-06-11 03:36:19,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.61 | bwd_microstep: 2428.44 | bwd_inner_microstep: 1661.03 | bwd_allreduce_microstep: 767.36 | step_microstep: 38.48 [2024-06-11 03:36:19,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16401.55 | bwd: 44972.16 | bwd_inner: 44203.55 | bwd_allreduce: 767.80 | step: 40.42 {'loss': 1.1832, 'learning_rate': 1.1179102480190208e-06, 'epoch': 0.9} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1921 [2024-06-11 03:36:20,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.37 | bwd_microstep: 817.87 | bwd_inner_microstep: 817.74 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 
[2024-06-11 03:36:22,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.90 | bwd_microstep: 1278.64 | bwd_inner_microstep: 1278.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3872 [2024-06-11 03:36:24,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.30 | bwd_microstep: 1466.40 | bwd_inner_microstep: 1466.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436 [2024-06-11 03:36:26,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.21 | bwd_microstep: 1376.93 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-11 03:36:27,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.32 | bwd_microstep: 716.55 | bwd_inner_microstep: 716.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 03:36:29,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1382.55 | bwd_inner_microstep: 1382.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-11 03:36:31,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.54 | bwd_microstep: 1640.18 | bwd_inner_microstep: 1640.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1872 [2024-06-11 03:36:32,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.47 | bwd_microstep: 678.35 | bwd_inner_microstep: 678.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per 
sample: 3.0, dynamic token length: 1453 [2024-06-11 03:36:33,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 227.82 | bwd_microstep: 601.57 | bwd_inner_microstep: 601.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3406 [2024-06-11 03:36:35,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.01 | bwd_microstep: 1340.19 | bwd_inner_microstep: 1340.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-11 03:36:37,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.00 | bwd_microstep: 1507.83 | bwd_inner_microstep: 1507.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2126 [2024-06-11 03:36:38,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.34 | bwd_microstep: 1022.02 | bwd_inner_microstep: 1021.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-11 03:36:40,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.31 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3656 [2024-06-11 03:36:43,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 647.06 | bwd_microstep: 1783.22 | bwd_inner_microstep: 1783.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505 [2024-06-11 03:36:44,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.57 | bwd_microstep: 1191.52 | bwd_inner_microstep: 1191.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3495 [2024-06-11 03:36:46,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.34 | bwd_microstep: 1189.74 | bwd_inner_microstep: 1189.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3498 [2024-06-11 03:36:48,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1251.23 | bwd_inner_microstep: 1251.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-11 03:36:50,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.10 | bwd_microstep: 1451.89 | bwd_inner_microstep: 1451.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3446 [2024-06-11 03:36:51,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.39 | bwd_microstep: 1189.01 | bwd_inner_microstep: 1188.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3772 [2024-06-11 03:36:53,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.81 | bwd_microstep: 1250.47 | bwd_inner_microstep: 1250.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3818 [2024-06-11 03:36:55,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.02 | bwd_microstep: 1292.14 | bwd_inner_microstep: 1292.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452 [2024-06-11 03:36:57,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1286.44 | bwd_inner_microstep: 1286.41 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145 [2024-06-11 03:36:58,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.99 | bwd_microstep: 852.67 | bwd_inner_microstep: 852.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2141 [2024-06-11 03:36:59,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.97 | bwd_microstep: 867.26 | bwd_inner_microstep: 867.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867 [2024-06-11 03:37:01,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.75 | bwd_microstep: 1566.37 | bwd_inner_microstep: 1566.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-11 03:37:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.00 | bwd_microstep: 875.82 | bwd_inner_microstep: 875.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753 [2024-06-11 03:37:05,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.29 | bwd_microstep: 1639.70 | bwd_inner_microstep: 1639.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3052 [2024-06-11 03:37:06,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.10 | bwd_microstep: 1172.37 | bwd_inner_microstep: 1172.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3414 [2024-06-11 03:37:08,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.29 | bwd_microstep: 1444.93 
| bwd_inner_microstep: 1444.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2270 [2024-06-11 03:37:09,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.80 | bwd_microstep: 969.46 | bwd_inner_microstep: 969.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758 [2024-06-11 03:37:11,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.82 | bwd_microstep: 1438.11 | bwd_inner_microstep: 1438.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 03:37:19,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.36 | optimizer_step: 6.59 [2024-06-11 03:37:19,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.53 | bwd_microstep: 7130.91 | bwd_inner_microstep: 1526.22 | bwd_allreduce_microstep: 5604.61 | step_microstep: 39.60 [2024-06-11 03:37:19,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14716.45 | bwd: 45010.89 | bwd_inner: 39405.26 | bwd_allreduce: 5604.89 | step: 41.15 {'loss': 1.185, 'learning_rate': 1.1055705728595955e-06, 'epoch': 0.9} 89%|████████▉ | 1543/1726 [26:53:25<3:07:14, 61.39s/it] 89%|████████▉ | 1544/1726 [26:54:27<3:06:40, 61.54s/it] 90%|████████▉ | 1545/1726 [26:55:47<3:22:08, 67.01s/it] 90%|████████▉ | 1546/1726 [26:57:54<4:15:06, 85.04s/it] 90%|████████▉ | 1547/1726 [26:58:56<3:52:50, 78.05s/it] 90%|████████▉ | 1548/1726 [26:59:56<3:35:32, 72.65s/it] dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2454 
[2024-06-11 03:37:21,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.28 | bwd_microstep: 1034.26 | bwd_inner_microstep: 1034.16 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3084 [2024-06-11 03:37:22,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.42 | bwd_microstep: 1088.02 | bwd_inner_microstep: 1087.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-11 03:37:24,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.24 | bwd_microstep: 1478.75 | bwd_inner_microstep: 1478.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.92 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-11 03:37:25,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.53 | bwd_microstep: 790.57 | bwd_inner_microstep: 790.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-11 03:37:27,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1345.10 | bwd_inner_microstep: 1345.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 03:37:29,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.39 | bwd_microstep: 1385.12 | bwd_inner_microstep: 1385.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 03:37:31,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.84 | bwd_microstep: 1341.68 | bwd_inner_microstep: 1341.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3488 [2024-06-11 03:37:33,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.76 | bwd_microstep: 1283.57 | bwd_inner_microstep: 1283.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704 [2024-06-11 03:37:35,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.43 | bwd_microstep: 1433.31 | bwd_inner_microstep: 1433.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443 [2024-06-11 03:37:37,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.22 | bwd_microstep: 1450.22 | bwd_inner_microstep: 1450.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-11 03:37:38,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 794.56 | bwd_inner_microstep: 794.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2148 [2024-06-11 03:37:39,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.32 | bwd_microstep: 820.92 | bwd_inner_microstep: 820.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3565 [2024-06-11 03:37:41,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.84 | bwd_microstep: 1331.67 | bwd_inner_microstep: 1331.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3485 [2024-06-11 03:37:43,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.77 | bwd_microstep: 1545.24 | bwd_inner_microstep: 1545.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-11 03:37:45,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.45 | bwd_microstep: 1487.59 | bwd_inner_microstep: 1487.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542 [2024-06-11 03:37:47,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.82 | bwd_microstep: 1455.40 | bwd_inner_microstep: 1455.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1922 [2024-06-11 03:37:48,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.61 | bwd_microstep: 725.92 | bwd_inner_microstep: 725.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3500 [2024-06-11 03:37:50,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1250.42 | bwd_inner_microstep: 1250.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-11 03:37:51,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.24 | bwd_microstep: 805.12 | bwd_inner_microstep: 805.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3509 [2024-06-11 03:37:53,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.01 | bwd_microstep: 1529.13 | bwd_inner_microstep: 1529.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3421 [2024-06-11 03:37:55,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.71 | bwd_microstep: 1314.87 | bwd_inner_microstep: 1314.84 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3797 [2024-06-11 03:37:57,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.42 | bwd_microstep: 1616.98 | bwd_inner_microstep: 1616.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 03:37:59,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.52 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612 [2024-06-11 03:38:01,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.11 | bwd_microstep: 1347.56 | bwd_inner_microstep: 1347.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3689 [2024-06-11 03:38:03,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.10 | bwd_microstep: 1392.81 | bwd_inner_microstep: 1392.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-11 03:38:05,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.31 | bwd_microstep: 1646.21 | bwd_inner_microstep: 1646.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3943 [2024-06-11 03:38:07,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.20 | bwd_microstep: 1803.08 | bwd_inner_microstep: 1803.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2073 [2024-06-11 03:38:08,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.75 | bwd_microstep: 
818.83 | bwd_inner_microstep: 818.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3763 [2024-06-11 03:38:11,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.23 | bwd_microstep: 1646.94 | bwd_inner_microstep: 1646.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774 [2024-06-11 03:38:13,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.51 | bwd_microstep: 1647.44 | bwd_inner_microstep: 1647.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2272 [2024-06-11 03:38:14,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.14 | bwd_microstep: 874.56 | bwd_inner_microstep: 874.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-11 03:38:20,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.23 | optimizer_step: 6.58 [2024-06-11 03:38:20,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.55 | bwd_microstep: 5620.07 | bwd_inner_microstep: 1458.34 | bwd_allreduce_microstep: 4161.66 | step_microstep: 38.54 [2024-06-11 03:38:20,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15401.67 | bwd: 45482.64 | bwd_inner: 41319.97 | bwd_allreduce: 4161.95 | step: 42.17 {'loss': 1.2078, 'learning_rate': 1.0932974437823884e-06, 'epoch': 0.9} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3412 [2024-06-11 03:38:22,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1301.53 | bwd_inner_microstep: 1301.46 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3404 [2024-06-11 03:38:24,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.25 | bwd_microstep: 1342.50 | bwd_inner_microstep: 1342.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3868 [2024-06-11 03:38:26,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.44 | bwd_microstep: 1564.90 | bwd_inner_microstep: 1564.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3831 [2024-06-11 03:38:28,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.54 | bwd_microstep: 1554.89 | bwd_inner_microstep: 1554.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-11 03:38:30,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.34 | bwd_microstep: 1479.35 | bwd_inner_microstep: 1479.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 03:38:32,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.82 | bwd_microstep: 1481.31 | bwd_inner_microstep: 1481.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 03:38:34,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.27 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3717 [2024-06-11 03:38:36,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.22 | bwd_microstep: 1436.00 | bwd_inner_microstep: 1435.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 03:38:38,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1391.78 | bwd_inner_microstep: 1391.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-11 03:38:40,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.31 | bwd_microstep: 1390.39 | bwd_inner_microstep: 1390.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3499 [2024-06-11 03:38:42,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1417.09 | bwd_inner_microstep: 1417.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1907 [2024-06-11 03:38:43,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.91 | bwd_microstep: 779.89 | bwd_inner_microstep: 779.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3667 [2024-06-11 03:38:45,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.93 | bwd_microstep: 1523.24 | bwd_inner_microstep: 1523.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508 [2024-06-11 03:38:47,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.62 | bwd_microstep: 1584.54 | bwd_inner_microstep: 1584.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-11 03:38:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.71 | bwd_microstep: 1340.37 | bwd_inner_microstep: 1340.34 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 03:38:51,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.86 | bwd_microstep: 1374.04 | bwd_inner_microstep: 1374.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 03:38:53,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.00 | bwd_microstep: 1281.75 | bwd_inner_microstep: 1281.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-11 03:38:55,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.35 | bwd_microstep: 1661.52 | bwd_inner_microstep: 1661.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616 [2024-06-11 03:38:57,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.55 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2888 [2024-06-11 03:38:59,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.12 | bwd_microstep: 1085.07 | bwd_inner_microstep: 1085.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663 [2024-06-11 03:39:01,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1431.26 | bwd_inner_microstep: 1431.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-11 03:39:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.42 | bwd_microstep: 1455.87 | 
bwd_inner_microstep: 1455.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-11 03:39:04,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.60 | bwd_microstep: 1283.38 | bwd_inner_microstep: 1283.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488 [2024-06-11 03:39:06,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.39 | bwd_microstep: 1412.73 | bwd_inner_microstep: 1412.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-11 03:39:08,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.28 | bwd_microstep: 1447.85 | bwd_inner_microstep: 1447.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-11 03:39:09,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.81 | bwd_microstep: 798.26 | bwd_inner_microstep: 798.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-11 03:39:11,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.87 | bwd_microstep: 1250.40 | bwd_inner_microstep: 1250.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3598 [2024-06-11 03:39:13,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.14 | bwd_microstep: 1213.30 | bwd_inner_microstep: 1213.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-11 03:39:15,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 519.34 | bwd_microstep: 1377.22 | bwd_inner_microstep: 1377.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233 [2024-06-11 03:39:16,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.23 | bwd_microstep: 868.50 | bwd_inner_microstep: 868.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-11 03:39:18,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.07 | bwd_microstep: 1447.01 | bwd_inner_microstep: 1446.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949 [2024-06-11 03:39:23,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.17 | optimizer_step: 6.61 [2024-06-11 03:39:23,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.76 | bwd_microstep: 4694.27 | bwd_inner_microstep: 789.46 | bwd_allreduce_microstep: 3904.75 | step_microstep: 39.03 [2024-06-11 03:39:23,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15890.51 | bwd: 46405.21 | bwd_inner: 42499.50 | bwd_allreduce: 3905.00 | step: 40.61 {'loss': 1.1253, 'learning_rate': 1.0810909040132977e-06, 'epoch': 0.9} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 03:39:25,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.49 | bwd_microstep: 1268.15 | bwd_inner_microstep: 1268.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921 [2024-06-11 03:39:26,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 267.89 | bwd_microstep: 694.20 | bwd_inner_microstep: 694.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 03:39:28,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1450.42 | bwd_inner_microstep: 1450.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-11 03:39:30,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.36 | bwd_microstep: 1289.19 | bwd_inner_microstep: 1289.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-11 03:39:32,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.13 | bwd_microstep: 1454.86 | bwd_inner_microstep: 1454.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 03:39:33,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.25 | bwd_microstep: 1384.95 | bwd_inner_microstep: 1384.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-11 03:39:35,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.10 | bwd_microstep: 1186.18 | bwd_inner_microstep: 1186.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 03:39:37,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1385.45 | bwd_inner_microstep: 1385.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 03:39:39,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.07 | bwd_microstep: 1277.76 | bwd_inner_microstep: 1277.73 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-11 03:39:41,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1397.20 | bwd_inner_microstep: 1397.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3684 [2024-06-11 03:39:43,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.26 | bwd_microstep: 1505.59 | bwd_inner_microstep: 1505.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672 [2024-06-11 03:39:45,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.05 | bwd_microstep: 1614.29 | bwd_inner_microstep: 1614.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3680 [2024-06-11 03:39:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.32 | bwd_microstep: 1606.73 | bwd_inner_microstep: 1606.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3507 [2024-06-11 03:39:49,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.88 | bwd_microstep: 1366.39 | bwd_inner_microstep: 1366.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3499 [2024-06-11 03:39:51,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.41 | bwd_microstep: 1577.65 | bwd_inner_microstep: 1577.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3644 [2024-06-11 03:39:53,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1413.54 | 
bwd_inner_microstep: 1413.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3530 [2024-06-11 03:39:55,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.78 | bwd_microstep: 1325.88 | bwd_inner_microstep: 1325.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3834 [2024-06-11 03:39:57,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.65 | bwd_microstep: 1358.68 | bwd_inner_microstep: 1358.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453 [2024-06-11 03:39:59,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.57 | bwd_microstep: 1289.55 | bwd_inner_microstep: 1289.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3149 [2024-06-11 03:40:00,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.22 | bwd_microstep: 1254.48 | bwd_inner_microstep: 1254.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3574 [2024-06-11 03:40:02,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.03 | bwd_microstep: 1302.73 | bwd_inner_microstep: 1302.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616 [2024-06-11 03:40:04,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.86 | bwd_microstep: 1311.31 | bwd_inner_microstep: 1311.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-11 03:40:06,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 558.68 | bwd_microstep: 1493.06 | bwd_inner_microstep: 1493.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3608 [2024-06-11 03:40:08,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.23 | bwd_microstep: 1212.89 | bwd_inner_microstep: 1212.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-11 03:40:10,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.70 | bwd_microstep: 1394.15 | bwd_inner_microstep: 1394.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016 [2024-06-11 03:40:11,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.40 | bwd_microstep: 710.35 | bwd_inner_microstep: 710.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-11 03:40:13,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.82 | bwd_microstep: 1257.17 | bwd_inner_microstep: 1257.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3729 [2024-06-11 03:40:14,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.47 | bwd_microstep: 1366.76 | bwd_inner_microstep: 1366.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 2728 [2024-06-11 03:40:16,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.04 | bwd_microstep: 1234.02 | bwd_inner_microstep: 1234.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3534 [2024-06-11 03:40:18,487] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.54 | bwd_microstep: 1342.23 | bwd_inner_microstep: 1342.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-11 03:40:20,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.69 | bwd_microstep: 1544.41 | bwd_inner_microstep: 1544.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3587 [2024-06-11 03:40:25,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-11 03:40:25,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.50 | bwd_microstep: 3929.91 | bwd_inner_microstep: 1814.13 | bwd_allreduce_microstep: 2115.73 | step_microstep: 39.24 [2024-06-11 03:40:25,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16110.17 | bwd: 45200.12 | bwd_inner: 43083.49 | bwd_allreduce: 2115.96 | step: 40.83 {'loss': 1.2279, 'learning_rate': 1.0689509965436918e-06, 'epoch': 0.9} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 03:40:26,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1276.50 | bwd_inner_microstep: 1276.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-11 03:40:28,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.96 | bwd_microstep: 1276.32 | bwd_inner_microstep: 1276.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3895 [2024-06-11 03:40:31,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.89 | bwd_microstep: 1682.88 | bwd_inner_microstep: 1682.85 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-11 03:40:33,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1646.86 | bwd_inner_microstep: 1646.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3478 [2024-06-11 03:40:35,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.18 | bwd_microstep: 1261.66 | bwd_inner_microstep: 1261.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3480 [2024-06-11 03:40:36,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.95 | bwd_microstep: 1213.86 | bwd_inner_microstep: 1213.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 03:40:38,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.79 | bwd_microstep: 1484.12 | bwd_inner_microstep: 1484.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 03:40:40,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.63 | bwd_microstep: 1288.53 | bwd_inner_microstep: 1288.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1869 [2024-06-11 03:40:41,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.56 | bwd_microstep: 714.77 | bwd_inner_microstep: 714.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942 [2024-06-11 03:40:42,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.52 | bwd_microstep: 
819.00 | bwd_inner_microstep: 818.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 03:40:44,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1381.44 | bwd_inner_microstep: 1381.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487 [2024-06-11 03:40:46,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.12 | bwd_microstep: 1581.44 | bwd_inner_microstep: 1581.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3699 [2024-06-11 03:40:49,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.22 | bwd_microstep: 1726.41 | bwd_inner_microstep: 1726.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 03:40:51,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.59 | bwd_microstep: 1375.41 | bwd_inner_microstep: 1375.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2133 [2024-06-11 03:40:52,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.59 | bwd_microstep: 928.85 | bwd_inner_microstep: 928.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-11 03:40:54,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.34 | bwd_microstep: 1345.83 | bwd_inner_microstep: 1345.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3421 [2024-06-11 03:40:55,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 469.29 | bwd_microstep: 1230.97 | bwd_inner_microstep: 1230.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3677 [2024-06-11 03:40:58,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.42 | bwd_microstep: 1629.57 | bwd_inner_microstep: 1629.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3506 [2024-06-11 03:41:00,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1348.67 | bwd_inner_microstep: 1348.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1970 [2024-06-11 03:41:01,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.95 | bwd_microstep: 796.81 | bwd_inner_microstep: 796.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-11 03:41:02,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.20 | bwd_microstep: 1186.48 | bwd_inner_microstep: 1186.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3836 [2024-06-11 03:41:04,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.07 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1558.69 | bwd_allreduce_microstep: 0.31 | step_microstep: 0.36 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2302 [2024-06-11 03:41:06,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.57 | bwd_microstep: 848.31 | bwd_inner_microstep: 848.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-11 03:41:07,866] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1255.94 | bwd_inner_microstep: 1255.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 03:41:09,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.59 | bwd_microstep: 1301.49 | bwd_inner_microstep: 1301.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3613 [2024-06-11 03:41:11,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1514.73 | bwd_inner_microstep: 1514.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609 [2024-06-11 03:41:13,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.20 | bwd_microstep: 1611.84 | bwd_inner_microstep: 1611.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2280 [2024-06-11 03:41:15,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.13 | bwd_microstep: 879.88 | bwd_inner_microstep: 879.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 03:41:16,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3427 [2024-06-11 03:41:18,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.00 | bwd_microstep: 1376.64 | bwd_inner_microstep: 1376.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3596 [2024-06-11 03:41:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.03 | bwd_microstep: 1414.17 | bwd_inner_microstep: 1413.91 | bwd_allreduce_microstep: 0.21 | step_microstep: 0.30 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816 [2024-06-11 03:41:27,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-11 03:41:27,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 5952.00 | bwd_inner_microstep: 1677.89 | bwd_allreduce_microstep: 4274.05 | step_microstep: 39.07 [2024-06-11 03:41:27,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15650.31 | bwd: 46197.20 | bwd_inner: 41921.57 | bwd_allreduce: 4274.78 | step: 41.20 {'loss': 1.1327, 'learning_rate': 1.0568777641302663e-06, 'epoch': 0.9} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2055 [2024-06-11 03:41:28,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.03 | bwd_microstep: 872.59 | bwd_inner_microstep: 872.41 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2342 [2024-06-11 03:41:29,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.77 | bwd_microstep: 984.04 | bwd_inner_microstep: 984.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-11 03:41:32,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.34 | bwd_microstep: 1480.34 | bwd_inner_microstep: 1480.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786 [2024-06-11 03:41:34,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.11 | bwd_microstep: 1543.66 | 
bwd_inner_microstep: 1543.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 03:41:35,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.29 | bwd_microstep: 1283.38 | bwd_inner_microstep: 1283.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-11 03:41:37,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1396.02 | bwd_inner_microstep: 1395.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-11 03:41:39,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1253.73 | bwd_inner_microstep: 1253.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-11 03:41:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.31 | bwd_microstep: 793.15 | bwd_inner_microstep: 793.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 03:41:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.80 | bwd_microstep: 1245.67 | bwd_inner_microstep: 1245.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524 [2024-06-11 03:41:44,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.31 | bwd_microstep: 1424.10 | bwd_inner_microstep: 1424.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 03:41:46,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 484.74 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992 [2024-06-11 03:41:47,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.50 | bwd_microstep: 892.76 | bwd_inner_microstep: 892.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3667 [2024-06-11 03:41:49,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.65 | bwd_microstep: 1687.58 | bwd_inner_microstep: 1687.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-11 03:41:51,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1352.52 | bwd_inner_microstep: 1352.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 03:41:53,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.72 | bwd_microstep: 1477.35 | bwd_inner_microstep: 1477.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3510 [2024-06-11 03:41:55,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.34 | bwd_microstep: 1477.55 | bwd_inner_microstep: 1477.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2144 [2024-06-11 03:41:56,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.42 | bwd_microstep: 932.35 | bwd_inner_microstep: 932.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3465 [2024-06-11 03:41:58,969] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.75 | bwd_microstep: 1477.72 | bwd_inner_microstep: 1477.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515 [2024-06-11 03:42:01,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.73 | bwd_microstep: 1584.63 | bwd_inner_microstep: 1584.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-11 03:42:03,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.88 | bwd_microstep: 1498.07 | bwd_inner_microstep: 1498.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-11 03:42:04,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-11 03:42:06,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1253.33 | bwd_inner_microstep: 1253.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589 [2024-06-11 03:42:08,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.09 | bwd_microstep: 1309.16 | bwd_inner_microstep: 1309.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3609 [2024-06-11 03:42:10,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.59 | bwd_microstep: 1406.77 | bwd_inner_microstep: 1406.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3555 [2024-06-11 03:42:12,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.34 | bwd_microstep: 1297.31 | bwd_inner_microstep: 1297.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718 [2024-06-11 03:42:14,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.34 | bwd_microstep: 1431.87 | bwd_inner_microstep: 1431.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723 [2024-06-11 03:42:16,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.99 | bwd_microstep: 1433.39 | bwd_inner_microstep: 1433.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3603 [2024-06-11 03:42:18,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.07 | bwd_microstep: 1411.33 | bwd_inner_microstep: 1411.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-11 03:42:20,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.89 | bwd_microstep: 1511.84 | bwd_inner_microstep: 1511.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1956 [2024-06-11 03:42:21,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.18 | bwd_microstep: 701.67 | bwd_inner_microstep: 701.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3579 [2024-06-11 03:42:23,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.45 | bwd_microstep: 1697.76 | bwd_inner_microstep: 1697.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3804 [2024-06-11 03:42:28,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.10 | optimizer_step: 6.62 [2024-06-11 03:42:28,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 3886.82 | bwd_inner_microstep: 1630.54 | bwd_allreduce_microstep: 2256.23 | step_microstep: 37.98 [2024-06-11 03:42:28,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15746.29 | bwd: 44535.30 | bwd_inner: 42278.03 | bwd_allreduce: 2256.53 | step: 39.60 ██████▉ | 1548/1726 [26:59:56<3:35:32, 72.65s/it] 90%|████████▉ | 1549/1726 [27:00:57<3:24:13, 69.23s/it] 90%|████████▉ | 1550/1726 [27:02:00<3:17:16, 67.25s/it] 90%|████████▉ | 1551/1726 [27:03:01<3:11:15, 65.57s/it] 90%|████████▉ | 1552/1726 [27:04:04<3:07:13, 64.56s/it] 90%|████████▉ | 1553/1726 [27:05:04<3:02:44, 63.38s/it] {'loss': 1.1744, 'learning_rate': 1.0448712492948743e-06, 'epoch': 0.9} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3457 [2024-06-11 03:42:30,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.18 | bwd_microstep: 1563.21 | bwd_inner_microstep: 1563.14 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.23 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3620 [2024-06-11 03:42:32,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.22 | bwd_microstep: 1537.97 | bwd_inner_microstep: 1537.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845 [2024-06-11 03:42:34,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.24 | bwd_microstep:
1656.30 | bwd_inner_microstep: 1656.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-11 03:42:36,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.11 | bwd_microstep: 1550.28 | bwd_inner_microstep: 1550.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3576 [2024-06-11 03:42:38,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.91 | bwd_microstep: 1363.76 | bwd_inner_microstep: 1363.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764 [2024-06-11 03:42:40,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.52 | bwd_microstep: 1643.83 | bwd_inner_microstep: 1643.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3718 [2024-06-11 03:42:42,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.37 | bwd_microstep: 1296.81 | bwd_inner_microstep: 1296.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3754 [2024-06-11 03:42:44,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.71 | bwd_microstep: 1541.59 | bwd_inner_microstep: 1541.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3420 [2024-06-11 03:42:46,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.20 | bwd_microstep: 1153.62 | bwd_inner_microstep: 1153.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436 [2024-06-11 03:42:48,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 513.11 | bwd_microstep: 1381.45 | bwd_inner_microstep: 1381.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2147 [2024-06-11 03:42:49,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.21 | bwd_microstep: 946.17 | bwd_inner_microstep: 946.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3542 [2024-06-11 03:42:51,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.66 | bwd_microstep: 1355.47 | bwd_inner_microstep: 1355.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-11 03:42:53,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.15 | bwd_microstep: 1488.97 | bwd_inner_microstep: 1488.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948 [2024-06-11 03:42:54,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.07 | bwd_microstep: 706.23 | bwd_inner_microstep: 706.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-11 03:42:55,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.44 | bwd_microstep: 793.43 | bwd_inner_microstep: 793.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3547 [2024-06-11 03:42:57,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.80 | bwd_microstep: 1454.91 | bwd_inner_microstep: 1454.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-11 03:42:59,427] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.07 | bwd_microstep: 1298.38 | bwd_inner_microstep: 1298.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-11 03:43:00,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.77 | bwd_microstep: 978.15 | bwd_inner_microstep: 978.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3903 [2024-06-11 03:43:02,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.35 | bwd_microstep: 1592.78 | bwd_inner_microstep: 1592.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-11 03:43:05,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.18 | bwd_microstep: 1610.16 | bwd_inner_microstep: 1610.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-11 03:43:06,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.96 | bwd_microstep: 801.19 | bwd_inner_microstep: 801.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2131 [2024-06-11 03:43:07,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.79 | bwd_microstep: 927.23 | bwd_inner_microstep: 927.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-11 03:43:09,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.83 | bwd_microstep: 1500.47 | bwd_inner_microstep: 1500.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969 
[2024-06-11 03:43:10,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.18 | bwd_microstep: 828.95 | bwd_inner_microstep: 828.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-11 03:43:12,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.87 | bwd_microstep: 1259.66 | bwd_inner_microstep: 1259.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 03:43:14,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1391.89 | bwd_inner_microstep: 1391.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-11 03:43:16,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1557.56 | bwd_inner_microstep: 1557.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3566 [2024-06-11 03:43:18,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.78 | bwd_microstep: 1544.03 | bwd_inner_microstep: 1544.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3813 [2024-06-11 03:43:20,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.54 | bwd_microstep: 1501.44 | bwd_inner_microstep: 1501.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597 [2024-06-11 03:43:23,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.89 | bwd_microstep: 1602.87 | bwd_inner_microstep: 1602.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per 
sample: 10.5, dynamic token length: 3801 [2024-06-11 03:43:25,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.46 | bwd_microstep: 1687.14 | bwd_inner_microstep: 1687.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-11 03:43:30,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.55 | optimizer_gradients: 4.24 | optimizer_step: 6.61 [2024-06-11 03:43:30,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.43 | bwd_microstep: 4542.39 | bwd_inner_microstep: 2137.04 | bwd_allreduce_microstep: 2405.27 | step_microstep: 40.25 [2024-06-11 03:43:30,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16116.30 | bwd: 46058.33 | bwd_inner: 43652.07 | bwd_allreduce: 2405.55 | step: 41.96 {'loss': 1.1841, 'learning_rate': 1.0329314943244117e-06, 'epoch': 0.9} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 03:43:32,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.89 | bwd_microstep: 1475.31 | bwd_inner_microstep: 1475.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 03:43:34,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1278.42 | bwd_inner_microstep: 1278.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-11 03:43:36,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.82 | bwd_microstep: 1455.98 | bwd_inner_microstep: 1455.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3586 [2024-06-11 03:43:38,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
524.92 | bwd_microstep: 1405.66 | bwd_inner_microstep: 1405.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2310 [2024-06-11 03:43:39,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.35 | bwd_microstep: 884.81 | bwd_inner_microstep: 884.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-11 03:43:41,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.42 | bwd_microstep: 1285.38 | bwd_inner_microstep: 1285.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 03:43:43,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1385.62 | bwd_inner_microstep: 1385.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-11 03:43:45,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.64 | bwd_microstep: 1287.20 | bwd_inner_microstep: 1287.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726 [2024-06-11 03:43:47,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.78 | bwd_microstep: 1531.12 | bwd_inner_microstep: 1531.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-11 03:43:48,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.78 | bwd_microstep: 1255.65 | bwd_inner_microstep: 1255.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 03:43:50,777] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 522.60 | bwd_microstep: 1385.06 | bwd_inner_microstep: 1385.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1945 [2024-06-11 03:43:51,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.49 | bwd_microstep: 729.98 | bwd_inner_microstep: 729.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-11 03:43:53,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.17 | bwd_microstep: 1257.33 | bwd_inner_microstep: 1257.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927 [2024-06-11 03:43:54,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.64 | bwd_microstep: 793.60 | bwd_inner_microstep: 793.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 03:43:56,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1343.00 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 03:43:58,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1376.37 | bwd_inner_microstep: 1376.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3606 [2024-06-11 03:44:00,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.25 | bwd_microstep: 1704.83 | bwd_inner_microstep: 1704.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 
03:44:02,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1353.89 | bwd_inner_microstep: 1353.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 03:44:04,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1256.87 | bwd_inner_microstep: 1256.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-11 03:44:06,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1262.20 | bwd_inner_microstep: 1262.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-11 03:44:08,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.97 | bwd_microstep: 1496.54 | bwd_inner_microstep: 1496.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3550 [2024-06-11 03:44:10,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.08 | bwd_microstep: 1402.25 | bwd_inner_microstep: 1402.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2100 [2024-06-11 03:44:11,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.51 | bwd_microstep: 921.10 | bwd_inner_microstep: 921.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-11 03:44:12,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.43 | bwd_microstep: 802.32 | bwd_inner_microstep: 802.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3821 [2024-06-11 03:44:14,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.26 | bwd_microstep: 1660.65 | bwd_inner_microstep: 1660.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 03:44:17,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.40 | bwd_microstep: 1660.00 | bwd_inner_microstep: 1659.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2291 [2024-06-11 03:44:18,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.28 | bwd_microstep: 1073.41 | bwd_inner_microstep: 1073.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3834 [2024-06-11 03:44:20,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.48 | bwd_microstep: 1502.54 | bwd_inner_microstep: 1502.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3806 [2024-06-11 03:44:22,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.50 | bwd_microstep: 1355.33 | bwd_inner_microstep: 1355.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3813 [2024-06-11 03:44:24,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.99 | bwd_microstep: 1357.82 | bwd_inner_microstep: 1357.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278 [2024-06-11 03:44:25,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.27 | bwd_microstep: 878.80 | bwd_inner_microstep: 878.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 30, images per sample: 7.5, dynamic token length: 3584 [2024-06-11 03:44:31,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.15 | optimizer_step: 6.58 [2024-06-11 03:44:31,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.08 | bwd_microstep: 4906.23 | bwd_inner_microstep: 1616.17 | bwd_allreduce_microstep: 3290.00 | step_microstep: 38.71 [2024-06-11 03:44:31,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15462.03 | bwd: 44725.28 | bwd_inner: 41434.38 | bwd_allreduce: 3290.23 | step: 40.22 {'loss': 1.1215, 'learning_rate': 1.0210585412706187e-06, 'epoch': 0.9} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-11 03:44:32,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.87 | bwd_microstep: 1266.03 | bwd_inner_microstep: 1265.82 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3574 [2024-06-11 03:44:34,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.54 | bwd_microstep: 1428.65 | bwd_inner_microstep: 1428.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-11 03:44:36,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.85 | bwd_microstep: 1449.83 | bwd_inner_microstep: 1449.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 03:44:38,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.46 | bwd_microstep: 1386.78 | bwd_inner_microstep: 1386.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3787 [2024-06-11 03:44:40,981] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 605.77 | bwd_microstep: 1644.36 | bwd_inner_microstep: 1644.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-11 03:44:42,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1246.68 | bwd_inner_microstep: 1246.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 03:44:44,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.74 | bwd_microstep: 1384.00 | bwd_inner_microstep: 1383.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-11 03:44:46,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1348.38 | bwd_inner_microstep: 1348.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 03:44:48,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1249.65 | bwd_inner_microstep: 1249.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 03:44:50,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.11 | bwd_microstep: 1481.24 | bwd_inner_microstep: 1481.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3447 [2024-06-11 03:44:52,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.24 | bwd_microstep: 1375.94 | bwd_inner_microstep: 1375.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711 [2024-06-11 03:44:54,259] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.07 | bwd_microstep: 1523.20 | bwd_inner_microstep: 1523.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-11 03:44:56,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.40 | bwd_microstep: 1344.92 | bwd_inner_microstep: 1344.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-11 03:44:57,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.68 | bwd_microstep: 1343.13 | bwd_inner_microstep: 1343.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3399 [2024-06-11 03:44:59,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.54 | bwd_microstep: 1310.61 | bwd_inner_microstep: 1310.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972 [2024-06-11 03:45:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.35 | bwd_microstep: 733.96 | bwd_inner_microstep: 733.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3472 [2024-06-11 03:45:02,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.14 | bwd_microstep: 1315.30 | bwd_inner_microstep: 1315.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3525 [2024-06-11 03:45:04,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.04 | bwd_microstep: 1437.95 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3808 [2024-06-11 03:45:06,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.55 | bwd_microstep: 1551.54 | bwd_inner_microstep: 1551.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022 [2024-06-11 03:45:07,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.46 | bwd_microstep: 806.41 | bwd_inner_microstep: 806.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706 [2024-06-11 03:45:09,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.12 | bwd_microstep: 1534.93 | bwd_inner_microstep: 1534.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3830 [2024-06-11 03:45:12,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.43 | bwd_microstep: 1824.83 | bwd_inner_microstep: 1824.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 03:45:14,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.12 | bwd_microstep: 1288.93 | bwd_inner_microstep: 1288.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2029 [2024-06-11 03:45:15,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.83 | bwd_microstep: 779.29 | bwd_inner_microstep: 779.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 03:45:17,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.55 | bwd_microstep: 1404.63 | bwd_inner_microstep: 1404.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3596 [2024-06-11 03:45:19,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.58 | bwd_microstep: 1509.13 | bwd_inner_microstep: 1509.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 03:45:21,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1254.50 | bwd_inner_microstep: 1254.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3768 [2024-06-11 03:45:23,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.26 | bwd_microstep: 1847.69 | bwd_inner_microstep: 1847.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3581 [2024-06-11 03:45:25,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1333.38 | bwd_inner_microstep: 1333.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1988 [2024-06-11 03:45:26,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.09 | bwd_microstep: 707.07 | bwd_inner_microstep: 707.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2034 [2024-06-11 03:45:27,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.94 | bwd_microstep: 716.17 | bwd_inner_microstep: 716.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3560 [2024-06-11 03:45:31,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.59 [2024-06-11 03:45:31,827] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 623.99 | bwd_microstep: 3665.88 | bwd_inner_microstep: 2237.55 | bwd_allreduce_microstep: 1428.26 | step_microstep: 40.30 [2024-06-11 03:45:31,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15920.43 | bwd: 44495.04 | bwd_inner: 43065.70 | bwd_allreduce: 1428.59 | step: 43.07 {'loss': 1.2027, 'learning_rate': 1.0092524319499853e-06, 'epoch': 0.9} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-11 03:45:32,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.90 | bwd_microstep: 780.88 | bwd_inner_microstep: 780.82 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2380 [2024-06-11 03:45:34,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.10 | bwd_microstep: 960.60 | bwd_inner_microstep: 960.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 03:45:35,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1250.89 | bwd_inner_microstep: 1250.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2343 [2024-06-11 03:45:37,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.68 | bwd_microstep: 984.77 | bwd_inner_microstep: 984.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-11 03:45:39,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.19 | bwd_microstep: 1456.05 | bwd_inner_microstep: 1456.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3760 [2024-06-11 03:45:41,610] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 600.90 | bwd_microstep: 1640.49 | bwd_inner_microstep: 1640.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 03:45:43,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3433 [2024-06-11 03:45:45,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.16 | bwd_microstep: 1280.45 | bwd_inner_microstep: 1280.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 03:45:47,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.07 | bwd_microstep: 1555.27 | bwd_inner_microstep: 1555.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-11 03:45:49,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1482.43 | bwd_inner_microstep: 1482.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 03:45:51,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.07 | bwd_microstep: 1389.07 | bwd_inner_microstep: 1389.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2002 [2024-06-11 03:45:52,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.04 | bwd_microstep: 896.47 | bwd_inner_microstep: 896.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1950 [2024-06-11 03:45:53,786] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.55 | bwd_microstep: 822.40 | bwd_inner_microstep: 822.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-11 03:45:55,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1250.24 | bwd_inner_microstep: 1250.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3424 [2024-06-11 03:45:57,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1539.95 | bwd_inner_microstep: 1539.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3631 [2024-06-11 03:45:59,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1572.79 | bwd_inner_microstep: 1572.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3668 [2024-06-11 03:46:01,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.04 | bwd_microstep: 1358.66 | bwd_inner_microstep: 1358.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681 [2024-06-11 03:46:03,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1525.44 | bwd_inner_microstep: 1525.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2010 [2024-06-11 03:46:04,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.26 | bwd_microstep: 709.93 | bwd_inner_microstep: 709.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 2122 [2024-06-11 03:46:05,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.10 | bwd_microstep: 831.93 | bwd_inner_microstep: 831.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 03:46:07,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.47 | bwd_microstep: 1383.91 | bwd_inner_microstep: 1383.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-11 03:46:09,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.78 | bwd_microstep: 1287.65 | bwd_inner_microstep: 1287.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3558 [2024-06-11 03:46:11,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.80 | bwd_microstep: 1430.17 | bwd_inner_microstep: 1430.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-11 03:46:13,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.10 | bwd_microstep: 1499.34 | bwd_inner_microstep: 1499.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 03:46:15,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.68 | bwd_microstep: 1275.96 | bwd_inner_microstep: 1275.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569 [2024-06-11 03:46:17,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.41 | bwd_microstep: 1425.81 | bwd_inner_microstep: 1425.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, 
images per sample: 7.75, dynamic token length: 3796 [2024-06-11 03:46:19,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.75 | bwd_microstep: 1506.94 | bwd_inner_microstep: 1506.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3478 [2024-06-11 03:46:21,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.38 | bwd_microstep: 1247.46 | bwd_inner_microstep: 1247.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-11 03:46:23,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.96 | bwd_microstep: 1594.45 | bwd_inner_microstep: 1594.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3430 [2024-06-11 03:46:25,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.53 | bwd_microstep: 1408.37 | bwd_inner_microstep: 1408.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-11 03:46:27,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.52 | bwd_microstep: 1552.91 | bwd_inner_microstep: 1552.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3580 [2024-06-11 03:46:36,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.42 | optimizer_step: 6.64 [2024-06-11 03:46:36,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.35 | bwd_microstep: 8128.38 | bwd_inner_microstep: 1652.10 | bwd_allreduce_microstep: 6476.21 | step_microstep: 40.14 [2024-06-11 03:46:36,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15650.46 | bwd: 48417.90 | 
bwd_inner: 41940.72 | bwd_allreduce: 6476.47 | step: 41.72 {'loss': 1.1824, 'learning_rate': 9.975132079435635e-07, 'epoch': 0.9} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3466 [2024-06-11 03:46:38,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.68 | bwd_microstep: 1403.54 | bwd_inner_microstep: 1403.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-11 03:46:39,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.50 | bwd_microstep: 792.94 | bwd_inner_microstep: 792.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3882 [2024-06-11 03:46:41,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.21 | bwd_microstep: 1489.36 | bwd_inner_microstep: 1489.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1868 [2024-06-11 03:46:42,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.42 | bwd_microstep: 677.69 | bwd_inner_microstep: 677.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-11 03:46:44,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1374.05 | bwd_inner_microstep: 1374.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 03:46:46,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.90 | bwd_microstep: 1403.12 | bwd_inner_microstep: 1403.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 03:46:48,053] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1390.08 | bwd_inner_microstep: 1390.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 03:46:49,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1246.37 | bwd_inner_microstep: 1246.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759 [2024-06-11 03:46:51,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.30 | bwd_microstep: 1436.40 | bwd_inner_microstep: 1436.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 03:46:53,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.72 | bwd_microstep: 1398.85 | bwd_inner_microstep: 1398.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2469 [2024-06-11 03:46:55,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.26 | bwd_microstep: 1021.01 | bwd_inner_microstep: 1020.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 03:46:57,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.50 | bwd_microstep: 1387.79 | bwd_inner_microstep: 1387.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2131 [2024-06-11 03:46:58,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.16 | bwd_microstep: 987.87 | bwd_inner_microstep: 987.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3499 [2024-06-11 03:47:00,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.36 | bwd_microstep: 1487.17 | bwd_inner_microstep: 1487.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3697 [2024-06-11 03:47:02,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.88 | bwd_microstep: 1447.07 | bwd_inner_microstep: 1447.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3497 [2024-06-11 03:47:04,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.02 | bwd_microstep: 1613.06 | bwd_inner_microstep: 1613.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3505 [2024-06-11 03:47:06,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.53 | bwd_microstep: 1418.37 | bwd_inner_microstep: 1418.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-11 03:47:08,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.78 | bwd_microstep: 1491.44 | bwd_inner_microstep: 1491.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827 [2024-06-11 03:47:10,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.73 | bwd_microstep: 1357.02 | bwd_inner_microstep: 1356.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1914 [2024-06-11 03:47:11,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.78 | bwd_microstep: 752.33 | bwd_inner_microstep: 752.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3527 [2024-06-11 03:47:13,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.41 | bwd_microstep: 1390.93 | bwd_inner_microstep: 1390.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3537 [2024-06-11 03:47:15,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.56 | bwd_microstep: 1327.55 | bwd_inner_microstep: 1327.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2236 [2024-06-11 03:47:16,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.18 | bwd_microstep: 866.39 | bwd_inner_microstep: 866.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3532 [2024-06-11 03:47:18,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.42 | bwd_microstep: 1426.13 | bwd_inner_microstep: 1426.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3612 [2024-06-11 03:47:20,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.37 | bwd_microstep: 1342.60 | bwd_inner_microstep: 1342.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-11 03:47:22,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.49 | bwd_microstep: 1550.06 | bwd_inner_microstep: 1550.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2189 [2024-06-11 03:47:23,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.76 | bwd_microstep: 955.62 | bwd_inner_microstep: 955.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533 [2024-06-11 03:47:26,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.93 | bwd_microstep: 1584.16 | bwd_inner_microstep: 1584.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775 [2024-06-11 03:47:28,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.24 | bwd_microstep: 1744.36 | bwd_inner_microstep: 1744.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3572 [2024-06-11 03:47:30,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.39 | bwd_microstep: 1491.64 | bwd_inner_microstep: 1491.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3447 [2024-06-11 03:47:32,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1414.57 | bwd_inner_microstep: 1414.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457 [2024-06-11 03:47:37,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.46 | optimizer_step: 6.60 [2024-06-11 03:47:37,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.64 | bwd_microstep: 4660.22 | bwd_inner_microstep: 1467.57 | bwd_allreduce_microstep: 3192.58 | step_microstep: 39.91 [2024-06-11 03:47:37,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15720.71 | bwd: 45329.78 | bwd_inner: 42136.27 | bwd_allreduce: 3192.82 | step: 41.77 90%|████████▉ | 1553/1726 [27:05:04<3:02:44, 63.38s/it] 90%|█████████ | 1554/1726 [27:06:07<3:00:57, 63.13s/it] 90%|█████████ | 1555/1726 
[27:07:07<2:57:40, 62.34s/it] 90%|█████████ | 1556/1726 [27:08:08<2:55:18, 61.87s/it] 90%|█████████ | 1557/1726 [27:09:12<2:56:24, 62.63s/it] 90%|█████████ | 1558/1726 [27:10:14<2:54:20, 62.27s/it] {'loss': 1.1748, 'learning_rate': 9.858409105968337e-07, 'epoch': 0.9} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-11 03:47:39,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.93 | bwd_microstep: 1476.25 | bwd_inner_microstep: 1476.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 03:47:41,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 1244.87 | bwd_inner_microstep: 1244.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899 [2024-06-11 03:47:43,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.13 | bwd_microstep: 1582.56 | bwd_inner_microstep: 1582.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 03:47:45,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.32 | bwd_microstep: 1283.88 | bwd_inner_microstep: 1283.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 03:47:47,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.05 | bwd_microstep: 1251.56 | bwd_inner_microstep: 1251.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 03:47:49,034] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1387.53 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722 [2024-06-11 03:47:51,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.32 | bwd_microstep: 1631.71 | bwd_inner_microstep: 1631.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518 [2024-06-11 03:47:52,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.68 | bwd_microstep: 1194.74 | bwd_inner_microstep: 1194.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 722 [2024-06-11 03:47:53,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 118.17 | bwd_microstep: 295.69 | bwd_inner_microstep: 295.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1968 [2024-06-11 03:47:54,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 317.90 | bwd_microstep: 854.40 | bwd_inner_microstep: 854.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3623 [2024-06-11 03:47:56,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.73 | bwd_microstep: 1407.09 | bwd_inner_microstep: 1407.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3418 [2024-06-11 03:47:58,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.85 | bwd_microstep: 1210.04 | bwd_inner_microstep: 1210.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3658 [2024-06-11 03:48:00,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.91 | bwd_microstep: 1519.07 | bwd_inner_microstep: 1519.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3963 [2024-06-11 03:48:02,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.85 | bwd_microstep: 1700.55 | bwd_inner_microstep: 1700.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 03:48:04,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.97 | bwd_microstep: 1284.35 | bwd_inner_microstep: 1284.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-11 03:48:06,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.49 | bwd_microstep: 1275.40 | bwd_inner_microstep: 1275.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 03:48:08,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1380.33 | bwd_inner_microstep: 1380.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-11 03:48:10,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.40 | bwd_microstep: 1461.33 | bwd_inner_microstep: 1461.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3508 [2024-06-11 03:48:11,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1191.48 | bwd_inner_microstep: 1191.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3697 [2024-06-11 03:48:13,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.94 | bwd_microstep: 1531.42 | bwd_inner_microstep: 1531.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418 [2024-06-11 03:48:15,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.22 | bwd_microstep: 1150.02 | bwd_inner_microstep: 1149.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3819 [2024-06-11 03:48:17,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.42 | bwd_microstep: 1359.37 | bwd_inner_microstep: 1359.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3710 [2024-06-11 03:48:19,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.18 | bwd_microstep: 1660.84 | bwd_inner_microstep: 1660.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2087 [2024-06-11 03:48:20,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.46 | bwd_microstep: 1016.38 | bwd_inner_microstep: 1016.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3564 [2024-06-11 03:48:22,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.43 | bwd_microstep: 1330.99 | bwd_inner_microstep: 1330.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-11 03:48:24,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.45 | bwd_microstep: 976.79 | bwd_inner_microstep: 976.76 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3758 [2024-06-11 03:48:25,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.68 | bwd_microstep: 1279.17 | bwd_inner_microstep: 1279.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-11 03:48:28,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1495.97 | bwd_inner_microstep: 1495.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-11 03:48:30,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.00 | bwd_microstep: 1655.05 | bwd_inner_microstep: 1655.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469 [2024-06-11 03:48:32,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1572.36 | bwd_inner_microstep: 1572.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3419 [2024-06-11 03:48:34,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.94 | bwd_microstep: 1344.48 | bwd_inner_microstep: 1344.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-11 03:48:38,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-11 03:48:38,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.45 | bwd_microstep: 3394.43 | bwd_inner_microstep: 1810.71 | bwd_allreduce_microstep: 1583.67 | step_microstep: 38.16 [2024-06-11 03:48:38,350] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 15954.63 | bwd: 44400.09 | bwd_inner: 42815.44 | bwd_allreduce: 1583.95 | step: 39.74 {'loss': 1.1705, 'learning_rate': 9.742355810195804e-07, 'epoch': 0.9} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559 [2024-06-11 03:48:40,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1322.90 | bwd_inner_microstep: 1322.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862 [2024-06-11 03:48:42,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.91 | bwd_microstep: 1565.02 | bwd_inner_microstep: 1565.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3830 [2024-06-11 03:48:44,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1355.26 | bwd_inner_microstep: 1355.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 03:48:46,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.62 | bwd_microstep: 1282.10 | bwd_inner_microstep: 1282.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-11 03:48:47,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.27 | bwd_microstep: 1406.81 | bwd_inner_microstep: 1406.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3524 [2024-06-11 03:48:49,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1324.57 | bwd_inner_microstep: 1324.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 
[2024-06-11 03:48:51,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.43 | bwd_microstep: 1383.71 | bwd_inner_microstep: 1383.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1908 [2024-06-11 03:48:52,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.71 | bwd_microstep: 686.93 | bwd_inner_microstep: 686.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3707 [2024-06-11 03:48:54,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.46 | bwd_microstep: 1428.12 | bwd_inner_microstep: 1428.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-11 03:48:56,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1150.00 | bwd_inner_microstep: 1149.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2153 [2024-06-11 03:48:57,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.07 | bwd_microstep: 899.95 | bwd_inner_microstep: 899.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-11 03:48:59,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.39 | bwd_microstep: 1375.17 | bwd_inner_microstep: 1375.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2624 [2024-06-11 03:49:00,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.39 | bwd_microstep: 1017.25 | bwd_inner_microstep: 1017.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3421 [2024-06-11 03:49:02,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.35 | bwd_microstep: 1353.06 | bwd_inner_microstep: 1353.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-11 03:49:04,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1648.59 | bwd_inner_microstep: 1648.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 03:49:06,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.24 | bwd_microstep: 1255.38 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3466 [2024-06-11 03:49:08,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.24 | bwd_microstep: 1183.22 | bwd_inner_microstep: 1183.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627 [2024-06-11 03:49:10,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1443.33 | bwd_inner_microstep: 1443.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3577 [2024-06-11 03:49:12,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.74 | bwd_microstep: 1407.72 | bwd_inner_microstep: 1407.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3508 [2024-06-11 03:49:14,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.77 | bwd_microstep: 1681.75 | bwd_inner_microstep: 1681.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-11 03:49:15,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.51 | bwd_microstep: 789.18 | bwd_inner_microstep: 789.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 03:49:17,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.03 | bwd_microstep: 1390.92 | bwd_inner_microstep: 1390.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-11 03:49:18,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.66 | bwd_microstep: 787.17 | bwd_inner_microstep: 787.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3443 [2024-06-11 03:49:20,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.78 | bwd_microstep: 1158.24 | bwd_inner_microstep: 1158.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 03:49:22,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1283.38 | bwd_inner_microstep: 1283.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3760 [2024-06-11 03:49:24,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.75 | bwd_microstep: 1468.67 | bwd_inner_microstep: 1468.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3813 [2024-06-11 03:49:26,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.04 | bwd_microstep: 1603.25 | bwd_inner_microstep: 1603.22 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3568 [2024-06-11 03:49:28,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.95 | bwd_microstep: 1526.61 | bwd_inner_microstep: 1526.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-11 03:49:30,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.60 | bwd_microstep: 1396.24 | bwd_inner_microstep: 1396.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3771 [2024-06-11 03:49:32,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.94 | bwd_microstep: 1543.11 | bwd_inner_microstep: 1543.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-11 03:49:34,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.32 | bwd_microstep: 1160.23 | bwd_inner_microstep: 1160.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3770 [2024-06-11 03:49:38,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-11 03:49:38,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.19 | bwd_microstep: 3808.90 | bwd_inner_microstep: 1870.20 | bwd_allreduce_microstep: 1938.64 | step_microstep: 38.82 [2024-06-11 03:49:38,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15747.68 | bwd: 44086.75 | bwd_inner: 42147.20 | bwd_allreduce: 1938.88 | step: 40.38 {'loss': 1.0872, 'learning_rate': 9.626972600856966e-07, 'epoch': 0.9} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 
[2024-06-11 03:49:40,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1363.83 | bwd_inner_microstep: 1363.71 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2441 [2024-06-11 03:49:41,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.61 | bwd_microstep: 945.49 | bwd_inner_microstep: 945.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841 [2024-06-11 03:49:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.01 | bwd_microstep: 1560.41 | bwd_inner_microstep: 1560.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2339 [2024-06-11 03:49:45,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.13 | bwd_microstep: 891.70 | bwd_inner_microstep: 891.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-11 03:49:46,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1291.69 | bwd_inner_microstep: 1291.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 03:49:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.86 | bwd_microstep: 1384.46 | bwd_inner_microstep: 1384.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-11 03:49:50,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.90 | bwd_microstep: 1479.18 | bwd_inner_microstep: 1479.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 3499 [2024-06-11 03:49:52,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.47 | bwd_microstep: 1221.59 | bwd_inner_microstep: 1221.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-11 03:49:54,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1251.42 | bwd_inner_microstep: 1251.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 03:49:56,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1248.63 | bwd_inner_microstep: 1248.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434 [2024-06-11 03:49:57,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.47 | bwd_microstep: 1283.00 | bwd_inner_microstep: 1282.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-11 03:49:59,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.02 | bwd_microstep: 1342.46 | bwd_inner_microstep: 1342.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485 [2024-06-11 03:50:01,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.19 | bwd_microstep: 1315.41 | bwd_inner_microstep: 1315.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2099 [2024-06-11 03:50:02,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.38 | bwd_microstep: 923.29 | bwd_inner_microstep: 923.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 03:50:04,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.64 | bwd_microstep: 1485.55 | bwd_inner_microstep: 1485.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-11 03:50:06,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1483.39 | bwd_inner_microstep: 1483.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3637 [2024-06-11 03:50:09,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.13 | bwd_microstep: 1647.54 | bwd_inner_microstep: 1647.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-11 03:50:10,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.60 | bwd_microstep: 1183.70 | bwd_inner_microstep: 1183.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3857 [2024-06-11 03:50:12,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1317.29 | bwd_inner_microstep: 1317.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3625 [2024-06-11 03:50:14,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.16 | bwd_microstep: 1457.00 | bwd_inner_microstep: 1456.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1933 [2024-06-11 03:50:15,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.31 | bwd_microstep: 727.39 | bwd_inner_microstep: 727.36 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 03:50:17,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.40 | bwd_microstep: 1285.45 | bwd_inner_microstep: 1285.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-11 03:50:19,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1355.49 | bwd_inner_microstep: 1355.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820 [2024-06-11 03:50:21,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1388.12 | bwd_inner_microstep: 1388.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2183 [2024-06-11 03:50:22,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.68 | bwd_microstep: 919.98 | bwd_inner_microstep: 919.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3571 [2024-06-11 03:50:24,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.11 | bwd_microstep: 1482.30 | bwd_inner_microstep: 1482.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454 [2024-06-11 03:50:26,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.56 | bwd_microstep: 1161.93 | bwd_inner_microstep: 1161.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3813 [2024-06-11 03:50:28,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.64 | bwd_microstep: 
1460.60 | bwd_inner_microstep: 1460.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-11 03:50:29,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.10 | bwd_microstep: 1337.03 | bwd_inner_microstep: 1337.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1944 [2024-06-11 03:50:31,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.48 | bwd_microstep: 823.70 | bwd_inner_microstep: 823.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3815 [2024-06-11 03:50:33,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.96 | bwd_microstep: 1853.58 | bwd_inner_microstep: 1853.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3432 [2024-06-11 03:50:41,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.14 | optimizer_step: 6.60 [2024-06-11 03:50:41,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.04 | bwd_microstep: 6918.94 | bwd_inner_microstep: 1565.26 | bwd_allreduce_microstep: 5353.62 | step_microstep: 39.02 [2024-06-11 03:50:41,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15507.40 | bwd: 46791.57 | bwd_inner: 41436.93 | bwd_allreduce: 5353.90 | step: 40.58 {'loss': 1.1616, 'learning_rate': 9.512259884331021e-07, 'epoch': 0.9} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-11 03:50:43,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.60 | bwd_microstep: 1460.67 | bwd_inner_microstep: 1460.60 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 
5.5, dynamic token length: 3465 [2024-06-11 03:50:44,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.02 | bwd_microstep: 1281.84 | bwd_inner_microstep: 1281.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2369 [2024-06-11 03:50:46,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.74 | bwd_microstep: 993.65 | bwd_inner_microstep: 993.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3410 [2024-06-11 03:50:48,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.91 | bwd_microstep: 1281.55 | bwd_inner_microstep: 1281.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 03:50:50,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1383.71 | bwd_inner_microstep: 1383.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1871 [2024-06-11 03:50:50,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.36 | bwd_microstep: 677.83 | bwd_inner_microstep: 677.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3455 [2024-06-11 03:50:52,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1455.77 | bwd_inner_microstep: 1455.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3764 [2024-06-11 03:50:55,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.07 | bwd_microstep: 1540.59 | bwd_inner_microstep: 1540.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 3472 [2024-06-11 03:50:56,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.12 | bwd_microstep: 1188.53 | bwd_inner_microstep: 1188.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3567 [2024-06-11 03:50:58,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.87 | bwd_microstep: 1597.13 | bwd_inner_microstep: 1597.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3728 [2024-06-11 03:51:01,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.17 | bwd_microstep: 1625.47 | bwd_inner_microstep: 1625.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923 [2024-06-11 03:51:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.72 | bwd_microstep: 818.47 | bwd_inner_microstep: 818.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3662 [2024-06-11 03:51:04,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.78 | bwd_microstep: 1624.42 | bwd_inner_microstep: 1624.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3655 [2024-06-11 03:51:06,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.64 | bwd_microstep: 1526.16 | bwd_inner_microstep: 1526.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912 [2024-06-11 03:51:07,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.12 | bwd_microstep: 683.98 | bwd_inner_microstep: 683.95 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710 [2024-06-11 03:51:09,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.06 | bwd_microstep: 1534.61 | bwd_inner_microstep: 1534.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2110 [2024-06-11 03:51:10,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.29 | bwd_microstep: 824.59 | bwd_inner_microstep: 824.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3684 [2024-06-11 03:51:13,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.28 | bwd_microstep: 1627.06 | bwd_inner_microstep: 1627.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3823 [2024-06-11 03:51:15,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.28 | bwd_microstep: 1484.52 | bwd_inner_microstep: 1484.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-11 03:51:16,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.52 | bwd_microstep: 1254.55 | bwd_inner_microstep: 1254.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3520 [2024-06-11 03:51:18,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.84 | bwd_microstep: 1220.62 | bwd_inner_microstep: 1220.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3388 [2024-06-11 03:51:20,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.53 | bwd_microstep: 1275.05 | 
bwd_inner_microstep: 1275.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-11 03:51:22,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1503.62 | bwd_inner_microstep: 1503.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463 [2024-06-11 03:51:24,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.69 | bwd_microstep: 1436.85 | bwd_inner_microstep: 1436.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3719 [2024-06-11 03:51:26,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.75 | bwd_microstep: 1530.64 | bwd_inner_microstep: 1530.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 03:51:28,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.90 | bwd_microstep: 1274.87 | bwd_inner_microstep: 1274.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3826 [2024-06-11 03:51:30,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.49 | bwd_microstep: 1421.28 | bwd_inner_microstep: 1421.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3724 [2024-06-11 03:51:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.68 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1559.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615 [2024-06-11 03:51:34,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 493.62 | bwd_microstep: 1305.78 | bwd_inner_microstep: 1305.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2243 [2024-06-11 03:51:35,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.20 | bwd_microstep: 993.70 | bwd_inner_microstep: 993.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602 [2024-06-11 03:51:37,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1399.43 | bwd_inner_microstep: 1399.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3816 [2024-06-11 03:53:16,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.23 | optimizer_step: 6.63 [2024-06-11 03:53:16,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.32 | bwd_microstep: 98276.27 | bwd_inner_microstep: 1715.80 | bwd_allreduce_microstep: 96560.39 | step_microstep: 39.00 [2024-06-11 03:53:16,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15843.22 | bwd: 139062.45 | bwd_inner: 42501.08 | bwd_allreduce: 96560.67 | step: 40.56 {'loss': 1.1784, 'learning_rate': 9.398218064635478e-07, 'epoch': 0.9} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-11 03:53:18,254] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1330.75 | bwd_inner_microstep: 1330.55 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3935 [2024-06-11 03:53:20,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.70 | bwd_microstep: 1483.00 | bwd_inner_microstep: 1482.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 34, images per sample: 8.5, dynamic token length: 3479 [2024-06-11 03:53:22,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.33 | bwd_microstep: 1472.68 | bwd_inner_microstep: 1472.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 03:53:24,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.07 | bwd_microstep: 1373.96 | bwd_inner_microstep: 1373.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 03:53:26,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.77 | bwd_microstep: 1274.37 | bwd_inner_microstep: 1274.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3780 [2024-06-11 03:53:27,933] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.99 | bwd_microstep: 1392.16 | bwd_inner_microstep: 1392.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 03:53:29,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.57 | bwd_microstep: 1400.43 | bwd_inner_microstep: 1400.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2524 [2024-06-11 03:53:31,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.41 | bwd_microstep: 899.20 | bwd_inner_microstep: 899.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710 [2024-06-11 03:53:33,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.74 | bwd_microstep: 1426.65 | bwd_inner_microstep: 1426.62 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3702 [2024-06-11 03:53:35,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.95 | bwd_microstep: 1527.14 | bwd_inner_microstep: 1527.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-11 03:53:37,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1293.14 | bwd_inner_microstep: 1293.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2495 [2024-06-11 03:53:38,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 381.05 | bwd_microstep: 1021.39 | bwd_inner_microstep: 1021.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3419 [2024-06-11 03:53:40,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.25 | bwd_microstep: 1375.59 | bwd_inner_microstep: 1375.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467 [2024-06-11 03:53:42,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.21 | bwd_microstep: 1438.21 | bwd_inner_microstep: 1438.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3525 [2024-06-11 03:53:44,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.46 | bwd_microstep: 1582.87 | bwd_inner_microstep: 1582.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2133 [2024-06-11 03:53:45,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.54 | bwd_microstep: 831.61 | 
bwd_inner_microstep: 831.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648 [2024-06-11 03:53:47,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.52 | bwd_microstep: 1607.34 | bwd_inner_microstep: 1607.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-11 03:53:49,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.34 | bwd_microstep: 1431.21 | bwd_inner_microstep: 1431.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3706 [2024-06-11 03:53:52,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.61 | bwd_microstep: 1527.91 | bwd_inner_microstep: 1527.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-11 03:53:53,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.39 | bwd_microstep: 1259.56 | bwd_inner_microstep: 1259.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-11 03:53:56,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.80 | bwd_microstep: 1650.67 | bwd_inner_microstep: 1650.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3856 [2024-06-11 03:53:58,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.46 | bwd_microstep: 1558.52 | bwd_inner_microstep: 1558.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-11 03:54:00,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 563.18 | bwd_microstep: 1520.77 | bwd_inner_microstep: 1520.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3106 [2024-06-11 03:54:02,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1341.07 | bwd_inner_microstep: 1341.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3966 [2024-06-11 03:54:04,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.64 | bwd_microstep: 1471.73 | bwd_inner_microstep: 1471.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2216 [2024-06-11 03:54:05,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.69 | bwd_microstep: 864.88 | bwd_inner_microstep: 864.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3562 [2024-06-11 03:54:07,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.11 | bwd_microstep: 1265.14 | bwd_inner_microstep: 1265.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3770 [2024-06-11 03:54:09,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.13 | bwd_microstep: 1444.36 | bwd_inner_microstep: 1444.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595 [2024-06-11 03:54:10,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.55 | bwd_microstep: 1308.28 | bwd_inner_microstep: 1308.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2059 [2024-06-11 03:54:12,153] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.46 | bwd_microstep: 864.20 | bwd_inner_microstep: 864.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-11 03:54:14,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.09 | bwd_microstep: 1651.42 | bwd_inner_microstep: 1651.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3436 [2024-06-11 03:54:32,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.27 | optimizer_step: 6.61 [2024-06-11 03:54:32,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.48 | bwd_microstep: 17695.97 | bwd_inner_microstep: 1417.73 | bwd_allreduce_microstep: 16278.17 | step_microstep: 39.49 [2024-06-11 03:54:32,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16161.70 | bwd: 59586.21 | bwd_inner: 43306.96 | bwd_allreduce: 16278.49 | step: 41.04 90%|█████████ | 1558/1726 [27:10:14<2:54:20, 62.27s/it] 90%|█████████ | 1559/1726 [27:11:15<2:51:59, 61.80s/it] 90%|█████████ | 1560/1726 [27:12:15<2:49:37, 61.31s/it] 90%|█████████ | 1561/1726 [27:13:17<2:49:41, 61.71s/it] 90%|█████████ | 1562/1726 [27:15:53<4:05:22, 89.77s/it] 91%|█████████ | 1563/1726 [27:17:09<3:52:50, 85.71s/it] {'loss': 1.1384, 'learning_rate': 9.284847543425113e-07, 'epoch': 0.91} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-11 03:54:34,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.77 | bwd_microstep: 1331.14 | bwd_inner_microstep: 1331.11 | bwd_allreduce_microstep: 0.01 |
step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3861 [2024-06-11 03:54:36,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1455.46 | bwd_inner_microstep: 1455.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 03:54:38,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1374.93 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483 [2024-06-11 03:54:40,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.65 | bwd_microstep: 1212.84 | bwd_inner_microstep: 1212.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-11 03:54:42,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.09 | bwd_microstep: 1476.39 | bwd_inner_microstep: 1476.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-11 03:54:43,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1274.99 | bwd_inner_microstep: 1274.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 03:54:45,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.28 | bwd_microstep: 1278.50 | bwd_inner_microstep: 1278.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617 [2024-06-11 03:54:47,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.07 | bwd_microstep: 1306.19 | bwd_inner_microstep: 
1306.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3545 [2024-06-11 03:54:49,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.04 | bwd_microstep: 1418.61 | bwd_inner_microstep: 1418.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 03:54:51,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.08 | bwd_microstep: 1482.39 | bwd_inner_microstep: 1482.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2334 [2024-06-11 03:54:52,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.18 | bwd_microstep: 981.98 | bwd_inner_microstep: 981.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3643 [2024-06-11 03:54:55,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.25 | bwd_microstep: 1706.02 | bwd_inner_microstep: 1705.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-11 03:54:57,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.99 | bwd_microstep: 1398.39 | bwd_inner_microstep: 1398.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 03:54:59,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.60 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3685 [2024-06-11 03:55:01,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.15 | 
bwd_microstep: 1521.94 | bwd_inner_microstep: 1521.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 03:55:03,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1380.10 | bwd_inner_microstep: 1380.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3753 [2024-06-11 03:55:05,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.65 | bwd_microstep: 1587.18 | bwd_inner_microstep: 1587.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3587 [2024-06-11 03:55:07,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1334.38 | bwd_inner_microstep: 1334.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1994 [2024-06-11 03:55:53,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.12 | bwd_microstep: 795.62 | bwd_inner_microstep: 795.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3826 [2024-06-11 03:55:54,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.68 | bwd_microstep: 1347.11 | bwd_inner_microstep: 1347.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470 [2024-06-11 03:55:56,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.62 | bwd_microstep: 1210.41 | bwd_inner_microstep: 1210.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-11 03:55:57,735] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 299.16 | bwd_microstep: 793.83 | bwd_inner_microstep: 793.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-11 03:55:59,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.64 | bwd_microstep: 1291.33 | bwd_inner_microstep: 1291.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785 [2024-06-11 03:56:01,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.23 | bwd_microstep: 1637.86 | bwd_inner_microstep: 1637.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1981 [2024-06-11 03:56:02,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.38 | bwd_microstep: 703.99 | bwd_inner_microstep: 703.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 03:56:04,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.16 | bwd_microstep: 1371.93 | bwd_inner_microstep: 1371.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-11 03:56:06,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.62 | bwd_microstep: 1449.03 | bwd_inner_microstep: 1449.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-11 03:56:08,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.59 | bwd_microstep: 1457.37 | bwd_inner_microstep: 1457.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1978 [2024-06-11 03:56:09,733] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.76 | bwd_microstep: 765.22 | bwd_inner_microstep: 765.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920 [2024-06-11 03:56:10,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.38 | bwd_microstep: 716.07 | bwd_inner_microstep: 716.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3807 [2024-06-11 03:56:13,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.54 | bwd_microstep: 1745.83 | bwd_inner_microstep: 1745.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3800 [2024-06-11 03:56:20,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.21 | optimizer_step: 6.60 [2024-06-11 03:56:20,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 636.53 | bwd_microstep: 6927.40 | bwd_inner_microstep: 1981.48 | bwd_allreduce_microstep: 4945.84 | step_microstep: 40.41 [2024-06-11 03:56:20,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15745.94 | bwd: 47114.07 | bwd_inner: 42167.31 | bwd_allreduce: 4946.08 | step: 41.85 {'loss': 1.1611, 'learning_rate': 9.172148719990237e-07, 'epoch': 0.91} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 03:56:22,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.79 | bwd_microstep: 1376.81 | bwd_inner_microstep: 1376.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2318 [2024-06-11 03:56:23,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.77 | bwd_microstep: 882.43 | bwd_inner_microstep: 882.40 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-11 03:56:26,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.99 | bwd_microstep: 1545.82 | bwd_inner_microstep: 1545.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 03:56:27,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1373.27 | bwd_inner_microstep: 1373.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 03:56:29,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.29 | bwd_microstep: 1373.78 | bwd_inner_microstep: 1373.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3807 [2024-06-11 03:56:31,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.68 | bwd_microstep: 1379.98 | bwd_inner_microstep: 1379.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758 [2024-06-11 03:56:33,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.18 | bwd_microstep: 1542.57 | bwd_inner_microstep: 1542.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 03:56:35,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.72 | bwd_microstep: 1383.85 | bwd_inner_microstep: 1383.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 03:56:37,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.69 | bwd_microstep: 
1245.33 | bwd_inner_microstep: 1245.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-11 03:56:39,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-11 03:56:40,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.25 | bwd_microstep: 1287.50 | bwd_inner_microstep: 1287.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 03:56:42,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.34 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1959 [2024-06-11 03:56:43,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.57 | bwd_microstep: 765.09 | bwd_inner_microstep: 765.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3503 [2024-06-11 03:56:45,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.91 | bwd_microstep: 1438.76 | bwd_inner_microstep: 1438.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3498 [2024-06-11 03:56:48,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.97 | bwd_microstep: 1579.32 | bwd_inner_microstep: 1579.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3956 [2024-06-11 03:56:50,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 656.31 | bwd_microstep: 1793.91 | bwd_inner_microstep: 1793.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3816 [2024-06-11 03:56:52,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1484.00 | bwd_inner_microstep: 1483.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3524 [2024-06-11 03:56:54,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.59 | bwd_microstep: 1419.98 | bwd_inner_microstep: 1419.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642 [2024-06-11 03:56:56,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.16 | bwd_microstep: 1407.14 | bwd_inner_microstep: 1407.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-11 03:56:58,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.15 | bwd_microstep: 1251.44 | bwd_inner_microstep: 1251.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3860 [2024-06-11 03:57:00,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.44 | bwd_microstep: 1368.12 | bwd_inner_microstep: 1368.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700 [2024-06-11 03:57:02,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1427.93 | bwd_inner_microstep: 1427.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461 [2024-06-11 03:57:04,084] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.45 | bwd_microstep: 1406.89 | bwd_inner_microstep: 1406.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-11 03:57:05,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1295.00 | bwd_inner_microstep: 1294.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3679 [2024-06-11 03:57:07,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.08 | bwd_microstep: 1458.55 | bwd_inner_microstep: 1458.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-11 03:57:10,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.18 | bwd_microstep: 1644.48 | bwd_inner_microstep: 1644.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-11 03:57:11,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.54 | bwd_microstep: 1257.38 | bwd_inner_microstep: 1257.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085 [2024-06-11 03:57:13,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.07 | bwd_microstep: 851.49 | bwd_inner_microstep: 851.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-11 03:57:15,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.27 | bwd_microstep: 1410.37 | bwd_inner_microstep: 1410.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3719 [2024-06-11 03:57:17,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1533.08 | bwd_inner_microstep: 1533.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 714 [2024-06-11 03:57:17,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 117.45 | bwd_microstep: 289.69 | bwd_inner_microstep: 289.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564 [2024-06-11 03:57:24,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.01 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-11 03:57:24,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.17 | bwd_microstep: 6145.02 | bwd_inner_microstep: 1810.77 | bwd_allreduce_microstep: 4334.19 | step_microstep: 38.16 [2024-06-11 03:57:24,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16003.79 | bwd: 47255.63 | bwd_inner: 42920.54 | bwd_allreduce: 4334.42 | step: 39.63 {'loss': 1.2075, 'learning_rate': 9.060121991255566e-07, 'epoch': 0.91} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-11 03:57:26,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.83 | bwd_microstep: 1270.70 | bwd_inner_microstep: 1270.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 03:57:28,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.99 | bwd_microstep: 1380.48 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2348 [2024-06-11 03:57:29,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.00 | bwd_microstep: 981.95 | 
bwd_inner_microstep: 981.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3780 [2024-06-11 03:57:31,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1474.89 | bwd_inner_microstep: 1474.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899 [2024-06-11 03:57:32,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.16 | bwd_microstep: 712.92 | bwd_inner_microstep: 712.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 03:57:34,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.65 | bwd_microstep: 1384.42 | bwd_inner_microstep: 1384.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 03:57:36,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.95 | bwd_microstep: 1250.73 | bwd_inner_microstep: 1250.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 03:57:37,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.38 | bwd_microstep: 1386.47 | bwd_inner_microstep: 1386.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1945 [2024-06-11 03:57:39,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.88 | bwd_microstep: 793.03 | bwd_inner_microstep: 793.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3431 [2024-06-11 03:57:40,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
518.93 | bwd_microstep: 1393.35 | bwd_inner_microstep: 1393.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3686 [2024-06-11 03:57:43,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.80 | bwd_microstep: 1566.42 | bwd_inner_microstep: 1566.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3391 [2024-06-11 03:57:44,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.45 | bwd_microstep: 1335.91 | bwd_inner_microstep: 1335.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 03:57:46,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.50 | bwd_microstep: 1384.51 | bwd_inner_microstep: 1384.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-11 03:57:48,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.31 | bwd_microstep: 1248.87 | bwd_inner_microstep: 1248.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-11 03:57:50,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.83 | bwd_microstep: 1494.27 | bwd_inner_microstep: 1494.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-11 03:57:52,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.25 | bwd_microstep: 1479.48 | bwd_inner_microstep: 1479.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 03:57:54,582] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.13 | bwd_microstep: 1348.99 | bwd_inner_microstep: 1348.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3640 [2024-06-11 03:57:56,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.23 | bwd_microstep: 1708.69 | bwd_inner_microstep: 1708.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3641 [2024-06-11 03:57:59,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.90 | bwd_microstep: 1614.10 | bwd_inner_microstep: 1614.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-11 03:58:01,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.45 | bwd_microstep: 1657.32 | bwd_inner_microstep: 1657.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659 [2024-06-11 03:58:03,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.48 | bwd_microstep: 1521.79 | bwd_inner_microstep: 1521.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3515 [2024-06-11 03:58:05,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.13 | bwd_microstep: 1287.74 | bwd_inner_microstep: 1287.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2025 [2024-06-11 03:58:06,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.61 | bwd_microstep: 714.94 | bwd_inner_microstep: 714.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3538 [2024-06-11 03:58:08,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.99 | bwd_microstep: 1393.29 | bwd_inner_microstep: 1393.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079 [2024-06-11 03:58:09,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.16 | bwd_microstep: 918.39 | bwd_inner_microstep: 918.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2013 [2024-06-11 03:58:10,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.10 | bwd_microstep: 900.83 | bwd_inner_microstep: 900.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2068 [2024-06-11 03:58:12,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.27 | bwd_microstep: 1009.65 | bwd_inner_microstep: 1009.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3800 [2024-06-11 03:58:14,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.74 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 03:58:16,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.96 | bwd_microstep: 1377.82 | bwd_inner_microstep: 1377.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3903 [2024-06-11 03:58:18,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1426.00 | bwd_inner_microstep: 1425.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per 
sample: 8.5, dynamic token length: 3821 [2024-06-11 03:58:20,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1554.96 | bwd_inner_microstep: 1554.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 [2024-06-11 03:58:26,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.16 | optimizer_step: 6.58 [2024-06-11 03:58:26,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.40 | bwd_microstep: 5246.98 | bwd_inner_microstep: 1881.47 | bwd_allreduce_microstep: 3365.45 | step_microstep: 40.74 [2024-06-11 03:58:26,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15711.29 | bwd: 45671.09 | bwd_inner: 42304.72 | bwd_allreduce: 3365.69 | step: 42.27 {'loss': 1.1893, 'learning_rate': 8.948767751778598e-07, 'epoch': 0.91} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3453 [2024-06-11 03:58:27,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.39 | bwd_microstep: 1280.00 | bwd_inner_microstep: 1279.83 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-11 03:58:29,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.29 | bwd_microstep: 1350.28 | bwd_inner_microstep: 1350.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862 [2024-06-11 03:58:31,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.42 | bwd_microstep: 1459.38 | bwd_inner_microstep: 1459.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-11 03:58:33,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.61 
| bwd_microstep: 1549.40 | bwd_inner_microstep: 1549.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3862 [2024-06-11 03:58:36,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.99 | bwd_microstep: 1563.33 | bwd_inner_microstep: 1563.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 03:58:37,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.22 | bwd_microstep: 1386.36 | bwd_inner_microstep: 1386.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 03:58:39,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1384.12 | bwd_inner_microstep: 1384.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-11 03:58:41,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1397.51 | bwd_inner_microstep: 1397.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637 [2024-06-11 03:58:43,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1418.96 | bwd_inner_microstep: 1418.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2162 [2024-06-11 03:58:45,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.50 | bwd_microstep: 949.00 | bwd_inner_microstep: 948.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-11 03:58:46,768] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1253.55 | bwd_inner_microstep: 1253.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-11 03:58:48,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.20 | bwd_microstep: 1346.25 | bwd_inner_microstep: 1346.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3434 [2024-06-11 03:58:50,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.14 | bwd_microstep: 1407.70 | bwd_inner_microstep: 1407.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 03:58:52,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.35 | bwd_microstep: 1486.98 | bwd_inner_microstep: 1486.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-11 03:58:54,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.90 | bwd_microstep: 1450.44 | bwd_inner_microstep: 1450.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-11 03:58:56,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1481.77 | bwd_inner_microstep: 1481.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2005 [2024-06-11 03:58:57,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.05 | bwd_microstep: 833.09 | bwd_inner_microstep: 833.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3459 [2024-06-11 03:58:59,529] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.75 | bwd_microstep: 1229.44 | bwd_inner_microstep: 1229.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3662 [2024-06-11 03:59:01,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.51 | bwd_microstep: 1542.50 | bwd_inner_microstep: 1542.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3635 [2024-06-11 03:59:03,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1506.04 | bwd_inner_microstep: 1506.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2684 [2024-06-11 03:59:05,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.59 | bwd_microstep: 1025.54 | bwd_inner_microstep: 1025.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2113 [2024-06-11 03:59:06,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.84 | bwd_microstep: 828.05 | bwd_inner_microstep: 828.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-11 03:59:08,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.94 | bwd_microstep: 1502.68 | bwd_inner_microstep: 1502.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2081 [2024-06-11 03:59:09,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.53 | bwd_microstep: 725.59 | bwd_inner_microstep: 725.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token 
length: 3820 [2024-06-11 03:59:11,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1481.72 | bwd_inner_microstep: 1481.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3759 [2024-06-11 03:59:13,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1250.12 | bwd_inner_microstep: 1250.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 03:59:14,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.67 | bwd_microstep: 1290.34 | bwd_inner_microstep: 1290.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3655 [2024-06-11 03:59:16,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.00 | bwd_microstep: 1422.52 | bwd_inner_microstep: 1422.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-11 03:59:19,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.37 | bwd_microstep: 1642.74 | bwd_inner_microstep: 1642.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-11 03:59:21,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.05 | bwd_microstep: 1508.16 | bwd_inner_microstep: 1508.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597 [2024-06-11 03:59:23,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.14 | bwd_microstep: 1402.52 | bwd_inner_microstep: 1402.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, 
images per sample: 4.5, dynamic token length: 3814 [2024-06-11 03:59:27,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-11 03:59:27,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.73 | bwd_microstep: 3287.26 | bwd_inner_microstep: 1446.82 | bwd_allreduce_microstep: 1840.39 | step_microstep: 37.98 [2024-06-11 03:59:27,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15974.53 | bwd: 44643.35 | bwd_inner: 42801.93 | bwd_allreduce: 1840.68 | step: 39.55 {'loss': 1.1215, 'learning_rate': 8.83808639374848e-07, 'epoch': 0.91} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 03:59:28,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1365.35 | bwd_inner_microstep: 1365.24 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3969 [2024-06-11 03:59:31,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.60 | bwd_microstep: 1600.54 | bwd_inner_microstep: 1600.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-11 03:59:32,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.22 | bwd_microstep: 786.61 | bwd_inner_microstep: 786.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-11 03:59:34,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.80 | bwd_microstep: 1295.92 | bwd_inner_microstep: 1295.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749 [2024-06-11 03:59:36,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
569.85 | bwd_microstep: 1536.65 | bwd_inner_microstep: 1536.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3420 [2024-06-11 03:59:37,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.83 | bwd_microstep: 1251.06 | bwd_inner_microstep: 1251.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 03:59:39,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.38 | bwd_microstep: 1283.50 | bwd_inner_microstep: 1283.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 03:59:41,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1250.45 | bwd_inner_microstep: 1250.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3410 [2024-06-11 03:59:42,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.26 | bwd_microstep: 1150.02 | bwd_inner_microstep: 1150.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496 [2024-06-11 03:59:44,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.78 | bwd_microstep: 1190.72 | bwd_inner_microstep: 1190.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-11 03:59:46,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.96 | bwd_microstep: 1248.28 | bwd_inner_microstep: 1248.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 03:59:48,129] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1283.64 | bwd_inner_microstep: 1283.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1956 [2024-06-11 03:59:49,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.26 | bwd_microstep: 889.69 | bwd_inner_microstep: 889.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 03:59:51,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.14 | bwd_microstep: 1380.37 | bwd_inner_microstep: 1380.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3512 [2024-06-11 03:59:53,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1320.64 | bwd_inner_microstep: 1320.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-11 03:59:55,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.86 | bwd_microstep: 1439.33 | bwd_inner_microstep: 1439.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3517 [2024-06-11 03:59:56,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.97 | bwd_microstep: 1288.93 | bwd_inner_microstep: 1288.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 03:59:58,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.81 | bwd_microstep: 1544.20 | bwd_inner_microstep: 1544.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3510 [2024-06-11 04:00:00,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1317.47 | bwd_inner_microstep: 1317.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-11 04:00:02,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1348.86 | bwd_inner_microstep: 1348.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-11 04:00:04,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.80 | bwd_microstep: 1280.95 | bwd_inner_microstep: 1280.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2111 [2024-06-11 04:00:05,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.46 | bwd_microstep: 825.92 | bwd_inner_microstep: 825.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3522 [2024-06-11 04:00:07,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.93 | bwd_microstep: 1452.67 | bwd_inner_microstep: 1452.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-11 04:00:09,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.19 | bwd_microstep: 1386.33 | bwd_inner_microstep: 1386.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-11 04:00:11,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1491.99 | bwd_inner_microstep: 1491.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per 
sample: 8.0, dynamic token length: 3570 [2024-06-11 04:00:13,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.18 | bwd_microstep: 1459.73 | bwd_inner_microstep: 1459.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2059 [2024-06-11 04:00:14,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.98 | bwd_microstep: 820.73 | bwd_inner_microstep: 820.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3481 [2024-06-11 04:00:16,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.21 | bwd_microstep: 1572.63 | bwd_inner_microstep: 1572.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3562 [2024-06-11 04:00:19,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.55 | bwd_microstep: 1591.86 | bwd_inner_microstep: 1591.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897 [2024-06-11 04:00:20,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.94 | bwd_microstep: 777.12 | bwd_inner_microstep: 777.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2995 [2024-06-11 04:00:21,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.00 | bwd_microstep: 1239.62 | bwd_inner_microstep: 1239.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3577 [2024-06-11 04:00:25,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.19 | optimizer_step: 6.57 [2024-06-11 04:00:25,991] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 517.61 | bwd_microstep: 3545.73 | bwd_inner_microstep: 1528.30 | bwd_allreduce_microstep: 2017.38 | step_microstep: 38.05 [2024-06-11 04:00:25,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15429.20 | bwd: 43217.52 | bwd_inner: 41199.14 | bwd_allreduce: 2017.66 | step: 39.64 91%|█████████ | 1563/1726 [27:17:09<3:52:50, 85.71s/it] 91%|█████████ | 1564/1726 [27:18:57<4:09:32, 92.43s/it] 91%|█████████ | 1565/1726 [27:20:01<3:44:47, 83.78s/it] 91%|█████████ | 1566/1726 [27:21:02<3:25:45, 77.16s/it] 91%|█████████ | 1567/1726 [27:22:03<3:11:35, 72.30s/it] 91%|█████████ | 1568/1726 [27:23:02<2:59:51, 68.30s/it] {'loss': 1.1624, 'learning_rate': 8.728078306984322e-07, 'epoch': 0.91} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3469 [2024-06-11 04:00:28,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.34 | bwd_microstep: 1569.20 | bwd_inner_microstep: 1569.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2218 [2024-06-11 04:00:29,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.15 | bwd_microstep: 902.30 | bwd_inner_microstep: 902.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1931 [2024-06-11 04:00:30,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.42 | bwd_microstep: 794.76 | bwd_inner_microstep: 794.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3770 [2024-06-11 04:00:32,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 525.86 | bwd_microstep: 1402.83 | bwd_inner_microstep: 1402.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 04:00:34,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1385.20 | bwd_inner_microstep: 1385.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-11 04:00:35,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.27 | bwd_microstep: 810.64 | bwd_inner_microstep: 810.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 04:00:37,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1358.79 | bwd_inner_microstep: 1358.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-11 04:00:38,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.79 | bwd_microstep: 789.44 | bwd_inner_microstep: 789.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 04:00:40,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1386.60 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3730 [2024-06-11 04:00:42,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1565.01 | bwd_inner_microstep: 1564.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2161 [2024-06-11 04:00:43,881] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.86 | bwd_microstep: 980.93 | bwd_inner_microstep: 980.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3667 [2024-06-11 04:00:46,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.33 | bwd_microstep: 1687.64 | bwd_inner_microstep: 1687.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715 [2024-06-11 04:00:48,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.54 | bwd_microstep: 1627.36 | bwd_inner_microstep: 1627.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3528 [2024-06-11 04:00:50,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.91 | bwd_microstep: 1541.73 | bwd_inner_microstep: 1541.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3662 [2024-06-11 04:00:52,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.89 | bwd_microstep: 1717.70 | bwd_inner_microstep: 1717.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3525 [2024-06-11 04:00:55,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.12 | bwd_microstep: 1690.03 | bwd_inner_microstep: 1690.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 04:00:57,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.96 | bwd_microstep: 1392.57 | bwd_inner_microstep: 1392.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3451 [2024-06-11 04:00:59,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1356.45 | bwd_inner_microstep: 1356.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-11 04:01:00,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.14 | bwd_microstep: 1277.44 | bwd_inner_microstep: 1277.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2373 [2024-06-11 04:01:02,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.11 | bwd_microstep: 905.64 | bwd_inner_microstep: 905.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3727 [2024-06-11 04:01:04,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.22 | bwd_microstep: 1601.91 | bwd_inner_microstep: 1601.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2023 [2024-06-11 04:01:05,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.04 | bwd_microstep: 714.49 | bwd_inner_microstep: 714.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672 [2024-06-11 04:01:07,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.62 | bwd_microstep: 1521.09 | bwd_inner_microstep: 1521.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-11 04:01:09,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.06 | bwd_microstep: 1284.69 | bwd_inner_microstep: 1284.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3568 [2024-06-11 04:01:11,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.92 | bwd_microstep: 1403.31 | bwd_inner_microstep: 1403.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716 [2024-06-11 04:01:13,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.49 | bwd_microstep: 1536.01 | bwd_inner_microstep: 1535.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-11 04:01:15,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.73 | bwd_microstep: 1441.08 | bwd_inner_microstep: 1441.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1951 [2024-06-11 04:01:16,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.32 | bwd_microstep: 702.05 | bwd_inner_microstep: 702.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-11 04:01:18,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1395.39 | bwd_inner_microstep: 1395.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3768 [2024-06-11 04:01:20,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.17 | bwd_microstep: 1649.22 | bwd_inner_microstep: 1649.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3581 [2024-06-11 04:01:22,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.04 | bwd_microstep: 1547.42 | bwd_inner_microstep: 1547.40 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3538 [2024-06-11 04:01:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.99 | optimizer_gradients: 4.04 | optimizer_step: 6.60 [2024-06-11 04:01:26,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.64 | bwd_microstep: 3305.60 | bwd_inner_microstep: 1869.78 | bwd_allreduce_microstep: 1435.78 | step_microstep: 37.80 [2024-06-11 04:01:26,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15886.81 | bwd: 44244.54 | bwd_inner: 42807.83 | bwd_allreduce: 1436.02 | step: 39.28 {'loss': 1.1727, 'learning_rate': 8.618743878934022e-07, 'epoch': 0.91} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-11 04:01:28,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.25 | bwd_microstep: 1475.83 | bwd_inner_microstep: 1475.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3954 [2024-06-11 04:01:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.04 | bwd_microstep: 1593.05 | bwd_inner_microstep: 1593.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 04:01:32,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.97 | bwd_microstep: 1246.32 | bwd_inner_microstep: 1246.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-11 04:01:33,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.35 | bwd_microstep: 675.28 | bwd_inner_microstep: 675.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-11 04:01:35,486] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.36 | bwd_microstep: 1539.33 | bwd_inner_microstep: 1539.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 04:01:37,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.19 | bwd_microstep: 1286.60 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 04:01:39,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1480.54 | bwd_inner_microstep: 1480.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-11 04:01:41,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.92 | bwd_microstep: 1287.87 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2448 [2024-06-11 04:01:42,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.51 | bwd_microstep: 978.13 | bwd_inner_microstep: 978.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3481 [2024-06-11 04:01:44,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.65 | bwd_microstep: 1409.90 | bwd_inner_microstep: 1409.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-11 04:01:46,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1489.31 | bwd_inner_microstep: 1489.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
1952 [2024-06-11 04:01:47,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.61 | bwd_microstep: 884.74 | bwd_inner_microstep: 884.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-11 04:01:49,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.92 | bwd_microstep: 1390.50 | bwd_inner_microstep: 1390.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974 [2024-06-11 04:01:50,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.01 | bwd_microstep: 889.81 | bwd_inner_microstep: 889.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3663 [2024-06-11 04:01:52,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.83 | bwd_microstep: 1484.35 | bwd_inner_microstep: 1484.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-11 04:01:54,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.08 | bwd_microstep: 1483.27 | bwd_inner_microstep: 1483.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3641 [2024-06-11 04:01:56,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.86 | bwd_microstep: 1313.02 | bwd_inner_microstep: 1313.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 636 [2024-06-11 04:01:57,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.95 | bwd_microstep: 263.90 | bwd_inner_microstep: 263.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3510 [2024-06-11 04:01:59,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.10 | bwd_microstep: 1391.15 | bwd_inner_microstep: 1391.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983 [2024-06-11 04:02:00,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.82 | bwd_microstep: 706.00 | bwd_inner_microstep: 705.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3448 [2024-06-11 04:02:01,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.06 | bwd_microstep: 1156.15 | bwd_inner_microstep: 1156.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3599 [2024-06-11 04:02:03,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.27 | bwd_microstep: 1310.73 | bwd_inner_microstep: 1310.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-11 04:02:05,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.11 | bwd_microstep: 1510.67 | bwd_inner_microstep: 1510.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3597 [2024-06-11 04:02:07,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.71 | bwd_microstep: 1404.51 | bwd_inner_microstep: 1404.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3613 [2024-06-11 04:02:09,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.98 | bwd_microstep: 1467.28 | bwd_inner_microstep: 1467.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3561 [2024-06-11 04:02:11,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.51 | bwd_microstep: 1501.70 | bwd_inner_microstep: 1501.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3805 [2024-06-11 04:02:13,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.05 | bwd_microstep: 1646.66 | bwd_inner_microstep: 1646.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3600 [2024-06-11 04:02:16,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.04 | bwd_microstep: 1596.76 | bwd_inner_microstep: 1596.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584 [2024-06-11 04:02:17,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1425.92 | bwd_inner_microstep: 1425.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3457 [2024-06-11 04:02:19,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.31 | bwd_microstep: 1434.50 | bwd_inner_microstep: 1434.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460 [2024-06-11 04:02:21,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1314.53 | bwd_inner_microstep: 1314.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3001 [2024-06-11 04:02:27,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.09 | optimizer_step: 6.56 [2024-06-11 
04:02:27,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.45 | bwd_microstep: 5437.78 | bwd_inner_microstep: 1518.76 | bwd_allreduce_microstep: 3918.97 | step_microstep: 37.92 [2024-06-11 04:02:27,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15495.36 | bwd: 45476.13 | bwd_inner: 41556.25 | bwd_allreduce: 3919.20 | step: 39.40 {'loss': 1.1421, 'learning_rate': 8.510083494672905e-07, 'epoch': 0.91} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1919 [2024-06-11 04:02:28,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.70 | bwd_microstep: 870.62 | bwd_inner_microstep: 870.50 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3896 [2024-06-11 04:02:31,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.79 | bwd_microstep: 1485.48 | bwd_inner_microstep: 1485.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 04:02:32,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.00 | bwd_microstep: 1373.98 | bwd_inner_microstep: 1373.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 04:02:34,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1274.50 | bwd_inner_microstep: 1274.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3757 [2024-06-11 04:02:36,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.27 | bwd_microstep: 1639.31 | bwd_inner_microstep: 1639.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 
04:02:38,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.31 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-11 04:02:40,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1410.68 | bwd_inner_microstep: 1410.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-11 04:02:41,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.26 | bwd_microstep: 796.95 | bwd_inner_microstep: 796.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1969 [2024-06-11 04:02:42,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.38 | bwd_microstep: 795.36 | bwd_inner_microstep: 795.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-11 04:02:44,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.65 | bwd_microstep: 1388.28 | bwd_inner_microstep: 1388.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1914 [2024-06-11 04:02:45,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.64 | bwd_microstep: 717.73 | bwd_inner_microstep: 717.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-11 04:02:46,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.35 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic 
token length: 3431 [2024-06-11 04:02:48,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.60 | bwd_microstep: 1282.18 | bwd_inner_microstep: 1282.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 04:02:50,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.31 | bwd_microstep: 1255.67 | bwd_inner_microstep: 1255.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2729 [2024-06-11 04:02:51,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.56 | bwd_microstep: 1118.07 | bwd_inner_microstep: 1118.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3410 [2024-06-11 04:02:54,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.31 | bwd_microstep: 1503.61 | bwd_inner_microstep: 1503.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3829 [2024-06-11 04:02:56,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.03 | bwd_microstep: 1583.06 | bwd_inner_microstep: 1583.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625 [2024-06-11 04:02:58,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.18 | bwd_microstep: 1531.57 | bwd_inner_microstep: 1531.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-11 04:03:00,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.24 | bwd_microstep: 1494.67 | bwd_inner_microstep: 1494.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 22, images per sample: 5.5, dynamic token length: 3702 [2024-06-11 04:03:02,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.75 | bwd_microstep: 1330.32 | bwd_inner_microstep: 1330.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-11 04:03:04,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.08 | bwd_microstep: 1398.30 | bwd_inner_microstep: 1398.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3542 [2024-06-11 04:03:06,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.72 | bwd_microstep: 1486.63 | bwd_inner_microstep: 1486.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 04:03:08,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.66 | bwd_microstep: 1553.32 | bwd_inner_microstep: 1553.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 04:03:11,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.15 | bwd_microstep: 2238.31 | bwd_inner_microstep: 2238.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297 [2024-06-11 04:03:12,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.85 | bwd_microstep: 978.71 | bwd_inner_microstep: 978.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428 [2024-06-11 04:03:14,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.72 | bwd_microstep: 1346.77 | bwd_inner_microstep: 1346.75 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3387 [2024-06-11 04:03:15,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.49 | bwd_microstep: 1145.06 | bwd_inner_microstep: 1145.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3059 [2024-06-11 04:03:17,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.65 | bwd_microstep: 1269.03 | bwd_inner_microstep: 1269.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2278 [2024-06-11 04:03:19,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.98 | bwd_microstep: 971.43 | bwd_inner_microstep: 971.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815 [2024-06-11 04:03:21,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.43 | bwd_microstep: 1597.03 | bwd_inner_microstep: 1597.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2239 [2024-06-11 04:03:22,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.20 | bwd_microstep: 962.98 | bwd_inner_microstep: 962.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3564 [2024-06-11 04:03:28,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.14 | optimizer_step: 6.60 [2024-06-11 04:03:28,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.43 | bwd_microstep: 5314.12 | bwd_inner_microstep: 1461.22 | bwd_allreduce_microstep: 3852.84 | step_microstep: 38.93 [2024-06-11 04:03:28,411] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd: 15117.44 | bwd: 45200.74 | bwd_inner: 41346.89 | bwd_allreduce: 3853.12 | step: 40.40 {'loss': 1.2051, 'learning_rate': 8.402097536902221e-07, 'epoch': 0.91} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 04:03:30,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1365.99 | bwd_inner_microstep: 1365.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3473 [2024-06-11 04:03:31,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.97 | bwd_microstep: 1210.63 | bwd_inner_microstep: 1210.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-11 04:03:34,133] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.51 | bwd_microstep: 1550.33 | bwd_inner_microstep: 1550.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3792 [2024-06-11 04:03:36,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.50 | bwd_microstep: 1478.17 | bwd_inner_microstep: 1478.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-11 04:03:37,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.42 | bwd_microstep: 1148.77 | bwd_inner_microstep: 1148.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 04:03:39,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.48 | bwd_microstep: 1245.61 | bwd_inner_microstep: 1245.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3755 
[2024-06-11 04:03:41,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1535.59 | bwd_inner_microstep: 1535.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-11 04:03:43,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.63 | bwd_microstep: 1283.93 | bwd_inner_microstep: 1283.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-11 04:03:45,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.10 | bwd_microstep: 1539.15 | bwd_inner_microstep: 1539.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 4031 [2024-06-11 04:03:47,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.04 | bwd_microstep: 1564.41 | bwd_inner_microstep: 1564.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3445 [2024-06-11 04:03:49,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.69 | bwd_microstep: 1214.35 | bwd_inner_microstep: 1214.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3627 [2024-06-11 04:03:51,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.91 | bwd_microstep: 1424.20 | bwd_inner_microstep: 1424.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3836 [2024-06-11 04:03:53,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.07 | bwd_microstep: 1758.30 | bwd_inner_microstep: 1758.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3633 [2024-06-11 04:03:55,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.42 | bwd_microstep: 1406.85 | bwd_inner_microstep: 1406.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-11 04:03:57,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1343.55 | bwd_inner_microstep: 1343.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638 [2024-06-11 04:03:59,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.84 | bwd_microstep: 1606.02 | bwd_inner_microstep: 1605.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3661 [2024-06-11 04:04:01,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.31 | bwd_microstep: 1283.12 | bwd_inner_microstep: 1283.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1993 [2024-06-11 04:04:02,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 803.22 | bwd_inner_microstep: 803.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3703 [2024-06-11 04:04:04,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.24 | bwd_microstep: 1433.35 | bwd_inner_microstep: 1433.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509 [2024-06-11 04:04:06,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.93 | bwd_microstep: 1417.35 | bwd_inner_microstep: 1417.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-11 04:04:08,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.56 | bwd_microstep: 1254.78 | bwd_inner_microstep: 1254.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3520 [2024-06-11 04:04:10,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.43 | bwd_microstep: 1317.94 | bwd_inner_microstep: 1317.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3609 [2024-06-11 04:04:12,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.56 | bwd_microstep: 1453.03 | bwd_inner_microstep: 1453.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3816 [2024-06-11 04:04:14,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1386.05 | bwd_inner_microstep: 1386.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3604 [2024-06-11 04:04:15,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1409.01 | bwd_inner_microstep: 1408.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-11 04:04:17,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.28 | bwd_microstep: 1441.55 | bwd_inner_microstep: 1441.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-11 04:04:19,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1405.30 | bwd_inner_microstep: 1405.27 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-11 04:04:21,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 1387.25 | bwd_inner_microstep: 1387.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427 [2024-06-11 04:04:23,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.09 | bwd_microstep: 1444.00 | bwd_inner_microstep: 1443.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-11 04:04:25,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.06 | bwd_microstep: 1436.44 | bwd_inner_microstep: 1436.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-11 04:04:27,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.60 | bwd_microstep: 1402.90 | bwd_inner_microstep: 1402.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2267 [2024-06-11 04:04:29,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.87 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-11 04:04:29,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.91 | bwd_microstep: 1104.81 | bwd_inner_microstep: 962.46 | bwd_allreduce_microstep: 142.30 | step_microstep: 37.77 [2024-06-11 04:04:29,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16432.07 | bwd: 44055.98 | bwd_inner: 43912.78 | bwd_allreduce: 142.52 | step: 39.24 {'loss': 1.1591, 'learning_rate': 8.29478638594805e-07, 'epoch': 0.91} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-11 
04:04:31,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.85 | bwd_microstep: 1497.76 | bwd_inner_microstep: 1497.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3409 [2024-06-11 04:04:32,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.21 | bwd_microstep: 1148.66 | bwd_inner_microstep: 1148.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-11 04:04:34,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1483.31 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-11 04:04:37,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.92 | bwd_microstep: 1495.45 | bwd_inner_microstep: 1495.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419 [2024-06-11 04:04:38,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.29 | bwd_microstep: 1184.61 | bwd_inner_microstep: 1184.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-11 04:04:40,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.98 | bwd_microstep: 1511.69 | bwd_inner_microstep: 1511.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 04:04:42,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.70 | bwd_microstep: 1286.31 | bwd_inner_microstep: 1286.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3688 [2024-06-11 04:04:44,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.61 | bwd_microstep: 1630.26 | bwd_inner_microstep: 1630.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623 [2024-06-11 04:04:46,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.02 | bwd_microstep: 1343.69 | bwd_inner_microstep: 1343.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3482 [2024-06-11 04:04:48,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.45 | bwd_microstep: 1413.08 | bwd_inner_microstep: 1413.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489 [2024-06-11 04:04:50,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.28 | bwd_microstep: 1415.95 | bwd_inner_microstep: 1415.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2176 [2024-06-11 04:04:51,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.62 | bwd_microstep: 1050.76 | bwd_inner_microstep: 1050.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-11 04:04:54,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.76 | bwd_microstep: 1482.11 | bwd_inner_microstep: 1482.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 04:04:55,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1380.47 | bwd_inner_microstep: 1380.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 26, images per sample: 6.5, dynamic token length: 3514 [2024-06-11 04:04:57,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1345.26 | bwd_inner_microstep: 1345.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939 [2024-06-11 04:05:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.13 | bwd_microstep: 1690.37 | bwd_inner_microstep: 1690.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-11 04:05:02,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.27 | bwd_microstep: 1396.59 | bwd_inner_microstep: 1396.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3639 [2024-06-11 04:05:03,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.69 | bwd_microstep: 1408.71 | bwd_inner_microstep: 1408.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3740 [2024-06-11 04:05:05,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.40 | bwd_microstep: 1340.58 | bwd_inner_microstep: 1340.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-11 04:05:07,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.46 | bwd_microstep: 1391.18 | bwd_inner_microstep: 1391.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3693 [2024-06-11 04:05:09,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.43 | bwd_microstep: 1426.71 | bwd_inner_microstep: 1426.68 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 04:05:11,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.31 | bwd_microstep: 1557.18 | bwd_inner_microstep: 1557.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3638 [2024-06-11 04:05:14,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1616.47 | bwd_inner_microstep: 1616.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2108 [2024-06-11 04:05:15,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.43 | bwd_microstep: 824.31 | bwd_inner_microstep: 824.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2811 [2024-06-11 04:05:16,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.39 | bwd_microstep: 1056.48 | bwd_inner_microstep: 1056.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3609 [2024-06-11 04:05:18,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.25 | bwd_microstep: 1469.81 | bwd_inner_microstep: 1469.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076 [2024-06-11 04:05:19,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.13 | bwd_microstep: 823.92 | bwd_inner_microstep: 823.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 04:05:21,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.86 | bwd_microstep: 1401.83 | bwd_inner_microstep: 
1401.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-11 04:05:23,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.23 | bwd_microstep: 1448.52 | bwd_inner_microstep: 1448.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-11 04:05:25,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1392.98 | bwd_inner_microstep: 1392.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2027 [2024-06-11 04:05:26,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.28 | bwd_microstep: 904.95 | bwd_inner_microstep: 904.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3582 [2024-06-11 04:05:29,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.03 | optimizer_step: 6.60 [2024-06-11 04:05:29,596] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.64 | bwd_microstep: 1980.68 | bwd_inner_microstep: 1761.39 | bwd_allreduce_microstep: 219.25 | step_microstep: 37.55 [2024-06-11 04:05:29,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16229.89 | bwd: 43800.65 | bwd_inner: 43580.51 | bwd_allreduce: 219.47 | step: 39.04 91%|█████████ | 1568/1726 [27:23:02<2:59:51, 68.30s/it] 91%|█████████ | 1569/1726 [27:24:03<2:52:34, 65.95s/it] 91%|█████████ | 1570/1726 [27:25:04<2:47:51, 64.56s/it] 91%|█████████ | 1571/1726 [27:26:05<2:43:44, 63.38s/it] 91%|█████████ | 1572/1726 [27:27:05<2:40:42, 
62.62s/it] 91%|█████████ | 1573/1726 [27:28:06<2:37:56, {'loss': 1.1449, 'learning_rate': 8.188150419759577e-07, 'epoch': 0.91} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3394 [2024-06-11 04:05:31,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.19 | bwd_microstep: 1269.97 | bwd_inner_microstep: 1269.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1920 [2024-06-11 04:05:32,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.08 | bwd_microstep: 827.89 | bwd_inner_microstep: 827.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2689 [2024-06-11 04:05:34,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.91 | bwd_microstep: 1081.91 | bwd_inner_microstep: 1081.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758 [2024-06-11 04:05:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.43 | bwd_microstep: 1538.93 | bwd_inner_microstep: 1538.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1922 [2024-06-11 04:05:37,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.04 | bwd_microstep: 758.10 | bwd_inner_microstep: 758.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-11 04:05:39,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1529.89 | bwd_inner_microstep: 1529.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3762 [2024-06-11 04:05:41,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1436.44 | bwd_inner_microstep: 1436.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504 [2024-06-11 04:05:43,216] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.69 | bwd_microstep: 1413.27 | bwd_inner_microstep: 1413.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-11 04:05:45,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.96 | bwd_microstep: 1380.22 | bwd_inner_microstep: 1380.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 04:05:47,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.24 | bwd_microstep: 1483.53 | bwd_inner_microstep: 1483.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3520 [2024-06-11 04:05:49,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.74 | bwd_microstep: 1482.81 | bwd_inner_microstep: 1482.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1990 [2024-06-11 04:05:50,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.88 | bwd_microstep: 829.85 | bwd_inner_microstep: 829.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 04:05:52,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.28 | bwd_microstep: 1479.73 | bwd_inner_microstep: 1479.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3691 [2024-06-11 04:05:54,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.55 | bwd_microstep: 1526.79 | bwd_inner_microstep: 1526.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030 [2024-06-11 04:05:55,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.35 | bwd_microstep: 903.44 | bwd_inner_microstep: 903.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971 [2024-06-11 04:05:56,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.77 | bwd_microstep: 703.63 | bwd_inner_microstep: 703.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-11 04:05:58,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.94 | bwd_microstep: 1548.22 | bwd_inner_microstep: 1548.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2142 [2024-06-11 04:06:00,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.27 | bwd_microstep: 833.60 | bwd_inner_microstep: 833.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 04:06:01,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.85 | bwd_microstep: 1376.61 | bwd_inner_microstep: 1376.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2176 [2024-06-11 04:06:03,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.37 | bwd_microstep: 889.13 | bwd_inner_microstep: 889.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-11 04:06:04,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.82 | bwd_microstep: 1295.49 | bwd_inner_microstep: 1295.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3692 [2024-06-11 04:06:06,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.87 | bwd_microstep: 1432.14 | bwd_inner_microstep: 1432.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3551 [2024-06-11 04:06:08,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.65 | bwd_microstep: 1206.51 | bwd_inner_microstep: 1206.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3464 [2024-06-11 04:06:10,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.70 | bwd_microstep: 1227.97 | bwd_inner_microstep: 1227.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3602 [2024-06-11 04:06:12,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1607.77 | bwd_inner_microstep: 1607.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-11 04:06:14,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.86 | bwd_microstep: 1660.50 | bwd_inner_microstep: 1660.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 04:06:16,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1553.74 | bwd_inner_microstep: 1553.71 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3569 [2024-06-11 04:06:18,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1300.82 | bwd_inner_microstep: 1300.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 04:06:20,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.69 | bwd_microstep: 1470.76 | bwd_inner_microstep: 1470.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465 [2024-06-11 04:06:22,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.72 | bwd_microstep: 1405.44 | bwd_inner_microstep: 1405.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2088 [2024-06-11 04:06:24,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.61 | bwd_microstep: 1014.43 | bwd_inner_microstep: 1014.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-11 04:07:17,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-11 04:07:17,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 53194.54 | bwd_inner_microstep: 1750.65 | bwd_allreduce_microstep: 51443.82 | step_microstep: 39.12 [2024-06-11 04:07:17,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15349.76 | bwd: 92664.09 | bwd_inner: 41219.34 | bwd_allreduce: 51444.07 | step: 40.58 {'loss': 1.1917, 'learning_rate': 8.082190013908242e-07, 'epoch': 0.91} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 
[2024-06-11 04:07:19,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.88 | bwd_microstep: 1468.36 | bwd_inner_microstep: 1468.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-11 04:07:21,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.37 | bwd_microstep: 786.16 | bwd_inner_microstep: 786.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4347 [2024-06-11 04:07:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.74 | bwd_microstep: 1793.21 | bwd_inner_microstep: 1793.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 04:07:25,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.13 | bwd_microstep: 1274.10 | bwd_inner_microstep: 1274.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 04:07:27,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.25 | bwd_microstep: 1278.09 | bwd_inner_microstep: 1278.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3488 [2024-06-11 04:07:28,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.82 | bwd_microstep: 1342.76 | bwd_inner_microstep: 1342.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-11 04:07:30,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 790.34 | bwd_inner_microstep: 790.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3480 [2024-06-11 04:07:31,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.41 | bwd_microstep: 1182.89 | bwd_inner_microstep: 1182.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-11 04:07:33,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.76 | bwd_microstep: 1286.04 | bwd_inner_microstep: 1286.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2134 [2024-06-11 04:08:21,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.96 | bwd_microstep: 853.09 | bwd_inner_microstep: 853.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-11 04:08:23,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.94 | bwd_microstep: 1610.90 | bwd_inner_microstep: 1610.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531 [2024-06-11 04:08:25,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.53 | bwd_microstep: 1314.13 | bwd_inner_microstep: 1314.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1949 [2024-06-11 04:08:26,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.35 | bwd_microstep: 879.89 | bwd_inner_microstep: 879.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3678 [2024-06-11 04:08:28,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.54 | bwd_microstep: 1361.96 | bwd_inner_microstep: 1361.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-11 04:08:30,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.56 | bwd_microstep: 1280.75 | bwd_inner_microstep: 1280.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-11 04:08:32,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.28 | bwd_microstep: 1398.03 | bwd_inner_microstep: 1398.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3395 [2024-06-11 04:08:34,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.54 | bwd_microstep: 1238.33 | bwd_inner_microstep: 1238.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2098 [2024-06-11 04:08:35,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.76 | bwd_microstep: 817.94 | bwd_inner_microstep: 817.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451 [2024-06-11 04:08:36,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.01 | bwd_microstep: 1186.40 | bwd_inner_microstep: 1186.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-11 04:08:38,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.86 | bwd_microstep: 1452.07 | bwd_inner_microstep: 1452.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3611 [2024-06-11 04:08:40,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.80 | bwd_microstep: 1458.52 | bwd_inner_microstep: 1458.50 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1930 [2024-06-11 04:08:42,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.19 | bwd_microstep: 843.13 | bwd_inner_microstep: 843.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917 [2024-06-11 04:08:43,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.47 | bwd_microstep: 776.94 | bwd_inner_microstep: 776.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 04:08:45,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.42 | bwd_microstep: 1544.09 | bwd_inner_microstep: 1544.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-11 04:08:47,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1487.98 | bwd_inner_microstep: 1487.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-11 04:08:49,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.03 | bwd_microstep: 1395.35 | bwd_inner_microstep: 1395.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3557 [2024-06-11 04:08:51,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.98 | bwd_microstep: 1518.91 | bwd_inner_microstep: 1518.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3704 [2024-06-11 04:08:53,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.33 | bwd_microstep: 1450.21 
| bwd_inner_microstep: 1450.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-11 04:08:55,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.64 | bwd_microstep: 1486.29 | bwd_inner_microstep: 1486.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574 [2024-06-11 04:08:57,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.69 | bwd_microstep: 1496.24 | bwd_inner_microstep: 1496.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565 [2024-06-11 04:08:59,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.26 | bwd_microstep: 1296.44 | bwd_inner_microstep: 1296.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3576 [2024-06-11 04:09:01,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.07 | optimizer_step: 6.67 [2024-06-11 04:09:01,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.25 | bwd_microstep: 1677.35 | bwd_inner_microstep: 1669.64 | bwd_allreduce_microstep: 7.65 | step_microstep: 37.48 [2024-06-11 04:09:01,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15399.48 | bwd: 41026.89 | bwd_inner: 41018.34 | bwd_allreduce: 7.88 | step: 38.96 {'loss': 1.179, 'learning_rate': 7.976905541585967e-07, 'epoch': 0.91} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1938 [2024-06-11 04:09:02,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.45 | bwd_microstep: 783.08 | bwd_inner_microstep: 783.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3899 [2024-06-11 04:09:04,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.52 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1481.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404 [2024-06-11 04:09:06,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.09 | bwd_microstep: 1182.00 | bwd_inner_microstep: 1181.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3855 [2024-06-11 04:09:08,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.62 | bwd_microstep: 1561.63 | bwd_inner_microstep: 1561.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767 [2024-06-11 04:09:10,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 1570.80 | bwd_inner_microstep: 1570.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3782 [2024-06-11 04:09:12,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1453.63 | bwd_inner_microstep: 1453.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2263 [2024-06-11 04:09:13,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.90 | bwd_microstep: 872.93 | bwd_inner_microstep: 872.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744 [2024-06-11 04:09:16,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.54 | bwd_microstep: 1534.11 | bwd_inner_microstep: 1534.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 04:09:17,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.85 | bwd_microstep: 1388.96 | bwd_inner_microstep: 1388.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 04:09:19,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1387.19 | bwd_inner_microstep: 1387.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-11 04:09:21,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1253.84 | bwd_inner_microstep: 1253.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509 [2024-06-11 04:09:23,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1417.94 | bwd_inner_microstep: 1417.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674 [2024-06-11 04:09:25,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.22 | bwd_microstep: 1683.11 | bwd_inner_microstep: 1683.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3668 [2024-06-11 04:09:28,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.95 | bwd_microstep: 1584.10 | bwd_inner_microstep: 1584.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3497 [2024-06-11 04:09:30,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.34 | bwd_microstep: 1516.18 | bwd_inner_microstep: 1516.15 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-11 04:09:32,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1443.84 | bwd_inner_microstep: 1443.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3509 [2024-06-11 04:09:34,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.38 | bwd_microstep: 1683.38 | bwd_inner_microstep: 1683.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3526 [2024-06-11 04:09:36,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.33 | bwd_microstep: 1295.30 | bwd_inner_microstep: 1295.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2098 [2024-06-11 04:09:37,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.86 | bwd_microstep: 883.63 | bwd_inner_microstep: 883.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 04:09:39,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935 [2024-06-11 04:09:40,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.05 | bwd_microstep: 726.33 | bwd_inner_microstep: 726.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3684 [2024-06-11 04:09:42,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.22 | bwd_microstep: 1328.29 | bwd_inner_microstep: 1328.26 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2281 [2024-06-11 04:09:43,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.71 | bwd_microstep: 974.15 | bwd_inner_microstep: 974.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2063 [2024-06-11 04:09:44,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.07 | bwd_microstep: 921.42 | bwd_inner_microstep: 921.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-11 04:09:46,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1375.34 | bwd_inner_microstep: 1375.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3533 [2024-06-11 04:09:48,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.29 | bwd_microstep: 1449.46 | bwd_inner_microstep: 1449.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3820 [2024-06-11 04:09:51,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 671.84 | bwd_microstep: 1853.39 | bwd_inner_microstep: 1853.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-11 04:09:53,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.41 | bwd_microstep: 1664.03 | bwd_inner_microstep: 1664.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3491 [2024-06-11 04:09:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.87 | bwd_microstep: 
1544.19 | bwd_inner_microstep: 1544.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 04:09:57,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.34 | bwd_microstep: 1383.84 | bwd_inner_microstep: 1383.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3629 [2024-06-11 04:09:59,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.48 | bwd_microstep: 1707.66 | bwd_inner_microstep: 1707.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3471 [2024-06-11 04:10:45,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.26 | optimizer_step: 6.59 [2024-06-11 04:10:45,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.16 | bwd_microstep: 44683.32 | bwd_inner_microstep: 1611.02 | bwd_allreduce_microstep: 43072.23 | step_microstep: 39.85 [2024-06-11 04:10:45,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16304.19 | bwd: 86978.97 | bwd_inner: 43905.79 | bwd_allreduce: 43072.47 | step: 41.34 {'loss': 1.1869, 'learning_rate': 7.872297373604154e-07, 'epoch': 0.91} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477 [2024-06-11 04:10:47,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.39 | bwd_microstep: 1397.97 | bwd_inner_microstep: 1397.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.11 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2454 [2024-06-11 04:10:48,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.74 | bwd_microstep: 941.41 | bwd_inner_microstep: 941.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3405 [2024-06-11 04:10:50,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.62 | bwd_microstep: 1336.14 | bwd_inner_microstep: 1336.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848 [2024-06-11 04:10:52,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.65 | bwd_microstep: 1552.01 | bwd_inner_microstep: 1551.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 04:10:54,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1379.58 | bwd_inner_microstep: 1379.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3872 [2024-06-11 04:10:56,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.60 | bwd_microstep: 1557.75 | bwd_inner_microstep: 1557.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525 [2024-06-11 04:10:58,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.38 | bwd_microstep: 1416.04 | bwd_inner_microstep: 1416.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 916 [2024-06-11 04:10:59,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.37 | bwd_microstep: 372.14 | bwd_inner_microstep: 372.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3732 [2024-06-11 04:11:18,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.69 | bwd_microstep: 1522.13 | bwd_inner_microstep: 1522.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1962 [2024-06-11 04:11:19,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.12 | bwd_microstep: 819.88 | bwd_inner_microstep: 819.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421 [2024-06-11 04:11:21,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1432.92 | bwd_inner_microstep: 1432.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 1952 [2024-06-11 04:11:23,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.86 | bwd_microstep: 910.97 | bwd_inner_microstep: 910.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3470 [2024-06-11 04:11:25,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.83 | bwd_microstep: 1338.18 | bwd_inner_microstep: 1338.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3934 [2024-06-11 04:11:27,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.70 | bwd_microstep: 1584.28 | bwd_inner_microstep: 1584.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3505 [2024-06-11 04:11:29,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.94 | bwd_microstep: 1670.32 | bwd_inner_microstep: 1670.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3829 [2024-06-11 04:11:31,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.23 | bwd_microstep: 1742.81 | bwd_inner_microstep: 1742.78 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-11 04:11:33,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.79 | bwd_microstep: 1507.73 | bwd_inner_microstep: 1507.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836 [2024-06-11 04:11:35,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.85 | bwd_microstep: 1459.84 | bwd_inner_microstep: 1459.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1981 [2024-06-11 04:11:37,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.48 | bwd_microstep: 796.26 | bwd_inner_microstep: 796.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-11 04:11:39,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.31 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2089 [2024-06-11 04:11:40,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.91 | bwd_microstep: 824.99 | bwd_inner_microstep: 824.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 04:11:42,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1346.00 | bwd_inner_microstep: 1345.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3442 [2024-06-11 04:11:43,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.57 | bwd_microstep: 1444.01 
| bwd_inner_microstep: 1443.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3638 [2024-06-11 04:11:46,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.40 | bwd_microstep: 1703.88 | bwd_inner_microstep: 1703.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3552 [2024-06-11 04:11:48,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1490.76 | bwd_inner_microstep: 1490.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3563 [2024-06-11 04:11:50,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1420.81 | bwd_inner_microstep: 1420.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2034 [2024-06-11 04:11:51,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.89 | bwd_microstep: 840.54 | bwd_inner_microstep: 840.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819 [2024-06-11 04:11:53,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.64 | bwd_microstep: 1716.93 | bwd_inner_microstep: 1716.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3569 [2024-06-11 04:11:55,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.34 | bwd_microstep: 1425.80 | bwd_inner_microstep: 1425.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3773 [2024-06-11 04:11:57,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 541.89 | bwd_microstep: 1449.14 | bwd_inner_microstep: 1449.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 04:12:00,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.15 | bwd_microstep: 1651.47 | bwd_inner_microstep: 1651.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282 [2024-06-11 04:12:03,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.17 | optimizer_step: 6.63 [2024-06-11 04:12:03,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.00 | bwd_microstep: 3156.50 | bwd_inner_microstep: 995.03 | bwd_allreduce_microstep: 2161.41 | step_microstep: 38.25 [2024-06-11 04:12:03,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15790.42 | bwd: 44596.39 | bwd_inner: 42433.94 | bwd_allreduce: 2161.72 | step: 39.72 {'loss': 1.1548, 'learning_rate': 7.768365878392225e-07, 'epoch': 0.91} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2053 [2024-06-11 04:12:04,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 337.84 | bwd_microstep: 905.26 | bwd_inner_microstep: 905.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4017 [2024-06-11 04:12:07,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.32 | bwd_microstep: 1604.92 | bwd_inner_microstep: 1604.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-11 04:12:08,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.25 | bwd_microstep: 1277.24 | bwd_inner_microstep: 1277.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3792 [2024-06-11 04:12:10,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1450.85 | bwd_inner_microstep: 1450.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3521 [2024-06-11 04:12:12,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.33 | bwd_microstep: 1291.01 | bwd_inner_microstep: 1290.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3628 [2024-06-11 04:12:14,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.16 | bwd_microstep: 1377.41 | bwd_inner_microstep: 1377.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 04:12:16,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.76 | bwd_microstep: 1390.50 | bwd_inner_microstep: 1390.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3897 [2024-06-11 04:12:18,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.03 | bwd_microstep: 1591.47 | bwd_inner_microstep: 1591.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1940 [2024-06-11 04:12:19,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.58 | bwd_microstep: 730.17 | bwd_inner_microstep: 730.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3417 [2024-06-11 04:12:21,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.45 | bwd_microstep: 1280.26 | bwd_inner_microstep: 1280.23 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-11 04:12:23,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.28 | bwd_microstep: 1248.86 | bwd_inner_microstep: 1248.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-11 04:12:25,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.33 | bwd_microstep: 1527.53 | bwd_inner_microstep: 1527.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3424 [2024-06-11 04:12:27,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1251.73 | bwd_inner_microstep: 1251.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2989 [2024-06-11 04:12:28,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.62 | bwd_microstep: 1013.98 | bwd_inner_microstep: 1013.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2480 [2024-06-11 04:12:29,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.58 | bwd_microstep: 1015.36 | bwd_inner_microstep: 1015.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467 [2024-06-11 04:12:31,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.49 | bwd_microstep: 1437.35 | bwd_inner_microstep: 1437.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-11 04:12:34,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.77 | bwd_microstep: 1611.59 | 
bwd_inner_microstep: 1611.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-11 04:12:35,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.93 | bwd_microstep: 800.27 | bwd_inner_microstep: 800.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3636 [2024-06-11 04:12:37,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.28 | bwd_microstep: 1347.30 | bwd_inner_microstep: 1347.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-11 04:12:39,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.65 | bwd_microstep: 1656.23 | bwd_inner_microstep: 1656.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 04:12:41,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1385.93 | bwd_inner_microstep: 1385.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3665 [2024-06-11 04:12:43,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1325.31 | bwd_inner_microstep: 1325.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676 [2024-06-11 04:12:45,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.57 | bwd_microstep: 1429.49 | bwd_inner_microstep: 1429.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2422 [2024-06-11 04:12:46,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 354.64 | bwd_microstep: 940.91 | bwd_inner_microstep: 940.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2184 [2024-06-11 04:12:47,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.32 | bwd_microstep: 827.53 | bwd_inner_microstep: 827.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3785 [2024-06-11 04:12:49,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.51 | bwd_microstep: 1647.72 | bwd_inner_microstep: 1647.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-11 04:12:51,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.52 | bwd_microstep: 1486.33 | bwd_inner_microstep: 1486.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3457 [2024-06-11 04:12:53,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1406.04 | bwd_inner_microstep: 1406.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3728 [2024-06-11 04:12:55,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.66 | bwd_microstep: 1536.61 | bwd_inner_microstep: 1536.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3575 [2024-06-11 04:12:57,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.01 | bwd_microstep: 1530.82 | bwd_inner_microstep: 1530.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3562 [2024-06-11 04:13:00,084] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.41 | bwd_microstep: 1526.17 | bwd_inner_microstep: 1526.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3403 [2024-06-11 04:13:06,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.56 [2024-06-11 04:13:06,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.39 | bwd_microstep: 5841.22 | bwd_inner_microstep: 1632.15 | bwd_allreduce_microstep: 4209.02 | step_microstep: 37.83 [2024-06-11 04:13:06,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15838.13 | bwd: 46693.40 | bwd_inner: 42483.46 | bwd_allreduce: 4209.24 | step: 39.28 91%|█████████ | 1573/1726 [27:28:06<2:37:56, 61.94s/it] 91%|█████████ | 1574/1726 [27:29:54<3:12:11, 75.86s/it] 91%|█████████▏| 1575/1726 [27:31:38<3:31:55, 84.21s/it] 91%|█████████▏| 1576/1726 [27:33:22<3:45:05, 90.04s/it] 91%|█████████▏| 1577/1726 [27:34:40<3:34:54, 86.54s/it] {'loss': 1.1629, 'learning_rate': 7.665111421996329e-07, 'epoch': 0.91} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-11 04:13:08,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.69 | bwd_microstep: 1570.39 | bwd_inner_microstep: 1570.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1342 [2024-06-11 04:13:09,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 198.10 | bwd_microstep: 516.22 | bwd_inner_microstep: 516.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT
batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-11 04:13:11,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.17 | bwd_microstep: 1482.31 | bwd_inner_microstep: 1482.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 04:13:13,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1274.66 | bwd_inner_microstep: 1274.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 04:13:15,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.92 | bwd_microstep: 1383.17 | bwd_inner_microstep: 1383.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3481 [2024-06-11 04:13:17,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.71 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 04:13:18,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1378.23 | bwd_inner_microstep: 1378.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3412 [2024-06-11 04:13:20,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.68 | bwd_microstep: 1250.14 | bwd_inner_microstep: 1250.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 04:13:22,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.10 | bwd_microstep: 1386.84 | bwd_inner_microstep: 1386.82 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3397 [2024-06-11 04:13:24,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.86 | bwd_microstep: 1276.02 | bwd_inner_microstep: 1276.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3671 [2024-06-11 04:13:26,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.31 | bwd_microstep: 1480.50 | bwd_inner_microstep: 1480.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 04:13:28,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.62 | bwd_microstep: 1483.17 | bwd_inner_microstep: 1483.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3659 [2024-06-11 04:13:30,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.05 | bwd_microstep: 1819.92 | bwd_inner_microstep: 1819.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-11 04:13:32,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1392.58 | bwd_inner_microstep: 1392.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1923 [2024-06-11 04:13:34,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.53 | bwd_microstep: 820.36 | bwd_inner_microstep: 820.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-11 04:13:36,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1612.01 | 
bwd_inner_microstep: 1611.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3516 [2024-06-11 04:13:38,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.24 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-11 04:13:40,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.86 | bwd_microstep: 1631.84 | bwd_inner_microstep: 1631.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448 [2024-06-11 04:13:42,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.42 | bwd_microstep: 1313.76 | bwd_inner_microstep: 1313.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3802 [2024-06-11 04:13:44,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.23 | bwd_microstep: 1753.72 | bwd_inner_microstep: 1753.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 04:13:46,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.76 | bwd_microstep: 1397.13 | bwd_inner_microstep: 1397.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2109 [2024-06-11 04:13:47,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.32 | bwd_microstep: 920.06 | bwd_inner_microstep: 920.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3618 [2024-06-11 04:13:49,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 528.32 | bwd_microstep: 1414.93 | bwd_inner_microstep: 1414.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3710 [2024-06-11 04:13:51,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1429.82 | bwd_inner_microstep: 1429.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3606 [2024-06-11 04:13:53,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1480.76 | bwd_inner_microstep: 1480.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3806 [2024-06-11 04:13:55,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.81 | bwd_microstep: 1497.63 | bwd_inner_microstep: 1497.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3468 [2024-06-11 04:13:57,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.91 | bwd_microstep: 1245.47 | bwd_inner_microstep: 1245.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3823 [2024-06-11 04:13:59,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1357.42 | bwd_inner_microstep: 1357.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934 [2024-06-11 04:14:00,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.25 | bwd_microstep: 726.88 | bwd_inner_microstep: 726.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3456 [2024-06-11 04:14:02,480] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1377.96 | bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2040 [2024-06-11 04:14:03,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 286.11 | bwd_microstep: 745.44 | bwd_inner_microstep: 745.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599 [2024-06-11 04:14:07,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-11 04:14:07,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.47 | bwd_microstep: 3273.01 | bwd_inner_microstep: 1697.45 | bwd_allreduce_microstep: 1575.51 | step_microstep: 37.54 [2024-06-11 04:14:07,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15991.09 | bwd: 44562.57 | bwd_inner: 42986.16 | bwd_allreduce: 1575.74 | step: 39.05 {'loss': 1.1981, 'learning_rate': 7.562534368078167e-07, 'epoch': 0.91} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-11 04:14:09,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1488.55 | bwd_inner_microstep: 1488.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 04:14:11,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.54 | bwd_microstep: 1379.23 | bwd_inner_microstep: 1379.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3881 [2024-06-11 04:14:13,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.28 | bwd_microstep: 1682.94 | bwd_inner_microstep: 1682.91 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-11 04:14:15,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.40 | bwd_microstep: 1653.30 | bwd_inner_microstep: 1653.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-11 04:14:16,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.33 | bwd_microstep: 678.68 | bwd_inner_microstep: 678.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-11 04:14:18,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1388.94 | bwd_inner_microstep: 1388.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3489 [2024-06-11 04:14:20,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.41 | bwd_microstep: 1232.30 | bwd_inner_microstep: 1232.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3403 [2024-06-11 04:14:22,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.16 | bwd_microstep: 1370.77 | bwd_inner_microstep: 1370.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411 [2024-06-11 04:14:24,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.37 | bwd_microstep: 1182.33 | bwd_inner_microstep: 1182.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 04:14:25,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 
1250.49 | bwd_inner_microstep: 1250.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-11 04:14:27,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.89 | bwd_microstep: 1524.46 | bwd_inner_microstep: 1524.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3505 [2024-06-11 04:14:29,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.84 | bwd_microstep: 1317.25 | bwd_inner_microstep: 1317.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3502 [2024-06-11 04:14:31,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.29 | bwd_microstep: 1410.81 | bwd_inner_microstep: 1410.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-11 04:14:33,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.77 | bwd_microstep: 1485.84 | bwd_inner_microstep: 1485.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3441 [2024-06-11 04:14:35,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.96 | bwd_microstep: 1444.62 | bwd_inner_microstep: 1444.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3404 [2024-06-11 04:14:37,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.36 | bwd_microstep: 1436.09 | bwd_inner_microstep: 1436.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-11 04:14:39,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 550.39 | bwd_microstep: 1486.19 | bwd_inner_microstep: 1486.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-11 04:14:41,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.40 | bwd_microstep: 1649.14 | bwd_inner_microstep: 1649.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 04:14:43,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.02 | bwd_microstep: 1288.33 | bwd_inner_microstep: 1288.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2053 [2024-06-11 04:14:44,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.19 | bwd_microstep: 722.25 | bwd_inner_microstep: 722.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3623 [2024-06-11 04:14:46,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.13 | bwd_microstep: 1475.72 | bwd_inner_microstep: 1475.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3815 [2024-06-11 04:14:48,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.10 | bwd_microstep: 1530.86 | bwd_inner_microstep: 1530.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3668 [2024-06-11 04:14:50,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.51 | bwd_microstep: 1427.89 | bwd_inner_microstep: 1427.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3722 [2024-06-11 04:14:52,786] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.87 | bwd_microstep: 1369.39 | bwd_inner_microstep: 1369.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3488 [2024-06-11 04:14:54,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.18 | bwd_microstep: 1317.98 | bwd_inner_microstep: 1317.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2018 [2024-06-11 04:14:55,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.21 | bwd_microstep: 777.16 | bwd_inner_microstep: 777.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-11 04:14:57,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1511.68 | bwd_inner_microstep: 1511.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039 [2024-06-11 04:14:58,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.72 | bwd_microstep: 811.79 | bwd_inner_microstep: 811.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2043 [2024-06-11 04:15:00,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.04 | bwd_microstep: 907.12 | bwd_inner_microstep: 907.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3602 [2024-06-11 04:15:01,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.23 | bwd_microstep: 1245.36 | bwd_inner_microstep: 1245.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 
[2024-06-11 04:15:03,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.55 | bwd_microstep: 1494.50 | bwd_inner_microstep: 1494.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4091 [2024-06-11 04:15:08,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-11 04:15:08,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 670.69 | bwd_microstep: 3656.31 | bwd_inner_microstep: 2090.61 | bwd_allreduce_microstep: 1565.65 | step_microstep: 38.04 [2024-06-11 04:15:08,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15984.10 | bwd: 44598.30 | bwd_inner: 43031.73 | bwd_allreduce: 1565.88 | step: 39.58 {'loss': 1.2043, 'learning_rate': 7.46063507791357e-07, 'epoch': 0.92} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-11 04:15:10,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.90 | bwd_microstep: 1290.21 | bwd_inner_microstep: 1290.05 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-11 04:15:12,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1378.41 | bwd_inner_microstep: 1378.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3876 [2024-06-11 04:15:14,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.68 | bwd_microstep: 1581.38 | bwd_inner_microstep: 1581.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3514 [2024-06-11 04:15:15,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.10 | bwd_microstep: 1188.22 | 
bwd_inner_microstep: 1188.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-11 04:15:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.02 | bwd_microstep: 1546.26 | bwd_inner_microstep: 1546.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 04:15:19,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.83 | bwd_microstep: 1247.80 | bwd_inner_microstep: 1247.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3535 [2024-06-11 04:15:21,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.79 | bwd_microstep: 1441.85 | bwd_inner_microstep: 1441.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3727 [2024-06-11 04:15:23,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.70 | bwd_microstep: 1635.31 | bwd_inner_microstep: 1635.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-11 04:15:25,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1250.21 | bwd_inner_microstep: 1250.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1942 [2024-06-11 04:15:26,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.85 | bwd_microstep: 698.02 | bwd_inner_microstep: 698.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3526 [2024-06-11 04:15:28,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 533.54 | bwd_microstep: 1417.99 | bwd_inner_microstep: 1417.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3762 [2024-06-11 04:15:30,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.73 | bwd_microstep: 1633.32 | bwd_inner_microstep: 1633.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3506 [2024-06-11 04:15:32,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.23 | bwd_microstep: 1418.11 | bwd_inner_microstep: 1418.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-11 04:15:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.31 | bwd_microstep: 1340.25 | bwd_inner_microstep: 1340.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917 [2024-06-11 04:15:35,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.66 | bwd_microstep: 781.41 | bwd_inner_microstep: 781.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3671 [2024-06-11 04:15:37,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.99 | bwd_microstep: 1551.01 | bwd_inner_microstep: 1550.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 04:15:39,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.44 | bwd_microstep: 1388.36 | bwd_inner_microstep: 1388.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048 [2024-06-11 04:15:40,939] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.29 | bwd_microstep: 812.11 | bwd_inner_microstep: 812.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1994 [2024-06-11 04:15:41,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.25 | bwd_microstep: 708.25 | bwd_inner_microstep: 708.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3979 [2024-06-11 04:15:44,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 664.20 | bwd_microstep: 1809.71 | bwd_inner_microstep: 1809.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3842 [2024-06-11 04:15:46,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.33 | bwd_microstep: 1564.12 | bwd_inner_microstep: 1564.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3734 [2024-06-11 04:15:48,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.05 | bwd_microstep: 1440.64 | bwd_inner_microstep: 1440.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3715 [2024-06-11 04:15:50,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1636.71 | bwd_inner_microstep: 1636.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3555 [2024-06-11 04:15:52,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.80 | bwd_microstep: 1329.88 | bwd_inner_microstep: 1329.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 
1915 [2024-06-11 04:15:53,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.64 | bwd_microstep: 688.83 | bwd_inner_microstep: 688.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2280 [2024-06-11 04:15:54,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.01 | bwd_microstep: 784.41 | bwd_inner_microstep: 784.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3467 [2024-06-11 04:15:56,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.40 | bwd_microstep: 1542.72 | bwd_inner_microstep: 1542.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-11 04:15:58,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.24 | bwd_microstep: 1503.29 | bwd_inner_microstep: 1503.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3555 [2024-06-11 04:16:00,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.28 | bwd_microstep: 1377.52 | bwd_inner_microstep: 1377.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-11 04:16:03,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.86 | bwd_microstep: 1660.33 | bwd_inner_microstep: 1660.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3597 [2024-06-11 04:16:05,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.61 | bwd_microstep: 1608.31 | bwd_inner_microstep: 1608.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 2270 [2024-06-11 04:16:10,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.08 | optimizer_step: 6.60 [2024-06-11 04:16:10,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.10 | bwd_microstep: 5028.04 | bwd_inner_microstep: 1131.25 | bwd_allreduce_microstep: 3896.74 | step_microstep: 37.81 [2024-06-11 04:16:10,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15777.60 | bwd: 46283.03 | bwd_inner: 42385.22 | bwd_allreduce: 3897.05 | step: 39.34 {'loss': 1.1591, 'learning_rate': 7.359413910391322e-07, 'epoch': 0.92} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 04:16:12,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.70 | bwd_microstep: 1331.11 | bwd_inner_microstep: 1331.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-11 04:16:14,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.90 | bwd_microstep: 1471.61 | bwd_inner_microstep: 1471.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 04:16:16,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1397.11 | bwd_inner_microstep: 1397.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-11 04:16:17,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.77 | bwd_microstep: 787.00 | bwd_inner_microstep: 786.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1863 [2024-06-11 04:16:18,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
260.37 | bwd_microstep: 675.90 | bwd_inner_microstep: 675.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-11 04:16:20,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.27 | bwd_microstep: 1473.82 | bwd_inner_microstep: 1473.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 04:16:22,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.19 | bwd_microstep: 1379.63 | bwd_inner_microstep: 1379.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 04:16:24,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.23 | bwd_microstep: 1281.14 | bwd_inner_microstep: 1281.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3733 [2024-06-11 04:16:26,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.29 | bwd_microstep: 1629.58 | bwd_inner_microstep: 1629.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3689 [2024-06-11 04:16:28,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.43 | bwd_microstep: 1549.57 | bwd_inner_microstep: 1549.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-11 04:16:30,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.05 | bwd_microstep: 1526.61 | bwd_inner_microstep: 1526.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3586 [2024-06-11 04:16:32,650] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 515.65 | bwd_microstep: 1368.98 | bwd_inner_microstep: 1368.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-11 04:16:34,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.67 | bwd_microstep: 1483.89 | bwd_inner_microstep: 1483.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3505 [2024-06-11 04:16:36,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1511.53 | bwd_inner_microstep: 1511.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3382 [2024-06-11 04:16:38,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.17 | bwd_microstep: 1438.04 | bwd_inner_microstep: 1438.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3832 [2024-06-11 04:16:41,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 642.07 | bwd_microstep: 1760.48 | bwd_inner_microstep: 1760.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3392 [2024-06-11 04:16:42,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.78 | bwd_microstep: 1243.78 | bwd_inner_microstep: 1243.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1931 [2024-06-11 04:16:44,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.91 | bwd_microstep: 849.68 | bwd_inner_microstep: 849.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587 [2024-06-11 
04:16:45,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.73 | bwd_microstep: 1307.78 | bwd_inner_microstep: 1307.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3834 [2024-06-11 04:16:47,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.35 | bwd_microstep: 1516.73 | bwd_inner_microstep: 1516.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800 [2024-06-11 04:16:50,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.88 | bwd_microstep: 1553.21 | bwd_inner_microstep: 1553.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2138 [2024-06-11 04:16:51,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.59 | bwd_microstep: 832.95 | bwd_inner_microstep: 832.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2210 [2024-06-11 04:16:52,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.61 | bwd_microstep: 957.72 | bwd_inner_microstep: 957.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3545 [2024-06-11 04:16:54,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.00 | bwd_microstep: 1690.35 | bwd_inner_microstep: 1690.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2066 [2024-06-11 04:16:56,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 370.88 | bwd_microstep: 1009.84 | bwd_inner_microstep: 1009.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3424 [2024-06-11 04:16:58,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1251.26 | bwd_inner_microstep: 1251.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3632 [2024-06-11 04:17:00,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.33 | bwd_microstep: 1441.21 | bwd_inner_microstep: 1441.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 575 [2024-06-11 04:17:00,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 99.82 | bwd_microstep: 250.59 | bwd_inner_microstep: 250.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3811 [2024-06-11 04:17:02,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.12 | bwd_microstep: 1485.07 | bwd_inner_microstep: 1485.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3609 [2024-06-11 04:17:04,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.54 | bwd_microstep: 1609.23 | bwd_inner_microstep: 1609.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3591 [2024-06-11 04:17:06,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.63 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3575 [2024-06-11 04:17:11,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-11 04:17:11,125] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 592.73 | bwd_microstep: 3918.20 | bwd_inner_microstep: 1809.56 | bwd_allreduce_microstep: 2108.59 | step_microstep: 37.77 [2024-06-11 04:17:11,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15695.13 | bwd: 44388.59 | bwd_inner: 42279.10 | bwd_allreduce: 2108.82 | step: 39.17 {'loss': 1.1987, 'learning_rate': 7.258871222011832e-07, 'epoch': 0.92} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1863 [2024-06-11 04:17:12,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.19 | bwd_microstep: 760.16 | bwd_inner_microstep: 760.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471 [2024-06-11 04:17:14,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.14 | bwd_microstep: 1313.90 | bwd_inner_microstep: 1313.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3871 [2024-06-11 04:17:16,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.49 | bwd_microstep: 1471.54 | bwd_inner_microstep: 1471.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 04:17:18,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.33 | bwd_microstep: 1482.63 | bwd_inner_microstep: 1482.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3400 [2024-06-11 04:17:19,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.47 | bwd_microstep: 1276.00 | bwd_inner_microstep: 1275.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3750 [2024-06-11 04:17:21,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 570.57 | bwd_microstep: 1539.14 | bwd_inner_microstep: 1539.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-11 04:17:24,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.14 | bwd_microstep: 1486.74 | bwd_inner_microstep: 1486.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943 [2024-06-11 04:17:25,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.50 | bwd_microstep: 759.91 | bwd_inner_microstep: 759.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-11 04:17:27,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1417.24 | bwd_inner_microstep: 1417.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2075 [2024-06-11 04:17:28,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.10 | bwd_microstep: 822.79 | bwd_inner_microstep: 822.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441 [2024-06-11 04:17:29,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.06 | bwd_microstep: 1188.13 | bwd_inner_microstep: 1188.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-11 04:17:31,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.19 | bwd_microstep: 1509.64 | bwd_inner_microstep: 1509.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3493 [2024-06-11 04:17:33,843] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.19 | bwd_microstep: 1399.10 | bwd_inner_microstep: 1399.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-11 04:17:35,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1506.52 | bwd_inner_microstep: 1506.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 04:17:37,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.32 | bwd_microstep: 1376.50 | bwd_inner_microstep: 1376.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-11 04:17:39,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1403.89 | bwd_inner_microstep: 1403.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3525 [2024-06-11 04:17:41,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.00 | bwd_microstep: 1423.81 | bwd_inner_microstep: 1423.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3984 [2024-06-11 04:17:43,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.41 | bwd_microstep: 1504.31 | bwd_inner_microstep: 1504.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2931 [2024-06-11 04:17:45,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.88 | bwd_microstep: 1284.11 | bwd_inner_microstep: 1284.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3507 [2024-06-11 04:17:47,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.64 | bwd_microstep: 1484.85 | bwd_inner_microstep: 1484.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3547 [2024-06-11 04:17:49,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.58 | bwd_microstep: 1442.92 | bwd_inner_microstep: 1442.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-11 04:17:51,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.14 | bwd_microstep: 1249.88 | bwd_inner_microstep: 1249.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-11 04:17:53,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.86 | bwd_microstep: 1513.46 | bwd_inner_microstep: 1513.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3580 [2024-06-11 04:17:55,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.77 | bwd_microstep: 1502.55 | bwd_inner_microstep: 1502.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-11 04:17:57,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1405.00 | bwd_inner_microstep: 1404.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3607 [2024-06-11 04:17:59,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.61 | bwd_microstep: 1312.02 | bwd_inner_microstep: 1312.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 3808 [2024-06-11 04:18:01,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.50 | bwd_microstep: 1384.27 | bwd_inner_microstep: 1384.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3594 [2024-06-11 04:18:02,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.40 | bwd_microstep: 1213.79 | bwd_inner_microstep: 1213.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2210 [2024-06-11 04:18:04,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.29 | bwd_microstep: 863.45 | bwd_inner_microstep: 863.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2497 [2024-06-11 04:18:05,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.84 | bwd_microstep: 1057.91 | bwd_inner_microstep: 1057.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3604 [2024-06-11 04:18:07,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.90 | bwd_microstep: 1706.45 | bwd_inner_microstep: 1706.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2263 [2024-06-11 04:18:11,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.08 | optimizer_step: 6.63 [2024-06-11 04:18:11,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.68 | bwd_microstep: 3496.30 | bwd_inner_microstep: 1099.16 | bwd_allreduce_microstep: 2397.09 | step_microstep: 37.70 [2024-06-11 04:18:11,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15748.91 | bwd: 44558.94 | bwd_inner: 
42160.91 | bwd_allreduce: 2397.33 | step: 39.18 91%|█████████▏| 1578/1726 [27:35:43<3:15:56, 79.44s/it] 91%|█████████▏| 1579/1726 [27:36:44<3:00:59, 73.87s/it] 92%|█████████▏| 1580/1726 [27:37:45<2:50:18, 69.99s/it] 92%|█████████▏| 1581/1726 [27:38:47<2:43:37, 67.71s/it] 92%|█████████▏| 1582/1726 [27:39:47<2:37:14, 65.52s/it] {'loss': 1.1493, 'learning_rate': 7.15900736688595e-07, 'epoch': 0.92} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2459 [2024-06-11 04:18:13,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 389.99 | bwd_microstep: 1030.88 | bwd_inner_microstep: 1030.80 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 04:18:15,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.46 | bwd_microstep: 1375.30 | bwd_inner_microstep: 1375.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-11 04:18:17,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.73 | bwd_microstep: 1382.89 | bwd_inner_microstep: 1382.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 04:18:19,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.57 | bwd_microstep: 1651.67 | bwd_inner_microstep: 1651.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 528 [2024-06-11 04:18:19,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
97.45 | bwd_microstep: 240.75 | bwd_inner_microstep: 240.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729 [2024-06-11 04:18:21,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.16 | bwd_microstep: 1462.60 | bwd_inner_microstep: 1462.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 04:18:23,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.60 | bwd_microstep: 1244.00 | bwd_inner_microstep: 1243.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-11 04:18:24,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.51 | bwd_microstep: 788.20 | bwd_inner_microstep: 788.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 04:18:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.81 | bwd_microstep: 1246.57 | bwd_inner_microstep: 1246.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680 [2024-06-11 04:18:28,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.93 | bwd_microstep: 1427.59 | bwd_inner_microstep: 1427.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-11 04:18:30,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1345.01 | bwd_inner_microstep: 1344.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3487 [2024-06-11 04:18:31,941] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 524.93 | bwd_microstep: 1394.20 | bwd_inner_microstep: 1394.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-11 04:18:33,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.90 | bwd_microstep: 1340.84 | bwd_inner_microstep: 1340.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 04:18:35,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.15 | bwd_microstep: 1475.06 | bwd_inner_microstep: 1475.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465 [2024-06-11 04:18:37,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.85 | bwd_microstep: 1434.56 | bwd_inner_microstep: 1434.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3607 [2024-06-11 04:18:39,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.00 | bwd_microstep: 1460.38 | bwd_inner_microstep: 1460.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1987 [2024-06-11 04:18:40,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.04 | bwd_microstep: 768.48 | bwd_inner_microstep: 768.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502 [2024-06-11 04:18:42,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.12 | bwd_microstep: 1317.94 | bwd_inner_microstep: 1317.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 
04:18:44,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1285.12 | bwd_inner_microstep: 1285.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3818 [2024-06-11 04:18:46,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.00 | bwd_microstep: 1415.58 | bwd_inner_microstep: 1415.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-11 04:18:48,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.99 | bwd_microstep: 1289.75 | bwd_inner_microstep: 1289.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3441 [2024-06-11 04:18:49,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1251.22 | bwd_inner_microstep: 1251.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3825 [2024-06-11 04:18:51,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.48 | bwd_inner_microstep: 1459.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797 [2024-06-11 04:18:54,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.93 | bwd_microstep: 1548.49 | bwd_inner_microstep: 1548.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3539 [2024-06-11 04:18:56,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.55 | bwd_microstep: 1524.70 | bwd_inner_microstep: 1524.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2283 [2024-06-11 04:18:57,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.80 | bwd_microstep: 973.51 | bwd_inner_microstep: 973.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-11 04:18:59,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.63 | bwd_microstep: 1404.48 | bwd_inner_microstep: 1404.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 925 [2024-06-11 04:19:00,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.21 | bwd_microstep: 375.61 | bwd_inner_microstep: 375.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3429 [2024-06-11 04:19:01,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.45 | bwd_microstep: 1198.53 | bwd_inner_microstep: 1198.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-11 04:19:03,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.47 | bwd_microstep: 1393.52 | bwd_inner_microstep: 1393.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3579 [2024-06-11 04:19:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.94 | bwd_microstep: 1249.97 | bwd_inner_microstep: 1249.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2046 [2024-06-11 04:19:13,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.55 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-11 04:19:13,668] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 334.46 | bwd_microstep: 7926.13 | bwd_inner_microstep: 1036.85 | bwd_allreduce_microstep: 6889.22 | step_microstep: 38.60 [2024-06-11 04:19:13,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14904.92 | bwd: 46683.01 | bwd_inner: 39792.82 | bwd_allreduce: 6889.49 | step: 40.06 {'loss': 1.1881, 'learning_rate': 7.059822696733598e-07, 'epoch': 0.92} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3491 [2024-06-11 04:19:15,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1392.72 | bwd_inner_microstep: 1392.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 04:19:17,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.57 | bwd_microstep: 1393.60 | bwd_inner_microstep: 1393.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3880 [2024-06-11 04:19:19,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.87 | bwd_microstep: 1485.43 | bwd_inner_microstep: 1485.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3471 [2024-06-11 04:19:21,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.87 | bwd_microstep: 1408.34 | bwd_inner_microstep: 1408.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262 [2024-06-11 04:19:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.35 | bwd_microstep: 966.37 | bwd_inner_microstep: 966.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3758 [2024-06-11 04:19:24,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 506.66 | bwd_microstep: 1340.12 | bwd_inner_microstep: 1340.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 04:19:26,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.34 | bwd_microstep: 1279.49 | bwd_inner_microstep: 1279.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-11 04:19:28,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.17 | bwd_microstep: 1248.51 | bwd_inner_microstep: 1248.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3749 [2024-06-11 04:19:30,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.00 | bwd_microstep: 1538.96 | bwd_inner_microstep: 1538.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-11 04:19:31,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.62 | bwd_microstep: 1186.65 | bwd_inner_microstep: 1186.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3694 [2024-06-11 04:19:34,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.02 | bwd_microstep: 1690.94 | bwd_inner_microstep: 1690.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3512 [2024-06-11 04:19:36,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1450.46 | bwd_inner_microstep: 1450.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2217 [2024-06-11 04:19:37,639] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.11 | bwd_microstep: 960.02 | bwd_inner_microstep: 959.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681 [2024-06-11 04:19:39,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.92 | bwd_microstep: 1613.90 | bwd_inner_microstep: 1613.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 04:19:41,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1374.47 | bwd_inner_microstep: 1374.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 04:19:44,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.03 | bwd_microstep: 1659.37 | bwd_inner_microstep: 1659.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504 [2024-06-11 04:19:45,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.40 | bwd_microstep: 1417.39 | bwd_inner_microstep: 1417.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 04:19:47,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.13 | bwd_microstep: 1276.90 | bwd_inner_microstep: 1276.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-11 04:19:49,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.05 | bwd_microstep: 1182.92 | bwd_inner_microstep: 1182.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3511 [2024-06-11 04:19:51,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.31 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3818 [2024-06-11 04:19:53,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.06 | bwd_microstep: 1603.22 | bwd_inner_microstep: 1603.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3624 [2024-06-11 04:19:55,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.28 | bwd_microstep: 1311.67 | bwd_inner_microstep: 1311.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3608 [2024-06-11 04:19:57,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.29 | bwd_microstep: 1674.53 | bwd_inner_microstep: 1674.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-11 04:19:59,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.83 | bwd_microstep: 1345.67 | bwd_inner_microstep: 1345.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 04:20:01,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3553 [2024-06-11 04:20:03,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.57 | bwd_microstep: 1555.78 | bwd_inner_microstep: 1555.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3421 [2024-06-11 04:20:05,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.00 | bwd_microstep: 1402.54 | bwd_inner_microstep: 1402.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1900 [2024-06-11 04:20:06,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.56 | bwd_microstep: 728.30 | bwd_inner_microstep: 728.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3571 [2024-06-11 04:20:08,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.60 | bwd_microstep: 1304.86 | bwd_inner_microstep: 1304.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-11 04:20:10,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1393.75 | bwd_inner_microstep: 1393.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 04:20:12,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.81 | bwd_microstep: 1285.57 | bwd_inner_microstep: 1285.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192 [2024-06-11 04:20:16,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.83 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-11 04:20:16,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.35 | bwd_microstep: 4153.90 | bwd_inner_microstep: 1082.67 | bwd_allreduce_microstep: 3071.18 | step_microstep: 38.23 [2024-06-11 04:20:16,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16187.47 | bwd: 46403.39 | bwd_inner: 
43331.32 | bwd_allreduce: 3071.40 | step: 39.74 {'loss': 1.2365, 'learning_rate': 6.961317560882741e-07, 'epoch': 0.92} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 04:20:18,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.15 | bwd_microstep: 1370.96 | bwd_inner_microstep: 1370.88 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.12 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2943 [2024-06-11 04:20:20,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.85 | bwd_microstep: 1193.95 | bwd_inner_microstep: 1193.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 04:20:22,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.84 | bwd_microstep: 1380.03 | bwd_inner_microstep: 1380.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2266 [2024-06-11 04:20:23,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.93 | bwd_microstep: 968.04 | bwd_inner_microstep: 968.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-11 04:20:25,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.27 | bwd_microstep: 1454.46 | bwd_inner_microstep: 1454.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 04:20:27,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.61 | bwd_microstep: 1244.12 | bwd_inner_microstep: 1244.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-11 04:20:29,267] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.73 | bwd_microstep: 1545.58 | bwd_inner_microstep: 1545.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-11 04:20:30,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.38 | bwd_microstep: 794.31 | bwd_inner_microstep: 794.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 04:20:32,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.83 | bwd_microstep: 1385.39 | bwd_inner_microstep: 1385.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-11 04:20:34,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.73 | bwd_microstep: 1529.00 | bwd_inner_microstep: 1528.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 04:20:36,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.21 | bwd_microstep: 1285.18 | bwd_inner_microstep: 1285.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-11 04:20:38,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.56 | bwd_microstep: 1380.31 | bwd_inner_microstep: 1380.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3473 [2024-06-11 04:20:40,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.37 | bwd_microstep: 1404.03 | bwd_inner_microstep: 1404.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3450 [2024-06-11 04:20:42,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.45 | bwd_microstep: 1448.76 | bwd_inner_microstep: 1448.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 04:20:44,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1481.75 | bwd_inner_microstep: 1481.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3946 [2024-06-11 04:20:46,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.31 | bwd_microstep: 1601.87 | bwd_inner_microstep: 1601.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-11 04:20:48,003] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.19 | bwd_microstep: 1256.11 | bwd_inner_microstep: 1256.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 04:20:49,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.60 | bwd_microstep: 1283.13 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3648 [2024-06-11 04:20:51,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.60 | bwd_microstep: 1407.96 | bwd_inner_microstep: 1407.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3669 [2024-06-11 04:20:53,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1425.58 | bwd_inner_microstep: 1425.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3471 [2024-06-11 04:20:55,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.10 | bwd_microstep: 1281.73 | bwd_inner_microstep: 1281.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-11 04:20:57,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.73 | bwd_microstep: 1455.69 | bwd_inner_microstep: 1455.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2512 [2024-06-11 04:20:58,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.80 | bwd_microstep: 1058.75 | bwd_inner_microstep: 1058.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3674 [2024-06-11 04:21:00,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.30 | bwd_microstep: 1454.97 | bwd_inner_microstep: 1454.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3723 [2024-06-11 04:21:03,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.00 | bwd_microstep: 1638.03 | bwd_inner_microstep: 1638.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-11 04:21:05,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.02 | bwd_microstep: 1544.40 | bwd_inner_microstep: 1544.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3588 [2024-06-11 04:21:07,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.79 | bwd_microstep: 1410.24 | bwd_inner_microstep: 1410.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2241 [2024-06-11 04:21:08,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.31 | bwd_microstep: 964.90 | bwd_inner_microstep: 964.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007 [2024-06-11 04:21:09,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.21 | bwd_microstep: 907.22 | bwd_inner_microstep: 907.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-11 04:21:11,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.29 | bwd_microstep: 1438.07 | bwd_inner_microstep: 1438.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2264 [2024-06-11 04:21:13,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.37 | bwd_microstep: 1066.56 | bwd_inner_microstep: 1066.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-11 04:21:18,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.11 | optimizer_step: 6.63 [2024-06-11 04:21:18,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.43 | bwd_microstep: 4177.44 | bwd_inner_microstep: 1740.79 | bwd_allreduce_microstep: 2436.60 | step_microstep: 37.95 [2024-06-11 04:21:18,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15934.46 | bwd: 45238.55 | bwd_inner: 42800.98 | bwd_allreduce: 2436.86 | step: 39.48 {'loss': 1.1499, 'learning_rate': 6.863492306267927e-07, 'epoch': 0.92} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-11 04:21:20,152] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 545.67 | bwd_microstep: 1475.47 | bwd_inner_microstep: 1475.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 04:21:22,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1380.86 | bwd_inner_microstep: 1380.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3835 [2024-06-11 04:21:24,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.97 | bwd_microstep: 1416.94 | bwd_inner_microstep: 1416.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-11 04:21:25,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.43 | bwd_microstep: 1342.35 | bwd_inner_microstep: 1342.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2503 [2024-06-11 04:21:27,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.62 | bwd_microstep: 1024.81 | bwd_inner_microstep: 1024.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-11 04:21:29,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1496.02 | bwd_inner_microstep: 1496.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717 [2024-06-11 04:21:31,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.74 | bwd_microstep: 1531.92 | bwd_inner_microstep: 1531.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3710 [2024-06-11 
04:21:33,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.58 | bwd_microstep: 1630.19 | bwd_inner_microstep: 1630.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-11 04:21:35,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1250.76 | bwd_inner_microstep: 1250.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3618 [2024-06-11 04:21:37,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.96 | bwd_microstep: 1445.00 | bwd_inner_microstep: 1444.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4099 [2024-06-11 04:21:39,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.65 | bwd_microstep: 1531.87 | bwd_inner_microstep: 1531.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3633 [2024-06-11 04:21:41,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.10 | bwd_microstep: 1706.37 | bwd_inner_microstep: 1706.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508 [2024-06-11 04:21:44,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.80 | bwd_microstep: 1587.93 | bwd_inner_microstep: 1587.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2016 [2024-06-11 04:21:45,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.18 | bwd_microstep: 710.74 | bwd_inner_microstep: 710.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, 
dynamic token length: 2175 [2024-06-11 04:21:46,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.17 | bwd_microstep: 887.98 | bwd_inner_microstep: 887.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3705 [2024-06-11 04:21:48,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1617.81 | bwd_inner_microstep: 1617.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2110 [2024-06-11 04:21:49,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.33 | bwd_microstep: 824.48 | bwd_inner_microstep: 824.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2052 [2024-06-11 04:21:50,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.50 | bwd_microstep: 723.42 | bwd_inner_microstep: 723.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-11 04:21:52,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.27 | bwd_microstep: 1508.66 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3616 [2024-06-11 04:21:54,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.35 | bwd_microstep: 1508.42 | bwd_inner_microstep: 1508.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3621 [2024-06-11 04:21:56,765] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.07 | bwd_microstep: 1405.59 | bwd_inner_microstep: 1405.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 32, images per sample: 8.0, dynamic token length: 3829 [2024-06-11 04:21:58,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 1520.68 | bwd_inner_microstep: 1520.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-11 04:22:00,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.00 | bwd_microstep: 1286.08 | bwd_inner_microstep: 1286.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 04:22:02,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1377.09 | bwd_inner_microstep: 1377.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2009 [2024-06-11 04:22:03,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 285.23 | bwd_microstep: 742.55 | bwd_inner_microstep: 742.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3820 [2024-06-11 04:22:05,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.88 | bwd_microstep: 1461.19 | bwd_inner_microstep: 1461.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3578 [2024-06-11 04:22:07,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1349.06 | bwd_inner_microstep: 1349.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3919 [2024-06-11 04:22:09,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.80 | bwd_microstep: 1596.29 | bwd_inner_microstep: 1596.26 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 585 [2024-06-11 04:22:10,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.25 | bwd_microstep: 255.97 | bwd_inner_microstep: 255.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3594 [2024-06-11 04:22:11,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.54 | bwd_microstep: 1307.55 | bwd_inner_microstep: 1307.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3589 [2024-06-11 04:22:14,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.90 | bwd_microstep: 1648.20 | bwd_inner_microstep: 1648.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-11 04:22:18,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.91 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-11 04:22:18,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.63 | bwd_microstep: 4232.43 | bwd_inner_microstep: 1614.25 | bwd_allreduce_microstep: 2618.13 | step_microstep: 37.98 [2024-06-11 04:22:18,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15627.64 | bwd: 44784.70 | bwd_inner: 42165.66 | bwd_allreduce: 2618.35 | step: 39.46 {'loss': 1.1387, 'learning_rate': 6.766347277429175e-07, 'epoch': 0.92} 92%|█████████▏| 1583/1726 [27:40:48<2:32:39, 64.06s/it] 92%|█████████▏| 1584/1726 [27:41:50<2:30:04, 63.41s/it] 92%|█████████▏| 1585/1726 [27:42:53<2:28:40, 63.27s/it] 92%|█████████▏| 1586/1726 [27:43:54<2:26:23, 62.74s/it] 92%|█████████▏| 1587/1726 [27:44:55<2:23:57, 62.14s/it]
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-11 04:22:20,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1465.98 | bwd_inner_microstep: 1465.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3948 [2024-06-11 04:22:23,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.52 | bwd_microstep: 1592.12 | bwd_inner_microstep: 1592.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3777 [2024-06-11 04:22:25,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.65 | bwd_microstep: 1442.09 | bwd_inner_microstep: 1442.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 04:22:26,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.89 | bwd_microstep: 1282.28 | bwd_inner_microstep: 1282.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786 [2024-06-11 04:22:28,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 1545.46 | bwd_inner_microstep: 1545.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407 [2024-06-11 04:22:30,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.83 | bwd_microstep: 1276.72 | bwd_inner_microstep: 1276.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3445 [2024-06-11 04:22:32,748] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 536.49 | bwd_microstep: 1447.90 | bwd_inner_microstep: 1447.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-11 04:22:33,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.75 | bwd_microstep: 789.47 | bwd_inner_microstep: 789.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-11 04:22:35,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.25 | bwd_microstep: 1402.36 | bwd_inner_microstep: 1402.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-11 04:22:37,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1247.05 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 04:22:39,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.26 | bwd_microstep: 1384.79 | bwd_inner_microstep: 1384.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-11 04:22:41,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.99 | bwd_microstep: 1402.24 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-11 04:22:43,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.20 | bwd_microstep: 1283.76 | bwd_inner_microstep: 1283.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4189 [2024-06-11 
04:22:45,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.73 | bwd_microstep: 1553.74 | bwd_inner_microstep: 1553.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 04:22:47,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.14 | bwd_microstep: 1283.72 | bwd_inner_microstep: 1283.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3601 [2024-06-11 04:22:49,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.68 | bwd_microstep: 1459.76 | bwd_inner_microstep: 1459.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-11 04:22:51,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.82 | bwd_microstep: 1489.13 | bwd_inner_microstep: 1489.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-11 04:22:52,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.60 | bwd_microstep: 803.52 | bwd_inner_microstep: 803.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530 [2024-06-11 04:22:54,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.77 | bwd_microstep: 1589.02 | bwd_inner_microstep: 1589.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1996 [2024-06-11 04:22:55,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.61 | bwd_microstep: 892.83 | bwd_inner_microstep: 892.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 2103 [2024-06-11 04:22:56,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.80 | bwd_microstep: 920.71 | bwd_inner_microstep: 920.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3647 [2024-06-11 04:22:58,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.74 | bwd_microstep: 1442.09 | bwd_inner_microstep: 1442.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-11 04:23:01,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.30 | bwd_microstep: 1499.13 | bwd_inner_microstep: 1499.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3721 [2024-06-11 04:23:03,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.55 | bwd_microstep: 1601.63 | bwd_inner_microstep: 1601.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2048 [2024-06-11 04:23:04,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.39 | bwd_microstep: 809.14 | bwd_inner_microstep: 809.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920 [2024-06-11 04:23:05,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.43 | bwd_microstep: 716.77 | bwd_inner_microstep: 716.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3637 [2024-06-11 04:23:07,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.67 | bwd_microstep: 1662.42 | bwd_inner_microstep: 1662.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 33, images per sample: 8.25, dynamic token length: 3817 [2024-06-11 04:23:09,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.96 | bwd_microstep: 1527.26 | bwd_inner_microstep: 1527.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-11 04:23:11,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.18 | bwd_microstep: 1254.74 | bwd_inner_microstep: 1254.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-11 04:23:13,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.17 | bwd_microstep: 1512.58 | bwd_inner_microstep: 1512.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3800 [2024-06-11 04:23:15,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.70 | bwd_microstep: 1450.35 | bwd_inner_microstep: 1450.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3569 [2024-06-11 04:23:19,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.77 | optimizer_gradients: 4.04 | optimizer_step: 6.59 [2024-06-11 04:23:19,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.85 | bwd_microstep: 3564.47 | bwd_inner_microstep: 1772.84 | bwd_allreduce_microstep: 1791.59 | step_microstep: 39.46 [2024-06-11 04:23:19,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15937.38 | bwd: 44595.26 | bwd_inner: 42802.76 | bwd_allreduce: 1791.81 | step: 41.00 {'loss': 1.1754, 'learning_rate': 6.669882816510776e-07, 'epoch': 0.92} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 04:23:21,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.86 | bwd_microstep: 1364.81 | bwd_inner_microstep: 1364.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 04:23:23,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1378.26 | bwd_inner_microstep: 1378.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-11 04:23:25,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.00 | bwd_microstep: 1275.68 | bwd_inner_microstep: 1275.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2262 [2024-06-11 04:23:26,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 971.54 | bwd_inner_microstep: 971.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 04:23:28,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.84 | bwd_microstep: 1400.82 | bwd_inner_microstep: 1400.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3745 [2024-06-11 04:23:30,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.96 | bwd_microstep: 1640.03 | bwd_inner_microstep: 1640.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2190 [2024-06-11 04:23:32,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.70 | bwd_microstep: 954.48 | bwd_inner_microstep: 954.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2179 [2024-06-11 04:23:33,352] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.67 | bwd_microstep: 856.89 | bwd_inner_microstep: 856.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691 [2024-06-11 04:23:35,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.06 | bwd_microstep: 1629.37 | bwd_inner_microstep: 1629.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911 [2024-06-11 04:23:36,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.17 | bwd_microstep: 686.28 | bwd_inner_microstep: 686.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 04:23:38,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.67 | bwd_microstep: 1288.78 | bwd_inner_microstep: 1288.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3649 [2024-06-11 04:23:40,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.62 | bwd_microstep: 1548.56 | bwd_inner_microstep: 1548.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3086 [2024-06-11 04:23:42,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.59 | bwd_microstep: 1292.71 | bwd_inner_microstep: 1292.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505 [2024-06-11 04:23:44,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.36 | bwd_microstep: 1585.40 | bwd_inner_microstep: 1585.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 
3910 [2024-06-11 04:23:46,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.42 | bwd_microstep: 1622.09 | bwd_inner_microstep: 1622.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-11 04:23:48,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.03 | bwd_microstep: 1391.29 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 04:23:50,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.74 | bwd_microstep: 1492.40 | bwd_inner_microstep: 1492.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-11 04:23:52,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.59 | bwd_microstep: 1253.79 | bwd_inner_microstep: 1253.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 04:23:54,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.12 | bwd_microstep: 1256.94 | bwd_inner_microstep: 1256.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3625 [2024-06-11 04:23:56,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1414.98 | bwd_inner_microstep: 1414.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-11 04:23:58,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.26 | bwd_microstep: 1460.98 | bwd_inner_microstep: 1460.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images 
per sample: 4.0, dynamic token length: 1973 [2024-06-11 04:23:59,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.49 | bwd_microstep: 794.78 | bwd_inner_microstep: 794.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-11 04:24:00,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.24 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 04:24:03,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.99 | bwd_microstep: 1555.25 | bwd_inner_microstep: 1555.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273 [2024-06-11 04:24:04,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.88 | bwd_microstep: 874.90 | bwd_inner_microstep: 874.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-11 04:24:06,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.17 | bwd_microstep: 1473.97 | bwd_inner_microstep: 1473.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-11 04:24:08,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.18 | bwd_microstep: 1655.45 | bwd_inner_microstep: 1655.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-11 04:24:09,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.34 | bwd_microstep: 877.72 | bwd_inner_microstep: 877.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-11 04:24:11,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1493.09 | bwd_inner_microstep: 1493.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-11 04:24:13,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.70 | bwd_microstep: 1473.03 | bwd_inner_microstep: 1473.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-11 04:24:16,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.36 | bwd_microstep: 1645.53 | bwd_inner_microstep: 1645.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3736 [2024-06-11 04:24:24,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.21 | optimizer_step: 6.61 [2024-06-11 04:24:24,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 667.28 | bwd_microstep: 7929.94 | bwd_inner_microstep: 2079.60 | bwd_allreduce_microstep: 5850.27 | step_microstep: 38.37 [2024-06-11 04:24:24,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15943.50 | bwd: 48791.37 | bwd_inner: 42940.17 | bwd_allreduce: 5850.51 | step: 39.91 {'loss': 1.1902, 'learning_rate': 6.574099263260092e-07, 'epoch': 0.92} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2440 [2024-06-11 04:24:26,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.55 | bwd_microstep: 1000.32 | bwd_inner_microstep: 1000.22 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-11 04:24:27,315] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 301.54 | bwd_microstep: 800.93 | bwd_inner_microstep: 800.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3875 [2024-06-11 04:24:29,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.07 | bwd_microstep: 1578.48 | bwd_inner_microstep: 1578.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 04:24:31,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.03 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3922 [2024-06-11 04:24:33,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1485.77 | bwd_inner_microstep: 1485.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 702 [2024-06-11 04:24:33,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.82 | bwd_microstep: 285.06 | bwd_inner_microstep: 285.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 04:24:35,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.78 | bwd_microstep: 1280.42 | bwd_inner_microstep: 1280.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3747 [2024-06-11 04:24:37,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.65 | bwd_microstep: 1635.02 | bwd_inner_microstep: 1635.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3616 [2024-06-11 04:24:39,720] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.06 | bwd_microstep: 1342.28 | bwd_inner_microstep: 1342.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-11 04:24:41,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1388.11 | bwd_inner_microstep: 1388.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921 [2024-06-11 04:24:42,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.07 | bwd_microstep: 696.27 | bwd_inner_microstep: 696.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1981 [2024-06-11 04:24:43,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.07 | bwd_microstep: 828.46 | bwd_inner_microstep: 828.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2088 [2024-06-11 04:24:44,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.93 | bwd_microstep: 851.63 | bwd_inner_microstep: 851.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-11 04:24:46,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.51 | bwd_microstep: 1295.07 | bwd_inner_microstep: 1295.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2146 [2024-06-11 04:24:48,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.95 | bwd_microstep: 945.18 | bwd_inner_microstep: 945.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3451 [2024-06-11 04:24:49,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.03 | bwd_microstep: 1284.45 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-11 04:24:50,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.43 | bwd_microstep: 788.33 | bwd_inner_microstep: 788.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3511 [2024-06-11 04:24:52,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.97 | bwd_microstep: 1316.55 | bwd_inner_microstep: 1316.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3673 [2024-06-11 04:24:54,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.75 | bwd_microstep: 1515.33 | bwd_inner_microstep: 1515.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-11 04:24:56,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1476.95 | bwd_inner_microstep: 1476.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 04:24:58,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1376.63 | bwd_inner_microstep: 1376.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615 [2024-06-11 04:25:00,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.71 | bwd_microstep: 1431.00 | bwd_inner_microstep: 1430.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3464 [2024-06-11 04:25:02,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.77 | bwd_microstep: 1279.28 | bwd_inner_microstep: 1279.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3692 [2024-06-11 04:25:04,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1329.91 | bwd_inner_microstep: 1329.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3634 [2024-06-11 04:25:06,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.18 | bwd_microstep: 1542.90 | bwd_inner_microstep: 1542.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-11 04:25:08,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.68 | bwd_microstep: 1288.90 | bwd_inner_microstep: 1288.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3820 [2024-06-11 04:25:10,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.48 | bwd_microstep: 1292.73 | bwd_inner_microstep: 1292.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-11 04:25:11,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.38 | bwd_microstep: 1313.24 | bwd_inner_microstep: 1313.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273 [2024-06-11 04:25:13,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.80 | bwd_microstep: 879.85 | bwd_inner_microstep: 879.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-11 04:25:14,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.16 | bwd_microstep: 1374.62 | bwd_inner_microstep: 1374.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3300 [2024-06-11 04:25:16,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.22 | bwd_microstep: 1230.72 | bwd_inner_microstep: 1230.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 04:25:26,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.09 | optimizer_step: 6.63 [2024-06-11 04:25:26,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.08 | bwd_microstep: 9599.72 | bwd_inner_microstep: 1436.12 | bwd_allreduce_microstep: 8163.55 | step_microstep: 38.14 [2024-06-11 04:25:26,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14586.61 | bwd: 47109.64 | bwd_inner: 38945.10 | bwd_allreduce: 8163.83 | step: 39.64 {'loss': 1.1754, 'learning_rate': 6.478996955026251e-07, 'epoch': 0.92} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 04:25:28,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.86 | bwd_microstep: 1372.60 | bwd_inner_microstep: 1372.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-11 04:25:30,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.92 | bwd_microstep: 1269.72 | bwd_inner_microstep: 1269.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 04:25:32,523] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 548.16 | bwd_microstep: 1480.53 | bwd_inner_microstep: 1480.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2961 [2024-06-11 04:25:34,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.91 | bwd_microstep: 1096.24 | bwd_inner_microstep: 1096.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 04:25:35,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.41 | bwd_microstep: 1244.54 | bwd_inner_microstep: 1244.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3496 [2024-06-11 04:25:37,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.33 | bwd_microstep: 1188.04 | bwd_inner_microstep: 1188.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 04:25:39,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 1278.94 | bwd_inner_microstep: 1278.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3488 [2024-06-11 04:25:41,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1408.93 | bwd_inner_microstep: 1408.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3718 [2024-06-11 04:25:43,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.08 | bwd_microstep: 1628.04 | bwd_inner_microstep: 1628.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-11 
04:25:44,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.33 | bwd_microstep: 790.99 | bwd_inner_microstep: 790.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-11 04:25:46,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.39 | bwd_microstep: 1516.61 | bwd_inner_microstep: 1516.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4144 [2024-06-11 04:25:49,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.66 | bwd_microstep: 1839.25 | bwd_inner_microstep: 1839.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-11 04:25:51,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.24 | bwd_microstep: 1390.54 | bwd_inner_microstep: 1390.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3519 [2024-06-11 04:25:52,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.44 | bwd_microstep: 1333.09 | bwd_inner_microstep: 1333.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-11 04:25:54,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.13 | bwd_microstep: 1521.49 | bwd_inner_microstep: 1521.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2297 [2024-06-11 04:25:56,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.58 | bwd_microstep: 1069.16 | bwd_inner_microstep: 1069.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, 
dynamic token length: 3469 [2024-06-11 04:25:58,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.01 | bwd_microstep: 1503.50 | bwd_inner_microstep: 1503.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-11 04:25:59,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.50 | bwd_microstep: 799.05 | bwd_inner_microstep: 799.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2300 [2024-06-11 04:26:00,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.35 | bwd_microstep: 879.08 | bwd_inner_microstep: 879.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-11 04:26:03,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.81 | bwd_microstep: 1605.49 | bwd_inner_microstep: 1605.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064 [2024-06-11 04:26:04,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.18 | bwd_microstep: 914.31 | bwd_inner_microstep: 914.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-11 04:26:06,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.12 | bwd_microstep: 1352.25 | bwd_inner_microstep: 1352.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-11 04:26:07,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.61 | bwd_microstep: 1255.44 | bwd_inner_microstep: 1255.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-11 04:26:09,819] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.45 | bwd_microstep: 1394.95 | bwd_inner_microstep: 1394.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510 [2024-06-11 04:26:11,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.39 | bwd_microstep: 1321.89 | bwd_inner_microstep: 1321.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-11 04:26:13,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1512.28 | bwd_inner_microstep: 1512.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3811 [2024-06-11 04:26:15,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.76 | bwd_microstep: 1450.98 | bwd_inner_microstep: 1450.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685 [2024-06-11 04:26:17,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.67 | bwd_microstep: 1627.60 | bwd_inner_microstep: 1627.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3582 [2024-06-11 04:26:19,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.04 | bwd_microstep: 1404.96 | bwd_inner_microstep: 1404.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3589 [2024-06-11 04:26:22,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.09 | bwd_microstep: 1606.52 | bwd_inner_microstep: 1606.50 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-11 04:26:24,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.59 | bwd_microstep: 1484.05 | bwd_inner_microstep: 1484.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3576 [2024-06-11 04:26:29,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.13 | optimizer_step: 6.58 [2024-06-11 04:26:29,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.40 | bwd_microstep: 4946.77 | bwd_inner_microstep: 1349.18 | bwd_allreduce_microstep: 3597.53 | step_microstep: 38.34 [2024-06-11 04:26:29,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15987.43 | bwd: 46487.85 | bwd_inner: 42889.41 | bwd_allreduce: 3597.77 | step: 39.79 {'loss': 1.1658, 'learning_rate': 6.384576226759165e-07, 'epoch': 0.92} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 04:26:31,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.07 | bwd_microstep: 1345.62 | bwd_inner_microstep: 1345.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4034 [2024-06-11 04:26:33,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.82 | bwd_microstep: 1714.35 | bwd_inner_microstep: 1714.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3852 [2024-06-11 04:26:35,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.73 | bwd_microstep: 1457.38 | bwd_inner_microstep: 1457.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1870 [2024-06-11 04:26:36,812] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.26 | bwd_microstep: 680.74 | bwd_inner_microstep: 680.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3476 [2024-06-11 04:26:38,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.87 | bwd_microstep: 1214.64 | bwd_inner_microstep: 1214.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 04:26:40,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1385.25 | bwd_inner_microstep: 1385.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 04:26:42,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1388.78 | bwd_inner_microstep: 1388.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 04:26:44,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1281.82 | bwd_inner_microstep: 1281.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3681 [2024-06-11 04:26:45,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.59 | bwd_microstep: 1323.25 | bwd_inner_microstep: 1323.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3487 [2024-06-11 04:26:47,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.77 | bwd_microstep: 1187.78 | bwd_inner_microstep: 1187.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3542 [2024-06-11 04:26:49,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.00 | bwd_microstep: 1401.33 | bwd_inner_microstep: 1401.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3498 [2024-06-11 04:26:51,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.55 | bwd_microstep: 1412.51 | bwd_inner_microstep: 1412.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3640 [2024-06-11 04:26:53,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.55 | bwd_microstep: 1603.17 | bwd_inner_microstep: 1603.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 04:26:55,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.09 | bwd_microstep: 1357.31 | bwd_inner_microstep: 1357.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 665 [2024-06-11 04:26:55,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 110.94 | bwd_microstep: 278.00 | bwd_inner_microstep: 277.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2685 [2024-06-11 04:26:57,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.43 | bwd_microstep: 1024.28 | bwd_inner_microstep: 1024.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2041 [2024-06-11 04:26:58,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.57 | bwd_microstep: 838.55 | bwd_inner_microstep: 838.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3515 [2024-06-11 04:27:00,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.85 | bwd_microstep: 1319.17 | bwd_inner_microstep: 1319.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 04:27:02,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.03 | bwd_microstep: 1385.34 | bwd_inner_microstep: 1385.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-11 04:27:04,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.63 | bwd_microstep: 1658.85 | bwd_inner_microstep: 1658.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 04:27:06,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.21 | bwd_microstep: 1553.09 | bwd_inner_microstep: 1553.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3459 [2024-06-11 04:27:08,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.08 | bwd_microstep: 1241.64 | bwd_inner_microstep: 1241.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-11 04:27:10,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.34 | bwd_microstep: 1291.87 | bwd_inner_microstep: 1291.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-11 04:27:12,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.36 | bwd_microstep: 1398.89 | bwd_inner_microstep: 1398.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3673 [2024-06-11 04:27:13,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.61 | bwd_microstep: 1326.91 | bwd_inner_microstep: 1326.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 04:27:15,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.56 | bwd_microstep: 1377.61 | bwd_inner_microstep: 1377.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441 [2024-06-11 04:27:17,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.44 | bwd_microstep: 1282.15 | bwd_inner_microstep: 1282.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3593 [2024-06-11 04:27:19,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.40 | bwd_microstep: 1456.17 | bwd_inner_microstep: 1456.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3635 [2024-06-11 04:27:21,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1710.46 | bwd_inner_microstep: 1710.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-11 04:27:23,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1395.16 | bwd_inner_microstep: 1395.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2033 [2024-06-11 04:27:25,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.80 | bwd_microstep: 931.68 | bwd_inner_microstep: 931.65 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3580 [2024-06-11 04:27:30,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.07 | optimizer_step: 6.61 [2024-06-11 04:27:30,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.15 | bwd_microstep: 4188.25 | bwd_inner_microstep: 1915.80 | bwd_allreduce_microstep: 2272.40 | step_microstep: 37.68 [2024-06-11 04:27:30,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15717.72 | bwd: 44412.01 | bwd_inner: 42138.70 | bwd_allreduce: 2272.63 | step: 39.11 {'loss': 1.1994, 'learning_rate': 6.290837411008044e-07, 'epoch': 0.92} /1726 [27:44:55<2:23:57, 62.14s/it] 92%|█████████▏| 1588/1726 [27:45:56<2:22:03, 61.76s/it] 92%|█████████▏| 1589/1726 [27:47:01<2:23:17, 62.76s/it] 92%|█████████▏| 1590/1726 [27:48:03<2:21:44, 62.53s/it] 92%|█████████▏| 1591/1726 [27:49:06<2:20:52, 62.61s/it] 92%|█████████▏| 1592/1726 [27:50:06<2:18:23, 61.97s/it] 92%|dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2027 [2024-06-11 04:27:31,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.88 | bwd_microstep: 804.14 | bwd_inner_microstep: 804.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3412 [2024-06-11 04:27:32,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.12 | bwd_microstep: 1279.07 | bwd_inner_microstep: 1279.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401 [2024-06-11 04:27:34,858] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.38 | bwd_microstep: 1371.71 | bwd_inner_microstep: 1371.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 04:27:36,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.64 | bwd_microstep: 1245.71 | bwd_inner_microstep: 1245.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 04:27:38,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1384.34 | bwd_inner_microstep: 1384.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400 [2024-06-11 04:27:40,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.50 | bwd_microstep: 1149.33 | bwd_inner_microstep: 1149.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 04:27:41,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.54 | bwd_microstep: 1245.76 | bwd_inner_microstep: 1245.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3748 [2024-06-11 04:27:43,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1373.91 | bwd_inner_microstep: 1373.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418 [2024-06-11 04:27:45,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.67 | bwd_microstep: 1151.85 | bwd_inner_microstep: 1151.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3432 [2024-06-11 04:27:47,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.30 | bwd_microstep: 1355.38 | bwd_inner_microstep: 1355.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3549 [2024-06-11 04:27:48,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 1297.99 | bwd_inner_microstep: 1297.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3439 [2024-06-11 04:27:50,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.15 | bwd_microstep: 1185.75 | bwd_inner_microstep: 1185.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-11 04:27:51,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.33 | bwd_microstep: 792.63 | bwd_inner_microstep: 792.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2131 [2024-06-11 04:27:53,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.40 | bwd_microstep: 930.91 | bwd_inner_microstep: 930.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3434 [2024-06-11 04:27:55,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.71 | bwd_microstep: 1476.64 | bwd_inner_microstep: 1476.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-11 04:27:57,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1492.79 | bwd_inner_microstep: 1492.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3663 [2024-06-11 04:27:59,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.85 | bwd_microstep: 1619.94 | bwd_inner_microstep: 1619.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3379 [2024-06-11 04:28:01,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.66 | bwd_microstep: 1400.80 | bwd_inner_microstep: 1400.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1953 [2024-06-11 04:28:02,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.21 | bwd_microstep: 779.01 | bwd_inner_microstep: 778.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2277 [2024-06-11 04:28:03,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.63 | bwd_microstep: 936.70 | bwd_inner_microstep: 936.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-11 04:28:05,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.57 | bwd_microstep: 1491.59 | bwd_inner_microstep: 1491.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-11 04:28:07,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1296.47 | bwd_inner_microstep: 1296.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3618 [2024-06-11 04:28:09,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.26 | bwd_microstep: 1610.62 | bwd_inner_microstep: 1610.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3558 [2024-06-11 04:28:11,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.04 | bwd_microstep: 1249.71 | bwd_inner_microstep: 1249.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 04:28:13,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.98 | bwd_microstep: 1385.11 | bwd_inner_microstep: 1385.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461 [2024-06-11 04:28:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1401.48 | bwd_inner_microstep: 1401.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-11 04:28:17,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.98 | bwd_microstep: 1631.02 | bwd_inner_microstep: 1630.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-11 04:28:19,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.54 | bwd_microstep: 1508.11 | bwd_inner_microstep: 1508.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469 [2024-06-11 04:28:21,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.90 | bwd_microstep: 1242.26 | bwd_inner_microstep: 1242.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3468 [2024-06-11 04:28:23,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.18 | bwd_microstep: 1359.80 | bwd_inner_microstep: 1359.78 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-11 04:28:24,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.09 | bwd_microstep: 1255.02 | bwd_inner_microstep: 1254.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3579 [2024-06-11 04:28:30,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.22 | optimizer_step: 6.59 [2024-06-11 04:28:30,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.94 | bwd_microstep: 4588.49 | bwd_inner_microstep: 1456.15 | bwd_allreduce_microstep: 3132.28 | step_microstep: 37.96 [2024-06-11 04:28:30,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15412.82 | bwd: 44294.04 | bwd_inner: 41160.85 | bwd_allreduce: 3132.51 | step: 39.45 {'loss': 1.2023, 'learning_rate': 6.197780837920598e-07, 'epoch': 0.92} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3426 [2024-06-11 04:28:31,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.63 | bwd_microstep: 1332.58 | bwd_inner_microstep: 1332.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4020 [2024-06-11 04:28:34,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.11 | bwd_microstep: 1610.98 | bwd_inner_microstep: 1610.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 04:28:35,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.20 | bwd_microstep: 1280.86 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 
[2024-06-11 04:28:37,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1245.97 | bwd_inner_microstep: 1245.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3851 [2024-06-11 04:28:39,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.07 | bwd_microstep: 1563.15 | bwd_inner_microstep: 1563.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3446 [2024-06-11 04:28:41,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.44 | bwd_microstep: 1449.45 | bwd_inner_microstep: 1449.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 04:28:43,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1253.95 | bwd_inner_microstep: 1253.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3782 [2024-06-11 04:28:45,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.40 | bwd_microstep: 1506.58 | bwd_inner_microstep: 1506.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-11 04:28:47,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.75 | bwd_microstep: 1285.83 | bwd_inner_microstep: 1285.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-11 04:28:49,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.77 | bwd_microstep: 1386.44 | bwd_inner_microstep: 1386.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 3418 [2024-06-11 04:28:50,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.12 | bwd_microstep: 1183.38 | bwd_inner_microstep: 1183.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 04:28:52,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1384.04 | bwd_inner_microstep: 1384.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-11 04:28:54,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.36 | bwd_microstep: 1523.75 | bwd_inner_microstep: 1523.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656 [2024-06-11 04:28:57,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.88 | bwd_microstep: 1541.21 | bwd_inner_microstep: 1541.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 04:28:58,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.88 | bwd_microstep: 1284.74 | bwd_inner_microstep: 1284.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1960 [2024-06-11 04:29:00,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.53 | bwd_microstep: 894.60 | bwd_inner_microstep: 894.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3460 [2024-06-11 04:29:02,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.87 | bwd_microstep: 1537.11 | bwd_inner_microstep: 1537.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3520 [2024-06-11 04:29:04,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.78 | bwd_microstep: 1684.74 | bwd_inner_microstep: 1684.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3661 [2024-06-11 04:29:06,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1481.40 | bwd_inner_microstep: 1481.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 04:29:08,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1285.41 | bwd_inner_microstep: 1285.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-11 04:29:10,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.96 | bwd_microstep: 1390.56 | bwd_inner_microstep: 1390.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3509 [2024-06-11 04:29:12,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.77 | bwd_microstep: 1223.41 | bwd_inner_microstep: 1223.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3827 [2024-06-11 04:29:14,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.76 | bwd_microstep: 1521.79 | bwd_inner_microstep: 1521.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3732 [2024-06-11 04:29:16,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.24 | bwd_microstep: 1607.72 | bwd_inner_microstep: 1607.69 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3728 [2024-06-11 04:29:18,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1481.71 | bwd_inner_microstep: 1481.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814 [2024-06-11 04:29:20,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 644.40 | bwd_microstep: 1759.52 | bwd_inner_microstep: 1759.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3738 [2024-06-11 04:29:22,790] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.14 | bwd_microstep: 1460.92 | bwd_inner_microstep: 1460.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3449 [2024-06-11 04:29:24,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1455.38 | bwd_inner_microstep: 1455.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3582 [2024-06-11 04:29:26,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1430.61 | bwd_inner_microstep: 1430.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 04:29:29,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.29 | bwd_microstep: 1650.33 | bwd_inner_microstep: 1650.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-11 04:29:30,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.11 | bwd_microstep: 
1406.06 | bwd_inner_microstep: 1406.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-11 04:29:33,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.02 | optimizer_step: 6.61 [2024-06-11 04:29:33,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.02 | bwd_microstep: 1599.10 | bwd_inner_microstep: 1556.35 | bwd_allreduce_microstep: 42.71 | step_microstep: 37.45 [2024-06-11 04:29:33,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17044.83 | bwd: 45703.28 | bwd_inner: 45659.68 | bwd_allreduce: 42.93 | step: 38.89 {'loss': 1.1508, 'learning_rate': 6.105406835241545e-07, 'epoch': 0.92} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-11 04:29:35,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.54 | bwd_microstep: 1576.46 | bwd_inner_microstep: 1576.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-11 04:29:37,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.16 | bwd_microstep: 1478.94 | bwd_inner_microstep: 1478.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3862 [2024-06-11 04:29:39,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.80 | bwd_microstep: 1665.28 | bwd_inner_microstep: 1665.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788 [2024-06-11 04:29:41,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.96 | bwd_microstep: 1546.75 | bwd_inner_microstep: 1546.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 
8.5, dynamic token length: 3481 [2024-06-11 04:29:43,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.36 | bwd_microstep: 1483.12 | bwd_inner_microstep: 1483.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3735 [2024-06-11 04:29:46,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.26 | bwd_microstep: 1635.63 | bwd_inner_microstep: 1635.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964 [2024-06-11 04:29:47,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.85 | bwd_microstep: 702.35 | bwd_inner_microstep: 702.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-11 04:29:48,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.84 | bwd_microstep: 1286.60 | bwd_inner_microstep: 1286.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413 [2024-06-11 04:29:50,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.13 | bwd_microstep: 1278.53 | bwd_inner_microstep: 1278.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 04:29:52,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.90 | bwd_microstep: 1384.22 | bwd_inner_microstep: 1384.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3476 [2024-06-11 04:29:54,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.41 | bwd_microstep: 1341.62 | bwd_inner_microstep: 1341.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 19, images per sample: 4.75, dynamic token length: 2826 [2024-06-11 04:29:55,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 402.47 | bwd_microstep: 1061.75 | bwd_inner_microstep: 1061.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3503 [2024-06-11 04:29:58,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.14 | bwd_microstep: 1575.88 | bwd_inner_microstep: 1575.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2145 [2024-06-11 04:29:59,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.14 | bwd_microstep: 847.42 | bwd_inner_microstep: 847.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1970 [2024-06-11 04:30:00,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.07 | bwd_microstep: 842.85 | bwd_inner_microstep: 842.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3666 [2024-06-11 04:30:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.78 | bwd_microstep: 1329.71 | bwd_inner_microstep: 1329.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 04:30:04,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.90 | bwd_microstep: 1388.14 | bwd_inner_microstep: 1388.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-11 04:30:06,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1377.36 | bwd_inner_microstep: 1377.33 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3629 [2024-06-11 04:30:08,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.50 | bwd_microstep: 1612.00 | bwd_inner_microstep: 1611.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-11 04:30:10,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.70 | bwd_microstep: 1295.35 | bwd_inner_microstep: 1295.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-11 04:30:12,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.31 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3820 [2024-06-11 04:30:14,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.68 | bwd_microstep: 1625.65 | bwd_inner_microstep: 1625.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3719 [2024-06-11 04:30:16,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.42 | bwd_microstep: 1368.05 | bwd_inner_microstep: 1368.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-11 04:30:18,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1299.19 | bwd_inner_microstep: 1299.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 04:30:20,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.98 | bwd_microstep: 1560.17 | 
bwd_inner_microstep: 1560.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3600 [2024-06-11 04:30:21,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.81 | bwd_microstep: 1213.64 | bwd_inner_microstep: 1213.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2382 [2024-06-11 04:30:23,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.21 | bwd_microstep: 966.62 | bwd_inner_microstep: 966.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3564 [2024-06-11 04:30:25,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1529.63 | bwd_inner_microstep: 1529.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3448 [2024-06-11 04:30:27,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.76 | bwd_microstep: 1481.84 | bwd_inner_microstep: 1481.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3590 [2024-06-11 04:30:29,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.96 | bwd_microstep: 1339.26 | bwd_inner_microstep: 1339.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3837 [2024-06-11 04:30:31,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.20 | bwd_microstep: 1620.95 | bwd_inner_microstep: 1620.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3805 [2024-06-11 04:30:34,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 17.04 | optimizer_gradients: 4.04 | optimizer_step: 6.58 [2024-06-11 04:30:34,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.59 | bwd_microstep: 2120.66 | bwd_inner_microstep: 1881.32 | bwd_allreduce_microstep: 239.29 | step_microstep: 37.74 [2024-06-11 04:30:34,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16423.28 | bwd: 44294.94 | bwd_inner: 44054.76 | bwd_allreduce: 239.52 | step: 39.21 {'loss': 1.1776, 'learning_rate': 6.013715728311664e-07, 'epoch': 0.92} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3462 [2024-06-11 04:30:36,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.80 | bwd_microstep: 1472.50 | bwd_inner_microstep: 1472.34 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 04:30:38,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.92 | bwd_microstep: 1257.96 | bwd_inner_microstep: 1257.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3470 [2024-06-11 04:30:39,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.42 | bwd_microstep: 1311.34 | bwd_inner_microstep: 1311.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 04:30:41,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.82 | bwd_microstep: 1247.39 | bwd_inner_microstep: 1247.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748 [2024-06-11 04:30:43,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.56 | bwd_microstep: 1638.65 | bwd_inner_microstep: 1638.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-11 04:30:44,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.94 | bwd_microstep: 788.25 | bwd_inner_microstep: 788.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 04:30:46,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.71 | bwd_microstep: 1385.93 | bwd_inner_microstep: 1385.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3699 [2024-06-11 04:30:49,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.72 | bwd_microstep: 1627.54 | bwd_inner_microstep: 1627.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2022 [2024-06-11 04:30:50,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.33 | bwd_microstep: 807.12 | bwd_inner_microstep: 807.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2449 [2024-06-11 04:30:51,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.20 | bwd_microstep: 978.29 | bwd_inner_microstep: 978.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1901 [2024-06-11 04:30:52,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.62 | bwd_microstep: 717.86 | bwd_inner_microstep: 717.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3560 [2024-06-11 04:31:02,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.55 | bwd_microstep: 1292.61 | bwd_inner_microstep: 1292.58 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 04:31:04,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.92 | bwd_microstep: 1368.06 | bwd_inner_microstep: 1368.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-11 04:31:06,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.02 | bwd_microstep: 1338.58 | bwd_inner_microstep: 1338.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3652 [2024-06-11 04:31:08,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.05 | bwd_microstep: 1611.52 | bwd_inner_microstep: 1611.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3631 [2024-06-11 04:31:10,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.47 | bwd_microstep: 1650.13 | bwd_inner_microstep: 1650.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3630 [2024-06-11 04:31:12,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.88 | bwd_microstep: 1260.13 | bwd_inner_microstep: 1260.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 04:31:14,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.53 | bwd_microstep: 1551.63 | bwd_inner_microstep: 1551.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3511 [2024-06-11 04:31:16,513] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.52 | bwd_microstep: 1413.00 | 
bwd_inner_microstep: 1412.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-11 04:31:18,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.75 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3497 [2024-06-11 04:31:20,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.06 | bwd_microstep: 1217.83 | bwd_inner_microstep: 1217.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-11 04:31:21,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.56 | bwd_microstep: 1291.17 | bwd_inner_microstep: 1291.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 04:31:23,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1259.70 | bwd_inner_microstep: 1259.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 04:31:25,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1552.93 | bwd_inner_microstep: 1552.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 04:31:27,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.48 | bwd_microstep: 1256.80 | bwd_inner_microstep: 1256.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2282 [2024-06-11 04:31:28,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 331.71 | bwd_microstep: 878.39 | bwd_inner_microstep: 878.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2063 [2024-06-11 04:31:29,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.64 | bwd_microstep: 877.28 | bwd_inner_microstep: 877.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3529 [2024-06-11 04:31:31,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.29 | bwd_microstep: 1197.73 | bwd_inner_microstep: 1197.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3584 [2024-06-11 04:31:33,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.47 | bwd_microstep: 1429.18 | bwd_inner_microstep: 1429.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3426 [2024-06-11 04:31:35,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | bwd_microstep: 1480.95 | bwd_inner_microstep: 1480.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3565 [2024-06-11 04:31:37,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.79 | bwd_microstep: 1593.86 | bwd_inner_microstep: 1593.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2160 [2024-06-11 04:31:40,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.09 | optimizer_gradients: 4.03 | optimizer_step: 6.57 [2024-06-11 04:31:40,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.42 | bwd_microstep: 2412.02 | bwd_inner_microstep: 963.45 | 
bwd_allreduce_microstep: 1448.53 | step_microstep: 38.05 [2024-06-11 04:31:40,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15418.78 | bwd: 42557.82 | bwd_inner: 41108.27 | bwd_allreduce: 1448.82 | step: 39.58 {'loss': 1.2242, 'learning_rate': 5.922707840066544e-07, 'epoch': 0.92} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 04:31:42,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.62 | bwd_microstep: 1271.22 | bwd_inner_microstep: 1271.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-11 04:31:44,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.32 | bwd_microstep: 1478.05 | bwd_inner_microstep: 1478.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3844 [2024-06-11 04:31:46,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.21 | bwd_microstep: 1558.81 | bwd_inner_microstep: 1558.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 04:31:48,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1276.99 | bwd_inner_microstep: 1276.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3753 [2024-06-11 04:31:50,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.12 | bwd_microstep: 1441.57 | bwd_inner_microstep: 1441.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1893 [2024-06-11 04:31:51,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.49 | bwd_microstep: 713.82 | bwd_inner_microstep: 713.79 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3766 [2024-06-11 04:31:53,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.22 | bwd_microstep: 1590.48 | bwd_inner_microstep: 1590.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 04:31:55,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1385.00 | bwd_inner_microstep: 1384.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 04:31:57,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.37 | bwd_microstep: 1384.48 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2021 [2024-06-11 04:31:58,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.81 | bwd_microstep: 718.42 | bwd_inner_microstep: 718.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 04:32:00,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.70 | bwd_microstep: 1248.19 | bwd_inner_microstep: 1248.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3435 [2024-06-11 04:32:02,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.31 | bwd_microstep: 1442.54 | bwd_inner_microstep: 1442.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 04:32:04,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.79 | bwd_microstep: 
1480.90 | bwd_inner_microstep: 1480.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3464 [2024-06-11 04:32:06,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.21 | bwd_microstep: 1466.01 | bwd_inner_microstep: 1465.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3888 [2024-06-11 04:32:08,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.32 | bwd_microstep: 1528.62 | bwd_inner_microstep: 1528.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 04:32:10,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.76 | bwd_microstep: 1283.99 | bwd_inner_microstep: 1283.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 04:32:11,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.73 | bwd_microstep: 1284.08 | bwd_inner_microstep: 1284.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3515 [2024-06-11 04:32:13,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1392.75 | bwd_inner_microstep: 1392.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3644 [2024-06-11 04:32:15,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.75 | bwd_microstep: 1312.14 | bwd_inner_microstep: 1312.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1954 [2024-06-11 04:32:16,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 310.58 | bwd_microstep: 828.86 | bwd_inner_microstep: 828.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3526 [2024-06-11 04:32:18,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.97 | bwd_microstep: 1193.15 | bwd_inner_microstep: 1193.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3680 [2024-06-11 04:32:20,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.60 | bwd_microstep: 1425.14 | bwd_inner_microstep: 1425.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3537 [2024-06-11 04:32:22,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.25 | bwd_microstep: 1415.09 | bwd_inner_microstep: 1415.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-11 04:32:24,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.96 | bwd_microstep: 1413.18 | bwd_inner_microstep: 1413.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3553 [2024-06-11 04:32:26,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.25 | bwd_microstep: 1342.98 | bwd_inner_microstep: 1342.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-11 04:32:27,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.20 | bwd_microstep: 792.70 | bwd_inner_microstep: 792.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-11 04:32:29,230] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.20 | bwd_microstep: 1473.53 | bwd_inner_microstep: 1473.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-11 04:32:31,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1395.47 | bwd_inner_microstep: 1395.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2799 [2024-06-11 04:32:32,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.00 | bwd_microstep: 1165.68 | bwd_inner_microstep: 1165.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-11 04:32:34,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1487.72 | bwd_inner_microstep: 1487.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582 [2024-06-11 04:32:37,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.72 | bwd_microstep: 1597.21 | bwd_inner_microstep: 1597.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3570 [2024-06-11 04:32:43,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-11 04:32:43,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.23 | bwd_microstep: 6139.68 | bwd_inner_microstep: 2089.54 | bwd_allreduce_microstep: 4050.08 | step_microstep: 38.07 [2024-06-11 04:32:43,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15910.25 | bwd: 46928.47 | bwd_inner: 42877.48 | bwd_allreduce: 4050.31 | step: 39.59 █████████▏| 1592/1726 
[27:50:06<2:18:23, 61.97s/it] 92%|█████████▏| 1593/1726 [27:51:06<2:16:04, 61.39s/it] 92%|█████████▏| 1594/1726 [27:52:09<2:16:10, 61.90s/it] 92%|█████████▏| 1595/1726 [27:53:10<2:14:35, 61.64s/it] 92%|█████████▏| 1596/1726 [27:54:17<2:16:38, 63.06s/it] 93%|█████████▎| 1597/1726 [27:55:20<2:15:39, 63.10s/it] {'loss': 1.183, 'learning_rate': 5.832383491035499e-07, 'epoch': 0.93} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1952 [2024-06-11 04:32:45,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 327.54 | bwd_microstep: 884.92 | bwd_inner_microstep: 884.74 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.14 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3947 [2024-06-11 04:32:47,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.31 | bwd_microstep: 1691.77 | bwd_inner_microstep: 1691.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4007 [2024-06-11 04:32:49,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.32 | bwd_microstep: 1606.45 | bwd_inner_microstep: 1606.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-11 04:32:51,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.39 | bwd_microstep: 1345.24 | bwd_inner_microstep: 1345.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3785 [2024-06-11 04:32:53,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.83 | bwd_microstep: 1443.77 | bwd_inner_microstep:
1443.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 04:32:55,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.90 | bwd_microstep: 1276.91 | bwd_inner_microstep: 1276.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3555 [2024-06-11 04:32:57,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.41 | bwd_microstep: 1456.68 | bwd_inner_microstep: 1456.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 04:32:58,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.68 | bwd_microstep: 1247.71 | bwd_inner_microstep: 1247.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 04:33:00,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.11 | bwd_microstep: 1384.59 | bwd_inner_microstep: 1384.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889 [2024-06-11 04:33:01,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.53 | bwd_microstep: 682.41 | bwd_inner_microstep: 682.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2009 [2024-06-11 04:33:02,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.26 | bwd_microstep: 835.36 | bwd_inner_microstep: 835.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-11 04:33:05,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.94 | 
bwd_microstep: 1625.72 | bwd_inner_microstep: 1625.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3669 [2024-06-11 04:33:07,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.77 | bwd_microstep: 1821.06 | bwd_inner_microstep: 1821.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1905 [2024-06-11 04:33:08,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.82 | bwd_microstep: 778.73 | bwd_inner_microstep: 778.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487 [2024-06-11 04:33:10,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.46 | bwd_microstep: 1411.64 | bwd_inner_microstep: 1411.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3438 [2024-06-11 04:33:12,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1348.95 | bwd_inner_microstep: 1348.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 04:33:14,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.52 | bwd_microstep: 1284.24 | bwd_inner_microstep: 1284.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2497 [2024-06-11 04:33:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.43 | bwd_microstep: 957.90 | bwd_inner_microstep: 957.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3725 [2024-06-11 04:33:17,545] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 506.55 | bwd_microstep: 1338.13 | bwd_inner_microstep: 1338.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-11 04:33:19,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.83 | bwd_microstep: 1491.69 | bwd_inner_microstep: 1491.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3519 [2024-06-11 04:33:21,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.62 | bwd_microstep: 1488.47 | bwd_inner_microstep: 1488.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2154 [2024-06-11 04:33:22,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.83 | bwd_microstep: 849.12 | bwd_inner_microstep: 849.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3535 [2024-06-11 04:33:24,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.08 | bwd_microstep: 1293.58 | bwd_inner_microstep: 1293.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-11 04:33:26,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.44 | bwd_microstep: 1394.35 | bwd_inner_microstep: 1394.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2273 [2024-06-11 04:33:27,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.46 | bwd_microstep: 880.20 | bwd_inner_microstep: 880.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1584 [2024-06-11 04:33:28,603] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 224.77 | bwd_microstep: 591.81 | bwd_inner_microstep: 591.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3558 [2024-06-11 04:33:30,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.81 | bwd_microstep: 1524.98 | bwd_inner_microstep: 1524.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-11 04:33:32,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.03 | bwd_microstep: 1281.90 | bwd_inner_microstep: 1281.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3571 [2024-06-11 04:33:34,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.93 | bwd_microstep: 1330.66 | bwd_inner_microstep: 1330.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-11 04:33:36,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.18 | bwd_microstep: 1546.20 | bwd_inner_microstep: 1546.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3408 [2024-06-11 04:33:38,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 1346.41 | bwd_inner_microstep: 1346.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3745 [2024-06-11 04:33:45,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.84 | optimizer_gradients: 4.23 | optimizer_step: 6.62 [2024-06-11 04:33:45,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.43 | bwd_microstep: 6749.51 | 
bwd_inner_microstep: 1774.29 | bwd_allreduce_microstep: 4975.15 | step_microstep: 39.97 [2024-06-11 04:33:45,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15368.69 | bwd: 46191.09 | bwd_inner: 41214.88 | bwd_allreduce: 4975.46 | step: 41.53 {'loss': 1.1716, 'learning_rate': 5.742742999340411e-07, 'epoch': 0.93} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411 [2024-06-11 04:33:47,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.56 | bwd_microstep: 1268.94 | bwd_inner_microstep: 1268.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-11 04:33:49,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1245.08 | bwd_inner_microstep: 1245.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3476 [2024-06-11 04:33:51,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.93 | bwd_microstep: 1480.08 | bwd_inner_microstep: 1480.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 04:33:52,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.53 | bwd_microstep: 1244.12 | bwd_inner_microstep: 1244.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-11 04:33:55,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.69 | bwd_microstep: 1538.19 | bwd_inner_microstep: 1538.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768 [2024-06-11 04:33:57,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.74 | bwd_microstep: 1436.79 | 
bwd_inner_microstep: 1436.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3422 [2024-06-11 04:33:58,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.74 | bwd_microstep: 1182.49 | bwd_inner_microstep: 1182.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-11 04:34:00,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.41 | bwd_microstep: 1287.75 | bwd_inner_microstep: 1287.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2042 [2024-06-11 04:34:01,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.04 | bwd_microstep: 810.98 | bwd_inner_microstep: 810.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-11 04:34:03,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.62 | bwd_microstep: 1393.43 | bwd_inner_microstep: 1393.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-11 04:34:05,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.61 | bwd_microstep: 1291.82 | bwd_inner_microstep: 1291.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651 [2024-06-11 04:34:07,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.19 | bwd_microstep: 1321.62 | bwd_inner_microstep: 1321.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-11 04:34:09,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 548.24 | bwd_microstep: 1482.86 | bwd_inner_microstep: 1482.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3696 [2024-06-11 04:34:11,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.14 | bwd_microstep: 1554.85 | bwd_inner_microstep: 1554.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 04:34:13,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.01 | bwd_microstep: 1379.75 | bwd_inner_microstep: 1379.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3457 [2024-06-11 04:34:15,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.68 | bwd_microstep: 1341.98 | bwd_inner_microstep: 1341.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 04:34:16,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.83 | bwd_microstep: 1339.26 | bwd_inner_microstep: 1339.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 04:34:18,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.94 | bwd_microstep: 1483.48 | bwd_inner_microstep: 1483.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2087 [2024-06-11 04:34:20,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.01 | bwd_microstep: 912.11 | bwd_inner_microstep: 912.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3667 [2024-06-11 04:34:22,392] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.55 | bwd_microstep: 1550.05 | bwd_inner_microstep: 1550.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-11 04:34:24,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1387.74 | bwd_inner_microstep: 1387.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3598 [2024-06-11 04:34:26,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1501.87 | bwd_inner_microstep: 1501.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2135 [2024-06-11 04:34:27,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.57 | bwd_microstep: 930.03 | bwd_inner_microstep: 930.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-11 04:34:29,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.70 | bwd_microstep: 1656.09 | bwd_inner_microstep: 1656.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-11 04:34:31,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.73 | bwd_microstep: 1358.98 | bwd_inner_microstep: 1358.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192 [2024-06-11 04:34:33,135] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.35 | bwd_microstep: 957.09 | bwd_inner_microstep: 957.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3590 [2024-06-11 04:34:35,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.93 | bwd_microstep: 1403.08 | bwd_inner_microstep: 1403.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3598 [2024-06-11 04:34:37,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.07 | bwd_microstep: 1464.25 | bwd_inner_microstep: 1464.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-11 04:34:39,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.26 | bwd_microstep: 1650.41 | bwd_inner_microstep: 1650.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3808 [2024-06-11 04:34:41,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.64 | bwd_microstep: 1604.48 | bwd_inner_microstep: 1604.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-11 04:34:43,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.20 | bwd_microstep: 1181.85 | bwd_inner_microstep: 1181.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3564 [2024-06-11 04:34:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 04:34:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.54 | bwd_microstep: 3919.76 | bwd_inner_microstep: 1693.88 | bwd_allreduce_microstep: 2225.83 | step_microstep: 37.80 [2024-06-11 04:34:47,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16147.64 | bwd: 45561.27 | bwd_inner: 43334.54 | bwd_allreduce: 2226.06 | step: 
39.29 {'loss': 1.1702, 'learning_rate': 5.653786680694629e-07, 'epoch': 0.93} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3468 [2024-06-11 04:34:49,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1468.10 | bwd_inner_microstep: 1468.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 04:34:51,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.84 | bwd_microstep: 1282.58 | bwd_inner_microstep: 1282.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-11 04:34:53,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1478.72 | bwd_inner_microstep: 1478.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3768 [2024-06-11 04:34:55,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.58 | bwd_microstep: 1342.97 | bwd_inner_microstep: 1342.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611 [2024-06-11 04:34:57,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.07 | bwd_microstep: 1310.29 | bwd_inner_microstep: 1310.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 04:34:59,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.89 | bwd_microstep: 1384.77 | bwd_inner_microstep: 1384.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 04:35:00,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
484.93 | bwd_microstep: 1285.50 | bwd_inner_microstep: 1285.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-11 04:35:02,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.08 | bwd_microstep: 793.18 | bwd_inner_microstep: 793.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-11 04:35:04,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.52 | bwd_microstep: 1476.49 | bwd_inner_microstep: 1476.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 04:35:05,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.93 | bwd_microstep: 1339.85 | bwd_inner_microstep: 1339.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3629 [2024-06-11 04:35:07,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.71 | bwd_microstep: 1323.03 | bwd_inner_microstep: 1323.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 04:35:09,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.66 | bwd_microstep: 1351.75 | bwd_inner_microstep: 1351.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3510 [2024-06-11 04:35:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.19 | bwd_microstep: 1432.59 | bwd_inner_microstep: 1432.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2161 [2024-06-11 04:35:12,842] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 333.08 | bwd_microstep: 884.52 | bwd_inner_microstep: 884.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3700 [2024-06-11 04:35:15,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.47 | bwd_microstep: 1720.95 | bwd_inner_microstep: 1720.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3837 [2024-06-11 04:35:17,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.83 | bwd_microstep: 1460.29 | bwd_inner_microstep: 1460.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3635 [2024-06-11 04:35:19,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.91 | bwd_microstep: 1612.84 | bwd_inner_microstep: 1612.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3631 [2024-06-11 04:35:21,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.13 | bwd_microstep: 1512.20 | bwd_inner_microstep: 1512.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 04:35:23,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.74 | bwd_microstep: 1459.91 | bwd_inner_microstep: 1459.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2071 [2024-06-11 04:35:24,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.76 | bwd_microstep: 725.82 | bwd_inner_microstep: 725.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3817 [2024-06-11 
04:35:26,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.34 | bwd_microstep: 1355.42 | bwd_inner_microstep: 1355.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-11 04:35:28,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1561.64 | bwd_inner_microstep: 1561.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3016 [2024-06-11 04:35:30,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.93 | bwd_microstep: 1245.09 | bwd_inner_microstep: 1245.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-11 04:35:32,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.65 | bwd_microstep: 1476.54 | bwd_inner_microstep: 1476.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 04:35:34,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.44 | bwd_microstep: 1282.81 | bwd_inner_microstep: 1282.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-11 04:35:35,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.54 | bwd_microstep: 797.35 | bwd_inner_microstep: 797.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3484 [2024-06-11 04:35:36,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.57 | bwd_microstep: 1186.50 | bwd_inner_microstep: 1186.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, 
dynamic token length: 3434 [2024-06-11 04:35:38,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.70 | bwd_microstep: 1284.44 | bwd_inner_microstep: 1284.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-11 04:35:40,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1408.52 | bwd_inner_microstep: 1408.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3650 [2024-06-11 04:35:42,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.94 | bwd_microstep: 1515.47 | bwd_inner_microstep: 1515.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271 [2024-06-11 04:35:44,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.39 | bwd_microstep: 970.72 | bwd_inner_microstep: 970.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-11 04:35:51,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.80 | optimizer_gradients: 4.09 | optimizer_step: 6.62 [2024-06-11 04:35:51,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.97 | bwd_microstep: 6424.27 | bwd_inner_microstep: 1692.13 | bwd_allreduce_microstep: 4732.08 | step_microstep: 37.85 [2024-06-11 04:35:51,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15821.16 | bwd: 47155.16 | bwd_inner: 42422.14 | bwd_allreduce: 4732.32 | step: 39.39 {'loss': 1.154, 'learning_rate': 5.565514848401887e-07, 'epoch': 0.93} 93%|█████████▎| 1597/1726 [27:55:20<2:15:39, 63.10s/it] 93%|█████████▎| 1598/1726 [27:56:22<2:13:50, 62.74s/it] 93%|█████████▎| 1599/1726 
[27:57:24<2:12:21, 62.53s/it] 93%|█████████▎| 1600/1726 [27:58:27<2:11:48, 62.77s/it] [INFO|trainer.py:2936] 2024-06-11 04:35:53,554 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600 [INFO|configuration_utils.py:473] 2024-06-11 04:35:53,558 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/config.json [INFO|configuration_utils.py:594] 2024-06-11 04:35:53,561 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/generation_config.json [INFO|modeling_utils.py:2493] 2024-06-11 04:36:01,298 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/model.safetensors [INFO|tokenization_utils_base.py:2433] 2024-06-11 04:36:01,332 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/tokenizer_config.json [INFO|tokenization_utils_base.py:2442] 2024-06-11 04:36:01,359 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/special_tokens_map.json [INFO|tokenization_utils_base.py:2493] 2024-06-11 04:36:01,372 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/added_tokens.json [2024-06-11 04:36:01,712] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step1600 is about to be saved! 
[2024-06-11 04:36:01,728] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/mp_rank_00_model_states.pt [2024-06-11 04:36:01,728] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/mp_rank_00_model_states.pt... [2024-06-11 04:36:09,775] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/mp_rank_00_model_states.pt. [2024-06-11 04:36:09,813] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... [2024-06-11 04:36:21,305] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. [2024-06-11 04:36:21,321] [INFO] [engine.py:3488:_save_zero_checkpoint] zero checkpoint saved work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tmp-checkpoint-1600/global_step1600/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt [2024-06-11 04:36:21,321] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step1600 is ready now! 
[INFO|trainer.py:3028] 2024-06-11 04:36:21,503 >> Deleting older checkpoint [work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/checkpoint-1000] due to args.save_total_limit dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 04:36:24,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1361.75 | bwd_inner_microstep: 1361.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 04:36:25,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1335.17 | bwd_inner_microstep: 1335.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3838 [2024-06-11 04:36:28,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.81 | bwd_microstep: 1544.90 | bwd_inner_microstep: 1544.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-11 04:36:30,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.83 | bwd_microstep: 1640.59 | bwd_inner_microstep: 1640.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 04:36:32,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.36 | bwd_microstep: 1547.30 | bwd_inner_microstep: 1547.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1862 [2024-06-11 04:36:33,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.46 | bwd_microstep: 705.44 | bwd_inner_microstep: 705.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic 
token length: 3742 [2024-06-11 04:36:35,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.86 | bwd_microstep: 1435.22 | bwd_inner_microstep: 1435.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-11 04:36:36,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.51 | bwd_microstep: 796.39 | bwd_inner_microstep: 796.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 04:36:55,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.52 | bwd_microstep: 1382.34 | bwd_inner_microstep: 1382.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-11 04:36:57,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.92 | bwd_microstep: 1277.99 | bwd_inner_microstep: 1277.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3487 [2024-06-11 04:36:59,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.90 | bwd_microstep: 1405.99 | bwd_inner_microstep: 1405.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1981 [2024-06-11 04:37:00,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.35 | bwd_microstep: 764.30 | bwd_inner_microstep: 764.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-11 04:37:02,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.79 | bwd_microstep: 1375.46 | bwd_inner_microstep: 1375.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
32, images per sample: 8.0, dynamic token length: 3425 [2024-06-11 04:37:04,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.28 | bwd_microstep: 1404.25 | bwd_inner_microstep: 1404.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 04:37:06,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1387.06 | bwd_inner_microstep: 1386.83 | bwd_allreduce_microstep: 0.18 | step_microstep: 0.26 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 04:37:08,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.11 | bwd_microstep: 1380.14 | bwd_inner_microstep: 1380.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3641 [2024-06-11 04:37:10,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.15 | bwd_microstep: 1510.20 | bwd_inner_microstep: 1510.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 04:37:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.18 | bwd_microstep: 1401.32 | bwd_inner_microstep: 1401.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1920 [2024-06-11 04:37:13,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.98 | bwd_microstep: 719.17 | bwd_inner_microstep: 719.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3649 [2024-06-11 04:37:15,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.28 | bwd_microstep: 1413.74 | bwd_inner_microstep: 1413.71 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2063 [2024-06-11 04:37:16,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.37 | bwd_microstep: 812.20 | bwd_inner_microstep: 812.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3531 [2024-06-11 04:37:18,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.67 | bwd_microstep: 1293.85 | bwd_inner_microstep: 1293.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-11 04:37:20,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.68 | bwd_microstep: 1554.52 | bwd_inner_microstep: 1554.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819 [2024-06-11 04:37:22,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.79 | bwd_microstep: 1517.67 | bwd_inner_microstep: 1517.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-11 04:37:24,800] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.99 | bwd_microstep: 1557.54 | bwd_inner_microstep: 1557.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-11 04:37:25,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.46 | bwd_microstep: 802.58 | bwd_inner_microstep: 802.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-11 04:37:27,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1347.22 | bwd_inner_microstep: 1347.19 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574 [2024-06-11 04:37:29,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.97 | bwd_microstep: 1499.29 | bwd_inner_microstep: 1499.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-11 04:37:31,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1452.46 | bwd_inner_microstep: 1452.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1934 [2024-06-11 04:37:32,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.29 | bwd_microstep: 726.89 | bwd_inner_microstep: 726.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3436 [2024-06-11 04:37:34,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.71 | bwd_microstep: 1510.03 | bwd_inner_microstep: 1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 04:37:43,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.24 | optimizer_step: 6.62 [2024-06-11 04:37:43,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.01 | bwd_microstep: 7510.08 | bwd_inner_microstep: 1543.53 | bwd_allreduce_microstep: 5966.47 | step_microstep: 39.04 [2024-06-11 04:37:43,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15511.15 | bwd: 47373.12 | bwd_inner: 41405.47 | bwd_allreduce: 5966.90 | step: 40.78 {'loss': 1.1878, 'learning_rate': 5.477927813355056e-07, 'epoch': 0.93} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3526 
[2024-06-11 04:37:45,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.70 | bwd_microstep: 1477.61 | bwd_inner_microstep: 1477.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-11 04:37:47,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.95 | bwd_microstep: 1529.30 | bwd_inner_microstep: 1529.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2313 [2024-06-11 04:37:48,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.84 | bwd_microstep: 882.55 | bwd_inner_microstep: 882.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-11 04:37:50,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1506.19 | bwd_inner_microstep: 1506.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 04:37:52,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.74 | bwd_microstep: 1380.90 | bwd_inner_microstep: 1380.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3848 [2024-06-11 04:37:54,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 646.00 | bwd_microstep: 1762.19 | bwd_inner_microstep: 1762.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3405 [2024-06-11 04:37:56,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.46 | bwd_microstep: 1179.95 | bwd_inner_microstep: 1179.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3427 [2024-06-11 04:37:58,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.11 | bwd_microstep: 1250.58 | bwd_inner_microstep: 1250.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3986 [2024-06-11 04:38:00,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.12 | bwd_microstep: 1706.97 | bwd_inner_microstep: 1706.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3404 [2024-06-11 04:38:02,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.56 | bwd_microstep: 1310.09 | bwd_inner_microstep: 1310.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3418 [2024-06-11 04:38:04,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1345.78 | bwd_inner_microstep: 1345.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3671 [2024-06-11 04:38:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.71 | bwd_microstep: 1593.14 | bwd_inner_microstep: 1593.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3460 [2024-06-11 04:38:08,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.91 | bwd_microstep: 1287.20 | bwd_inner_microstep: 1287.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3471 [2024-06-11 04:38:10,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.45 | bwd_microstep: 1340.91 | bwd_inner_microstep: 1340.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3667 [2024-06-11 04:38:12,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.04 | bwd_microstep: 1562.23 | bwd_inner_microstep: 1562.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3769 [2024-06-11 04:38:14,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.31 | bwd_microstep: 1601.98 | bwd_inner_microstep: 1601.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-11 04:38:16,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.38 | bwd_microstep: 1285.69 | bwd_inner_microstep: 1285.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-11 04:38:17,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.93 | bwd_microstep: 1290.05 | bwd_inner_microstep: 1290.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3841 [2024-06-11 04:38:19,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.98 | bwd_microstep: 1465.28 | bwd_inner_microstep: 1465.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725 [2024-06-11 04:38:22,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.86 | bwd_microstep: 1639.44 | bwd_inner_microstep: 1639.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3642 [2024-06-11 04:38:24,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.71 | bwd_microstep: 1345.52 | bwd_inner_microstep: 1345.50 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3828 [2024-06-11 04:38:26,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.65 | bwd_microstep: 1481.01 | bwd_inner_microstep: 1480.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-11 04:38:28,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.44 | bwd_microstep: 1350.18 | bwd_inner_microstep: 1350.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2514 [2024-06-11 04:38:29,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.73 | bwd_microstep: 960.86 | bwd_inner_microstep: 960.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-11 04:38:31,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.58 | bwd_microstep: 1415.09 | bwd_inner_microstep: 1415.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2049 [2024-06-11 04:38:32,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.66 | bwd_microstep: 910.59 | bwd_inner_microstep: 910.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-11 04:38:34,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.20 | bwd_microstep: 1181.12 | bwd_inner_microstep: 1181.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 04:38:36,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.91 | bwd_microstep: 1377.96 
| bwd_inner_microstep: 1377.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3772 [2024-06-11 04:38:38,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.68 | bwd_microstep: 1737.93 | bwd_inner_microstep: 1737.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-11 04:38:40,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.83 | bwd_microstep: 1528.58 | bwd_inner_microstep: 1528.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-11 04:38:42,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.56 | bwd_microstep: 1341.80 | bwd_inner_microstep: 1341.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3586 [2024-06-11 04:38:45,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.06 | optimizer_gradients: 4.02 | optimizer_step: 6.62 [2024-06-11 04:38:45,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.61 | bwd_microstep: 1988.41 | bwd_inner_microstep: 1696.13 | bwd_allreduce_microstep: 292.22 | step_microstep: 39.27 [2024-06-11 04:38:45,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16699.96 | bwd: 45017.10 | bwd_inner: 44723.98 | bwd_allreduce: 292.45 | step: 40.78 {'loss': 1.1753, 'learning_rate': 5.391025884035239e-07, 'epoch': 0.93} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2468 [2024-06-11 04:38:46,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.18 | bwd_microstep: 948.02 | bwd_inner_microstep: 947.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, 
dynamic token length: 2012 [2024-06-11 04:38:47,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.51 | bwd_microstep: 802.38 | bwd_inner_microstep: 802.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 04:38:49,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.83 | bwd_microstep: 1379.19 | bwd_inner_microstep: 1379.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-11 04:38:50,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.30 | bwd_microstep: 790.27 | bwd_inner_microstep: 790.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 04:38:52,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1385.18 | bwd_inner_microstep: 1385.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3780 [2024-06-11 04:38:54,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.69 | bwd_microstep: 1348.22 | bwd_inner_microstep: 1348.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 04:38:56,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1243.88 | bwd_inner_microstep: 1243.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976 [2024-06-11 04:38:57,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.28 | bwd_microstep: 797.58 | bwd_inner_microstep: 797.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 10, images per sample: 2.5, dynamic token length: 1908 [2024-06-11 04:38:58,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.35 | bwd_microstep: 686.24 | bwd_inner_microstep: 686.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1966 [2024-06-11 04:38:59,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.18 | bwd_microstep: 853.77 | bwd_inner_microstep: 853.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2872 [2024-06-11 04:39:00,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.91 | bwd_microstep: 1175.45 | bwd_inner_microstep: 1175.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478 [2024-06-11 04:39:03,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.76 | bwd_microstep: 1573.98 | bwd_inner_microstep: 1573.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3502 [2024-06-11 04:39:05,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.40 | bwd_microstep: 1679.93 | bwd_inner_microstep: 1679.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-11 04:39:07,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.86 | bwd_microstep: 1289.34 | bwd_inner_microstep: 1289.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2448 [2024-06-11 04:39:08,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.80 | bwd_microstep: 949.08 | bwd_inner_microstep: 949.05 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3610 [2024-06-11 04:39:10,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.74 | bwd_microstep: 1341.25 | bwd_inner_microstep: 1341.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3515 [2024-06-11 04:39:12,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.40 | bwd_microstep: 1487.05 | bwd_inner_microstep: 1487.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-11 04:39:14,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.19 | bwd_microstep: 1459.38 | bwd_inner_microstep: 1459.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3827 [2024-06-11 04:39:16,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.07 | bwd_microstep: 1263.14 | bwd_inner_microstep: 1263.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3838 [2024-06-11 04:39:18,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1393.86 | bwd_inner_microstep: 1393.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3444 [2024-06-11 04:39:19,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.65 | bwd_microstep: 1353.15 | bwd_inner_microstep: 1353.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557 [2024-06-11 04:39:21,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1429.99 | bwd_inner_microstep: 
1429.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3679 [2024-06-11 04:39:23,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.68 | bwd_microstep: 1391.66 | bwd_inner_microstep: 1391.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2285 [2024-06-11 04:39:25,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.32 | bwd_microstep: 880.22 | bwd_inner_microstep: 880.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-11 04:39:26,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.15 | bwd_microstep: 1303.21 | bwd_inner_microstep: 1303.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3553 [2024-06-11 04:39:28,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.70 | bwd_microstep: 1417.23 | bwd_inner_microstep: 1417.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3378 [2024-06-11 04:39:30,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.32 | bwd_microstep: 1339.97 | bwd_inner_microstep: 1339.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025 [2024-06-11 04:39:31,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.23 | bwd_microstep: 898.44 | bwd_inner_microstep: 898.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2249 [2024-06-11 04:39:33,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.32 | 
bwd_microstep: 1062.38 | bwd_inner_microstep: 1062.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2259 [2024-06-11 04:39:34,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.90 | bwd_microstep: 808.74 | bwd_inner_microstep: 808.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 04:39:36,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.74 | bwd_microstep: 1282.84 | bwd_inner_microstep: 1282.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3625 [2024-06-11 04:39:45,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.19 | optimizer_step: 6.61 [2024-06-11 04:39:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.90 | bwd_microstep: 8638.37 | bwd_inner_microstep: 1470.57 | bwd_allreduce_microstep: 7167.73 | step_microstep: 38.35 [2024-06-11 04:39:45,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14415.18 | bwd: 45653.43 | bwd_inner: 38484.73 | bwd_allreduce: 7167.99 | step: 39.86 {'loss': 1.1944, 'learning_rate': 5.304809366510566e-07, 'epoch': 0.93} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513 [2024-06-11 04:39:47,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.24 | bwd_microstep: 1574.72 | bwd_inner_microstep: 1574.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3477 [2024-06-11 04:39:49,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.42 | bwd_microstep: 1377.48 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, 
images per sample: 6.25, dynamic token length: 3405 [2024-06-11 04:39:51,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.63 | bwd_microstep: 1293.19 | bwd_inner_microstep: 1293.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3837 [2024-06-11 04:39:53,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.55 | bwd_microstep: 1484.96 | bwd_inner_microstep: 1484.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3837 [2024-06-11 04:39:55,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.70 | bwd_microstep: 1484.91 | bwd_inner_microstep: 1484.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 04:39:57,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1384.05 | bwd_inner_microstep: 1384.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 04:39:59,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.80 | bwd_microstep: 1245.59 | bwd_inner_microstep: 1245.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3731 [2024-06-11 04:40:01,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.12 | bwd_microstep: 1630.26 | bwd_inner_microstep: 1630.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843 [2024-06-11 04:40:03,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.24 | bwd_microstep: 1559.97 | bwd_inner_microstep: 1559.94 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3458 [2024-06-11 04:40:05,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.74 | bwd_microstep: 1342.04 | bwd_inner_microstep: 1342.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-11 04:40:06,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.19 | bwd_microstep: 1180.30 | bwd_inner_microstep: 1180.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2408 [2024-06-11 04:40:08,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.19 | bwd_microstep: 937.69 | bwd_inner_microstep: 937.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 4126 [2024-06-11 04:40:10,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.08 | bwd_microstep: 1442.54 | bwd_inner_microstep: 1442.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3453 [2024-06-11 04:40:12,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.08 | bwd_microstep: 1377.95 | bwd_inner_microstep: 1377.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1964 [2024-06-11 04:40:13,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.81 | bwd_microstep: 889.53 | bwd_inner_microstep: 889.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-11 04:40:15,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.87 | bwd_microstep: 1408.46 | bwd_inner_microstep: 1408.44 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3544 [2024-06-11 04:40:17,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.05 | bwd_microstep: 1452.65 | bwd_inner_microstep: 1452.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3724 [2024-06-11 04:40:19,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.51 | bwd_microstep: 1626.10 | bwd_inner_microstep: 1626.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3634 [2024-06-11 04:40:21,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.93 | bwd_microstep: 1311.62 | bwd_inner_microstep: 1311.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-11 04:40:23,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.23 | bwd_microstep: 1602.73 | bwd_inner_microstep: 1602.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3623 [2024-06-11 04:40:25,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1535.24 | bwd_inner_microstep: 1535.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3821 [2024-06-11 04:40:28,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.50 | bwd_microstep: 1852.99 | bwd_inner_microstep: 1852.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3824 [2024-06-11 04:40:30,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.86 | 
bwd_microstep: 1453.63 | bwd_inner_microstep: 1453.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2122 [2024-06-11 04:40:31,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.72 | bwd_microstep: 927.13 | bwd_inner_microstep: 927.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3724 [2024-06-11 04:40:33,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.24 | bwd_microstep: 1594.58 | bwd_inner_microstep: 1594.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 04:40:35,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1285.82 | bwd_inner_microstep: 1285.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-11 04:40:37,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1554.09 | bwd_inner_microstep: 1554.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767 [2024-06-11 04:40:39,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.15 | bwd_microstep: 1440.50 | bwd_inner_microstep: 1440.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3595 [2024-06-11 04:40:41,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.37 | bwd_microstep: 1505.45 | bwd_inner_microstep: 1505.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3797 [2024-06-11 04:40:43,869] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 577.49 | bwd_microstep: 1556.82 | bwd_inner_microstep: 1556.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052 [2024-06-11 04:40:45,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.42 | bwd_microstep: 911.33 | bwd_inner_microstep: 911.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-11 04:40:47,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.85 | optimizer_gradients: 4.03 | optimizer_step: 6.63 [2024-06-11 04:40:47,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1531.34 | bwd_inner_microstep: 1523.66 | bwd_allreduce_microstep: 7.64 | step_microstep: 37.45 [2024-06-11 04:40:47,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16694.77 | bwd: 44755.69 | bwd_inner: 44747.16 | bwd_allreduce: 7.86 | step: 38.93 {'loss': 1.1298, 'learning_rate': 5.219278564435204e-07, 'epoch': 0.93} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3549 [2024-06-11 04:40:49,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.96 | bwd_microstep: 1488.09 | bwd_inner_microstep: 1488.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 04:40:51,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.85 | bwd_microstep: 1278.59 | bwd_inner_microstep: 1278.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3905 [2024-06-11 04:40:53,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.73 | bwd_microstep: 1484.45 | bwd_inner_microstep: 1484.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3958 [2024-06-11 04:40:55,492] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.58 | bwd_microstep: 1696.74 | bwd_inner_microstep: 1696.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4159 [2024-06-11 04:40:57,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.65 | bwd_microstep: 1545.39 | bwd_inner_microstep: 1545.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2333 [2024-06-11 04:40:58,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.17 | bwd_microstep: 982.18 | bwd_inner_microstep: 982.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 04:41:00,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.36 | bwd_microstep: 1380.81 | bwd_inner_microstep: 1380.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3558 [2024-06-11 04:41:02,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.86 | bwd_microstep: 1249.04 | bwd_inner_microstep: 1249.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-11 04:41:03,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.62 | bwd_microstep: 792.25 | bwd_inner_microstep: 792.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3803 [2024-06-11 04:41:05,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.28 | bwd_microstep: 1553.74 | bwd_inner_microstep: 1553.72 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-11 04:41:07,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.76 | bwd_microstep: 1254.12 | bwd_inner_microstep: 1254.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3638 [2024-06-11 04:41:09,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1314.89 | bwd_inner_microstep: 1314.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3502 [2024-06-11 04:41:11,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.57 | bwd_microstep: 1328.91 | bwd_inner_microstep: 1328.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 04:41:13,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.71 | bwd_microstep: 1378.33 | bwd_inner_microstep: 1378.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3640 [2024-06-11 04:41:15,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.29 | bwd_microstep: 1657.01 | bwd_inner_microstep: 1656.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2216 [2024-06-11 04:41:16,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.77 | bwd_microstep: 925.68 | bwd_inner_microstep: 925.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3670 [2024-06-11 04:41:18,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.60 | bwd_microstep: 
1548.44 | bwd_inner_microstep: 1548.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 04:41:20,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1355.02 | bwd_inner_microstep: 1354.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 04:41:22,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1390.13 | bwd_inner_microstep: 1390.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 04:41:24,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.93 | bwd_microstep: 1395.50 | bwd_inner_microstep: 1395.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1995 [2024-06-11 04:41:25,562] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.14 | bwd_microstep: 708.86 | bwd_inner_microstep: 708.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-11 04:41:27,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1396.89 | bwd_inner_microstep: 1396.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3485 [2024-06-11 04:41:29,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.30 | bwd_microstep: 1444.60 | bwd_inner_microstep: 1444.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 04:41:31,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 472.16 | bwd_microstep: 1246.79 | bwd_inner_microstep: 1246.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3318 [2024-06-11 04:41:32,923] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.37 | bwd_microstep: 1230.18 | bwd_inner_microstep: 1230.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3559 [2024-06-11 04:41:35,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.45 | bwd_microstep: 1588.49 | bwd_inner_microstep: 1588.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3562 [2024-06-11 04:41:37,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.38 | bwd_microstep: 1421.48 | bwd_inner_microstep: 1421.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1905 [2024-06-11 04:41:38,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.33 | bwd_microstep: 745.15 | bwd_inner_microstep: 745.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-11 04:41:40,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.59 | bwd_microstep: 1449.68 | bwd_inner_microstep: 1449.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3564 [2024-06-11 04:41:42,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.60 | bwd_microstep: 1600.33 | bwd_inner_microstep: 1600.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3770 [2024-06-11 04:41:44,441] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.44 | bwd_microstep: 1541.67 | bwd_inner_microstep: 1541.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3755 [2024-06-11 04:41:50,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-11 04:41:50,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.28 | bwd_microstep: 5143.09 | bwd_inner_microstep: 1550.97 | bwd_allreduce_microstep: 3592.06 | step_microstep: 37.76 [2024-06-11 04:41:50,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16037.13 | bwd: 46516.54 | bwd_inner: 42923.58 | bwd_allreduce: 3592.29 | step: 39.22 {'loss': 1.1499, 'learning_rate': 5.134433779048186e-07, 'epoch': 0.93} 93%|█████████▎| 1601/1726 [28:00:19<2:41:30, 77.53s/it] 93%|█████████▎| 1602/1726 [28:01:21<2:30:37, 72.89s/it] 93%|█████████▎| 1603/1726 [28:02:22<2:21:44, 69.14s/it] 93%|█████████▎| 1604/1726 [28:03:24<2:16:05, 66.93s/it] 93%|█████████▎| 1605/1726 [28:04:26<2:12:31, 65.72s/it] dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2652 [2024-06-11 04:41:51,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.76 | bwd_microstep: 1106.16 | bwd_inner_microstep: 1106.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 04:41:53,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.15 | bwd_microstep: 1338.24 | bwd_inner_microstep: 1338.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch 
size: 40, images per sample: 10.0, dynamic token length: 3965 [2024-06-11 04:41:55,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.67 | bwd_microstep: 1686.06 | bwd_inner_microstep: 1686.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 04:41:57,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.66 | bwd_microstep: 1375.35 | bwd_inner_microstep: 1375.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3810 [2024-06-11 04:41:59,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.80 | bwd_microstep: 1477.49 | bwd_inner_microstep: 1477.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3718 [2024-06-11 04:42:01,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.20 | bwd_microstep: 1459.31 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 04:42:03,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.26 | bwd_microstep: 1387.60 | bwd_inner_microstep: 1387.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3756 [2024-06-11 04:42:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.84 | bwd_microstep: 1537.30 | bwd_inner_microstep: 1537.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 04:42:07,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.89 | bwd_microstep: 1248.18 | bwd_inner_microstep: 1248.15 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-11 04:42:09,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.67 | bwd_microstep: 1255.45 | bwd_inner_microstep: 1255.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3411 [2024-06-11 04:42:11,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.16 | bwd_microstep: 1367.87 | bwd_inner_microstep: 1367.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3504 [2024-06-11 04:42:13,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1416.44 | bwd_inner_microstep: 1416.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3680 [2024-06-11 04:42:15,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.11 | bwd_microstep: 1617.44 | bwd_inner_microstep: 1617.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3936 [2024-06-11 04:42:17,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 655.14 | bwd_microstep: 1793.89 | bwd_inner_microstep: 1793.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3524 [2024-06-11 04:42:20,016] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 1587.10 | bwd_inner_microstep: 1587.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3645 [2024-06-11 04:42:22,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.77 | bwd_microstep: 1607.78 | bwd_inner_microstep: 
1607.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 04:42:24,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.57 | bwd_microstep: 1382.77 | bwd_inner_microstep: 1382.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 04:42:26,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1397.70 | bwd_inner_microstep: 1397.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3617 [2024-06-11 04:42:27,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.53 | bwd_microstep: 1341.93 | bwd_inner_microstep: 1341.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3840 [2024-06-11 04:42:30,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.68 | bwd_microstep: 1707.92 | bwd_inner_microstep: 1707.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840 [2024-06-11 04:42:32,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.02 | bwd_microstep: 1659.76 | bwd_inner_microstep: 1659.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-11 04:42:34,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.21 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3826 [2024-06-11 04:42:36,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.10 | 
bwd_microstep: 1700.62 | bwd_inner_microstep: 1700.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-11 04:42:38,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.00 | bwd_microstep: 1491.10 | bwd_inner_microstep: 1491.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544 [2024-06-11 04:42:40,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1427.77 | bwd_inner_microstep: 1427.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3605 [2024-06-11 04:42:43,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.09 | bwd_microstep: 1610.04 | bwd_inner_microstep: 1610.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-11 04:42:44,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.47 | bwd_microstep: 1349.01 | bwd_inner_microstep: 1348.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3542 [2024-06-11 04:42:46,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1455.30 | bwd_inner_microstep: 1455.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3813 [2024-06-11 04:42:49,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.99 | bwd_microstep: 1617.47 | bwd_inner_microstep: 1617.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3470 [2024-06-11 04:42:51,305] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 573.07 | bwd_microstep: 1541.99 | bwd_inner_microstep: 1541.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-11 04:42:53,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.12 | bwd_microstep: 1507.17 | bwd_inner_microstep: 1507.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3811 [2024-06-11 04:42:55,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.88 | optimizer_gradients: 4.04 | optimizer_step: 6.64 [2024-06-11 04:42:55,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.83 | bwd_microstep: 1740.83 | bwd_inner_microstep: 1733.16 | bwd_allreduce_microstep: 7.62 | step_microstep: 37.50 [2024-06-11 04:42:55,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 17684.09 | bwd: 47606.92 | bwd_inner: 47598.41 | bwd_allreduce: 7.85 | step: 39.01 {'loss': 1.1236, 'learning_rate': 5.05027530917237e-07, 'epoch': 0.93} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3472 [2024-06-11 04:42:57,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.36 | bwd_microstep: 1477.79 | bwd_inner_microstep: 1477.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3928 [2024-06-11 04:43:00,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.95 | bwd_microstep: 1592.28 | bwd_inner_microstep: 1592.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 04:43:01,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.76 | bwd_microstep: 1251.30 | bwd_inner_microstep: 1251.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796 [2024-06-11 04:43:04,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 1651.90 | bwd_inner_microstep: 1651.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-11 04:43:05,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.09 | bwd_microstep: 1403.19 | bwd_inner_microstep: 1403.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-11 04:43:07,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.94 | bwd_microstep: 1285.70 | bwd_inner_microstep: 1285.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889 [2024-06-11 04:43:08,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.79 | bwd_microstep: 682.08 | bwd_inner_microstep: 682.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-11 04:43:10,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.71 | bwd_microstep: 1437.40 | bwd_inner_microstep: 1437.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451 [2024-06-11 04:43:12,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1289.44 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2143 [2024-06-11 04:43:13,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 348.35 | bwd_microstep: 932.03 | bwd_inner_microstep: 932.00 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 04:43:15,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.98 | bwd_microstep: 1289.64 | bwd_inner_microstep: 1289.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3552 [2024-06-11 04:43:17,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.15 | bwd_microstep: 1297.03 | bwd_inner_microstep: 1297.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3718 [2024-06-11 04:43:19,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.35 | bwd_microstep: 1573.79 | bwd_inner_microstep: 1573.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2185 [2024-06-11 04:43:20,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.53 | bwd_microstep: 1051.71 | bwd_inner_microstep: 1051.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3634 [2024-06-11 04:43:23,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.67 | bwd_microstep: 1539.21 | bwd_inner_microstep: 1539.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-11 04:43:24,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.90 | bwd_microstep: 1349.79 | bwd_inner_microstep: 1349.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477 [2024-06-11 04:43:26,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.43 | bwd_microstep: 
1313.45 | bwd_inner_microstep: 1313.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2154 [2024-06-11 04:43:28,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.33 | bwd_microstep: 946.71 | bwd_inner_microstep: 946.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986 [2024-06-11 04:43:29,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.90 | bwd_microstep: 800.31 | bwd_inner_microstep: 800.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3657 [2024-06-11 04:43:31,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.53 | bwd_microstep: 1492.62 | bwd_inner_microstep: 1492.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3538 [2024-06-11 04:43:33,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.14 | bwd_microstep: 1416.91 | bwd_inner_microstep: 1416.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1999 [2024-06-11 04:43:34,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.02 | bwd_microstep: 710.38 | bwd_inner_microstep: 710.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3615 [2024-06-11 04:43:36,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1435.63 | bwd_inner_microstep: 1435.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-11 04:43:38,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 543.54 | bwd_microstep: 1455.92 | bwd_inner_microstep: 1455.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-11 04:43:40,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.73 | bwd_microstep: 1513.75 | bwd_inner_microstep: 1513.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 606 [2024-06-11 04:43:40,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.12 | bwd_microstep: 259.06 | bwd_inner_microstep: 259.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3474 [2024-06-11 04:43:42,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.77 | bwd_microstep: 1282.04 | bwd_inner_microstep: 1282.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3794 [2024-06-11 04:43:44,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 1552.35 | bwd_inner_microstep: 1552.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-11 04:43:46,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.27 | bwd_microstep: 1448.33 | bwd_inner_microstep: 1448.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-11 04:43:48,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.62 | bwd_microstep: 1394.93 | bwd_inner_microstep: 1394.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3260 [2024-06-11 04:43:50,297] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.79 | bwd_microstep: 1316.06 | bwd_inner_microstep: 1316.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3588 [2024-06-11 04:43:57,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.78 | optimizer_gradients: 4.08 | optimizer_step: 6.60 [2024-06-11 04:43:57,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.84 | bwd_microstep: 6344.83 | bwd_inner_microstep: 1768.04 | bwd_allreduce_microstep: 4576.74 | step_microstep: 37.79 [2024-06-11 04:43:57,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15359.06 | bwd: 45787.55 | bwd_inner: 41209.91 | bwd_allreduce: 4576.97 | step: 39.25 {'loss': 1.1884, 'learning_rate': 4.966803451213475e-07, 'epoch': 0.93} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2043 [2024-06-11 04:43:58,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.10 | bwd_microstep: 803.21 | bwd_inner_microstep: 803.14 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3415 [2024-06-11 04:44:00,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.56 | bwd_microstep: 1277.54 | bwd_inner_microstep: 1277.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-11 04:44:02,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.55 | bwd_microstep: 1494.12 | bwd_inner_microstep: 1494.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832 [2024-06-11 04:44:04,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.48 | bwd_microstep: 1485.17 | bwd_inner_microstep: 1485.14 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-11 04:44:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.04 | bwd_microstep: 789.51 | bwd_inner_microstep: 789.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-11 04:44:06,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.74 | bwd_microstep: 789.66 | bwd_inner_microstep: 789.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3724 [2024-06-11 04:44:08,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1493.67 | bwd_inner_microstep: 1493.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3749 [2024-06-11 04:44:10,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.73 | bwd_microstep: 1636.85 | bwd_inner_microstep: 1636.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-11 04:44:12,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.43 | bwd_microstep: 1149.95 | bwd_inner_microstep: 1149.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 04:44:14,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.24 | bwd_microstep: 1247.19 | bwd_inner_microstep: 1247.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3416 [2024-06-11 04:44:15,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.06 | bwd_microstep: 
1278.37 | bwd_inner_microstep: 1278.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3673 [2024-06-11 04:44:17,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.41 | bwd_microstep: 1548.72 | bwd_inner_microstep: 1548.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 04:44:19,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.37 | bwd_microstep: 1347.17 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3704 [2024-06-11 04:44:22,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.13 | bwd_microstep: 1591.85 | bwd_inner_microstep: 1591.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-11 04:44:23,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1381.64 | bwd_inner_microstep: 1381.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2118 [2024-06-11 04:44:25,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.50 | bwd_microstep: 859.65 | bwd_inner_microstep: 859.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3432 [2024-06-11 04:44:27,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.80 | bwd_microstep: 1510.13 | bwd_inner_microstep: 1510.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694 [2024-06-11 04:44:29,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 569.17 | bwd_microstep: 1531.38 | bwd_inner_microstep: 1531.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2070 [2024-06-11 04:44:30,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.84 | bwd_microstep: 754.49 | bwd_inner_microstep: 754.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1929 [2024-06-11 04:44:31,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.55 | bwd_microstep: 697.38 | bwd_inner_microstep: 697.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2413 [2024-06-11 04:44:32,651] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.09 | bwd_microstep: 940.53 | bwd_inner_microstep: 940.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 04:44:34,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.88 | bwd_microstep: 1281.47 | bwd_inner_microstep: 1281.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-11 04:44:36,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.87 | bwd_microstep: 1276.98 | bwd_inner_microstep: 1276.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-11 04:44:38,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.68 | bwd_microstep: 1346.67 | bwd_inner_microstep: 1346.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-11 04:44:39,860] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.57 | bwd_microstep: 1295.54 | bwd_inner_microstep: 1295.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-11 04:44:41,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.59 | bwd_microstep: 1525.59 | bwd_inner_microstep: 1525.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-11 04:44:44,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.34 | bwd_microstep: 1613.04 | bwd_inner_microstep: 1613.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-11 04:44:46,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.90 | bwd_microstep: 1422.29 | bwd_inner_microstep: 1422.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3806 [2024-06-11 04:44:48,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.23 | bwd_microstep: 1654.83 | bwd_inner_microstep: 1654.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-11 04:44:50,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1347.04 | bwd_inner_microstep: 1347.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581 [2024-06-11 04:44:52,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.27 | bwd_microstep: 1590.59 | bwd_inner_microstep: 1590.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3468 [2024-06-11 04:44:56,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-11 04:44:56,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.15 | bwd_microstep: 3770.41 | bwd_inner_microstep: 1653.02 | bwd_allreduce_microstep: 2117.34 | step_microstep: 37.62 [2024-06-11 04:44:56,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15504.12 | bwd: 43732.65 | bwd_inner: 41614.36 | bwd_allreduce: 2117.60 | step: 39.12 {'loss': 1.1861, 'learning_rate': 4.884018499158938e-07, 'epoch': 0.93} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3482 [2024-06-11 04:44:58,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.56 | bwd_microstep: 1431.30 | bwd_inner_microstep: 1431.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2352 [2024-06-11 04:45:00,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 368.60 | bwd_microstep: 986.27 | bwd_inner_microstep: 986.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-11 04:45:02,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.89 | bwd_microstep: 1447.98 | bwd_inner_microstep: 1447.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 04:45:04,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.87 | bwd_microstep: 1391.68 | bwd_inner_microstep: 1391.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-11 04:45:06,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1481.53 | 
bwd_inner_microstep: 1481.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3719 [2024-06-11 04:45:08,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.73 | bwd_microstep: 1633.46 | bwd_inner_microstep: 1633.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472 [2024-06-11 04:45:10,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.34 | bwd_microstep: 1244.09 | bwd_inner_microstep: 1244.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-11 04:45:11,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.53 | bwd_microstep: 793.20 | bwd_inner_microstep: 793.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1945 [2024-06-11 04:45:12,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.92 | bwd_microstep: 841.16 | bwd_inner_microstep: 841.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-11 04:45:14,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1346.73 | bwd_inner_microstep: 1346.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 04:45:16,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.33 | bwd_microstep: 1343.97 | bwd_inner_microstep: 1343.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3276 [2024-06-11 04:45:18,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 533.47 | bwd_microstep: 1447.19 | bwd_inner_microstep: 1447.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3674 [2024-06-11 04:45:20,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.82 | bwd_microstep: 1616.83 | bwd_inner_microstep: 1616.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-11 04:45:22,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.76 | bwd_microstep: 1489.99 | bwd_inner_microstep: 1489.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-11 04:45:24,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.16 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2187 [2024-06-11 04:45:25,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.64 | bwd_microstep: 950.43 | bwd_inner_microstep: 950.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 04:45:27,791] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.06 | bwd_microstep: 1488.97 | bwd_inner_microstep: 1488.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3610 [2024-06-11 04:45:30,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 653.63 | bwd_microstep: 1809.22 | bwd_inner_microstep: 1809.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910 [2024-06-11 04:45:31,221] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.67 | bwd_microstep: 685.30 | bwd_inner_microstep: 685.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-11 04:45:32,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.62 | bwd_microstep: 1250.65 | bwd_inner_microstep: 1250.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-11 04:45:34,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.84 | bwd_microstep: 801.73 | bwd_inner_microstep: 801.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 04:45:35,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1283.05 | bwd_inner_microstep: 1283.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3628 [2024-06-11 04:45:37,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.87 | bwd_microstep: 1446.45 | bwd_inner_microstep: 1446.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3433 [2024-06-11 04:45:39,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1251.15 | bwd_inner_microstep: 1251.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-11 04:45:41,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.11 | bwd_microstep: 1251.26 | bwd_inner_microstep: 1251.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 
[2024-06-11 04:45:43,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1400.55 | bwd_inner_microstep: 1400.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2057 [2024-06-11 04:45:44,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.87 | bwd_microstep: 911.39 | bwd_inner_microstep: 911.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 04:45:46,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.23 | bwd_microstep: 1377.00 | bwd_inner_microstep: 1376.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2281 [2024-06-11 04:45:47,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.64 | bwd_microstep: 911.68 | bwd_inner_microstep: 911.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851 [2024-06-11 04:45:49,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.18 | bwd_microstep: 1460.83 | bwd_inner_microstep: 1460.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-11 04:45:51,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.14 | bwd_microstep: 1182.50 | bwd_inner_microstep: 1182.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-11 04:45:57,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.08 | optimizer_step: 6.59 [2024-06-11 04:45:57,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
588.68 | bwd_microstep: 5181.41 | bwd_inner_microstep: 1814.85 | bwd_allreduce_microstep: 3366.52 | step_microstep: 37.71 [2024-06-11 04:45:57,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15379.29 | bwd: 44622.27 | bwd_inner: 41254.86 | bwd_allreduce: 3366.74 | step: 39.16 {'loss': 1.2099, 'learning_rate': 4.801920744576949e-07, 'epoch': 0.93} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3553 [2024-06-11 04:45:59,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.53 | bwd_microstep: 1580.44 | bwd_inner_microstep: 1580.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3955 [2024-06-11 04:46:01,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.74 | bwd_microstep: 1594.44 | bwd_inner_microstep: 1594.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 04:46:03,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.43 | bwd_microstep: 1245.15 | bwd_inner_microstep: 1245.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-11 04:46:05,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.66 | bwd_microstep: 1543.76 | bwd_inner_microstep: 1543.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3739 [2024-06-11 04:46:07,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.53 | bwd_microstep: 1634.06 | bwd_inner_microstep: 1634.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477 [2024-06-11 04:46:09,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
458.67 | bwd_microstep: 1185.17 | bwd_inner_microstep: 1185.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-11 04:46:11,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.00 | bwd_microstep: 1530.55 | bwd_inner_microstep: 1530.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3585 [2024-06-11 04:46:13,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1306.61 | bwd_inner_microstep: 1306.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1865 [2024-06-11 04:46:14,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.08 | bwd_microstep: 675.41 | bwd_inner_microstep: 675.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 04:46:15,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.05 | bwd_microstep: 1251.32 | bwd_inner_microstep: 1251.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-11 04:46:17,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.24 | bwd_microstep: 1413.53 | bwd_inner_microstep: 1413.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3956 [2024-06-11 04:46:20,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.97 | bwd_microstep: 1810.78 | bwd_inner_microstep: 1810.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3643 [2024-06-11 04:46:22,435] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 568.85 | bwd_microstep: 1539.58 | bwd_inner_microstep: 1539.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 2729 [2024-06-11 04:46:24,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.36 | bwd_microstep: 1200.25 | bwd_inner_microstep: 1200.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3819 [2024-06-11 04:46:26,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.47 | bwd_microstep: 1718.96 | bwd_inner_microstep: 1718.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2323 [2024-06-11 04:46:27,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.32 | bwd_microstep: 889.71 | bwd_inner_microstep: 889.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 915 [2024-06-11 04:46:28,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.38 | bwd_microstep: 374.75 | bwd_inner_microstep: 374.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520 [2024-06-11 04:46:30,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.65 | bwd_microstep: 1408.52 | bwd_inner_microstep: 1408.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-11 04:46:32,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.06 | bwd_microstep: 1491.90 | bwd_inner_microstep: 1491.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-11 
04:46:34,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.78 | bwd_microstep: 1410.08 | bwd_inner_microstep: 1410.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-11 04:46:36,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 1554.18 | bwd_inner_microstep: 1554.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-11 04:46:38,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.92 | bwd_microstep: 1554.45 | bwd_inner_microstep: 1554.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1997 [2024-06-11 04:46:39,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.07 | bwd_microstep: 737.53 | bwd_inner_microstep: 737.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3536 [2024-06-11 04:46:41,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1493.81 | bwd_inner_microstep: 1493.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-11 04:46:43,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.64 | bwd_microstep: 1351.53 | bwd_inner_microstep: 1351.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2193 [2024-06-11 04:46:44,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.71 | bwd_microstep: 862.15 | bwd_inner_microstep: 862.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, 
dynamic token length: 3588 [2024-06-11 04:46:46,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.60 | bwd_microstep: 1428.23 | bwd_inner_microstep: 1428.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2691 [2024-06-11 04:46:48,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.68 | bwd_microstep: 1115.32 | bwd_inner_microstep: 1115.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1395 [2024-06-11 04:46:48,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 203.43 | bwd_microstep: 527.16 | bwd_inner_microstep: 527.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3715 [2024-06-11 04:46:50,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.70 | bwd_microstep: 1397.20 | bwd_inner_microstep: 1397.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3565 [2024-06-11 04:46:52,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.94 | bwd_microstep: 1523.35 | bwd_inner_microstep: 1523.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729 [2024-06-11 04:46:57,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.82 | optimizer_gradients: 4.08 | optimizer_step: 6.57 [2024-06-11 04:46:57,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.07 | bwd_microstep: 4257.70 | bwd_inner_microstep: 1743.03 | bwd_allreduce_microstep: 2514.61 | step_microstep: 37.81 [2024-06-11 04:46:57,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15642.30 | bwd: 44607.60 | bwd_inner: 42092.09 | bwd_allreduce: 
2514.84 | step: 39.23 {'loss': 1.1777, 'learning_rate': 4.720510476615348e-07, 'epoch': 0.93} 5/1726 [28:04:26<2:12:31, 65.72s/it] 93%|█████████▎| 1606/1726 [28:05:32<2:11:23, 65.70s/it] 93%|█████████▎| 1607/1726 [28:06:34<2:07:47, 64.43s/it] 93%|█████████▎| 1608/1726 [28:07:33<2:03:50, 62.97s/it] 93%|█████████▎| 1609/1726 [28:08:33<2:01:14, 62.18s/it] 93%|█████████▎| 1610/1726 [28:09:34<1:59:16, 61.70s/it] dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-11 04:46:59,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.84 | bwd_microstep: 1301.74 | bwd_inner_microstep: 1301.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-11 04:47:01,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.46 | bwd_microstep: 1356.66 | bwd_inner_microstep: 1356.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3870 [2024-06-11 04:47:03,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.26 | bwd_microstep: 1466.34 | bwd_inner_microstep: 1466.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3394 [2024-06-11 04:47:05,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.96 | bwd_microstep: 1243.67 | bwd_inner_microstep: 1243.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763 [2024-06-11 04:47:07,291] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.70 | bwd_microstep: 
1534.45 | bwd_inner_microstep: 1534.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-11 04:47:09,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.15 | bwd_microstep: 1501.15 | bwd_inner_microstep: 1501.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884 [2024-06-11 04:47:10,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.95 | bwd_microstep: 680.54 | bwd_inner_microstep: 680.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3429 [2024-06-11 04:47:11,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.35 | bwd_microstep: 1155.52 | bwd_inner_microstep: 1155.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-11 04:47:13,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.40 | bwd_microstep: 1487.25 | bwd_inner_microstep: 1487.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-11 04:47:16,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.86 | bwd_microstep: 1558.92 | bwd_inner_microstep: 1558.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3513 [2024-06-11 04:47:18,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.95 | bwd_microstep: 1505.28 | bwd_inner_microstep: 1505.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3524 [2024-06-11 04:47:20,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 574.66 | bwd_microstep: 1538.98 | bwd_inner_microstep: 1538.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-11 04:47:22,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.81 | bwd_microstep: 1447.46 | bwd_inner_microstep: 1447.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3428 [2024-06-11 04:47:24,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.01 | bwd_microstep: 1297.63 | bwd_inner_microstep: 1297.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-11 04:47:25,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.09 | bwd_microstep: 1344.89 | bwd_inner_microstep: 1344.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 04:47:27,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.02 | bwd_microstep: 1287.23 | bwd_inner_microstep: 1287.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3713 [2024-06-11 04:47:29,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.93 | bwd_microstep: 1430.86 | bwd_inner_microstep: 1430.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926 [2024-06-11 04:47:30,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.64 | bwd_microstep: 696.99 | bwd_inner_microstep: 696.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2297 [2024-06-11 04:47:31,907] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.05 | bwd_microstep: 881.64 | bwd_inner_microstep: 881.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 04:47:33,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1376.33 | bwd_inner_microstep: 1376.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3733 [2024-06-11 04:47:35,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1336.94 | bwd_inner_microstep: 1336.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3874 [2024-06-11 04:47:37,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.68 | bwd_microstep: 1583.72 | bwd_inner_microstep: 1583.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-11 04:47:39,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1414.50 | bwd_inner_microstep: 1414.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 04:47:41,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1284.86 | bwd_inner_microstep: 1284.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3834 [2024-06-11 04:47:43,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1556.97 | bwd_inner_microstep: 1556.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3829 [2024-06-11 04:47:45,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.57 | bwd_microstep: 1558.00 | bwd_inner_microstep: 1557.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3613 [2024-06-11 04:47:47,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.20 | bwd_microstep: 1246.93 | bwd_inner_microstep: 1246.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-11 04:47:48,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.22 | bwd_microstep: 809.48 | bwd_inner_microstep: 809.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-11 04:47:50,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.83 | bwd_microstep: 1280.59 | bwd_inner_microstep: 1280.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 04:47:52,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.97 | bwd_microstep: 1348.11 | bwd_inner_microstep: 1348.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3939 [2024-06-11 04:47:54,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.10 | bwd_microstep: 1623.26 | bwd_inner_microstep: 1623.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3456 [2024-06-11 04:47:57,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.90 | optimizer_gradients: 4.02 | optimizer_step: 6.57 [2024-06-11 04:47:57,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
538.68 | bwd_microstep: 2308.14 | bwd_inner_microstep: 1668.10 | bwd_allreduce_microstep: 639.98 | step_microstep: 37.55 [2024-06-11 04:47:57,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15989.07 | bwd: 43445.07 | bwd_inner: 42804.19 | bwd_allreduce: 640.21 | step: 39.07 {'loss': 1.1689, 'learning_rate': 4.6397879820006874e-07, 'epoch': 0.93} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 04:47:59,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.67 | bwd_microstep: 1236.20 | bwd_inner_microstep: 1236.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-11 04:48:01,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.78 | bwd_microstep: 1284.46 | bwd_inner_microstep: 1284.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 04:48:02,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.72 | bwd_microstep: 1258.20 | bwd_inner_microstep: 1258.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3829 [2024-06-11 04:48:04,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.29 | bwd_microstep: 1555.05 | bwd_inner_microstep: 1555.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4201 [2024-06-11 04:48:07,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 654.65 | bwd_microstep: 1754.31 | bwd_inner_microstep: 1754.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3460 [2024-06-11 04:48:08,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.30 
| bwd_microstep: 1180.07 | bwd_inner_microstep: 1180.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3480 [2024-06-11 04:48:10,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.94 | bwd_microstep: 1342.95 | bwd_inner_microstep: 1342.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3744 [2024-06-11 04:48:12,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.49 | bwd_microstep: 1533.67 | bwd_inner_microstep: 1533.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2136 [2024-06-11 04:48:14,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.63 | bwd_microstep: 830.92 | bwd_inner_microstep: 830.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 04:48:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.10 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-11 04:48:17,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.83 | bwd_microstep: 1393.99 | bwd_inner_microstep: 1393.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493 [2024-06-11 04:48:19,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.16 | bwd_microstep: 1415.40 | bwd_inner_microstep: 1415.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-11 04:48:20,833] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 297.60 | bwd_microstep: 788.85 | bwd_inner_microstep: 788.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3494 [2024-06-11 04:48:22,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.28 | bwd_microstep: 1552.11 | bwd_inner_microstep: 1552.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3621 [2024-06-11 04:48:25,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.09 | bwd_microstep: 1598.68 | bwd_inner_microstep: 1598.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-11 04:48:26,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.87 | bwd_microstep: 1288.52 | bwd_inner_microstep: 1288.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 04:48:28,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1386.59 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630 [2024-06-11 04:48:31,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.56 | bwd_microstep: 1609.15 | bwd_inner_microstep: 1609.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3518 [2024-06-11 04:48:32,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.84 | bwd_microstep: 1320.40 | bwd_inner_microstep: 1320.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3830 [2024-06-11 04:48:35,305] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.45 | bwd_microstep: 1756.07 | bwd_inner_microstep: 1756.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3562 [2024-06-11 04:48:37,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.30 | bwd_microstep: 1332.43 | bwd_inner_microstep: 1332.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3676 [2024-06-11 04:48:39,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.60 | bwd_microstep: 1478.10 | bwd_inner_microstep: 1478.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2013 [2024-06-11 04:48:40,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.35 | bwd_microstep: 867.38 | bwd_inner_microstep: 867.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-11 04:48:42,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.52 | bwd_microstep: 1602.38 | bwd_inner_microstep: 1602.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533 [2024-06-11 04:48:44,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.02 | bwd_microstep: 1583.08 | bwd_inner_microstep: 1583.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3384 [2024-06-11 04:48:46,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1435.05 | bwd_inner_microstep: 1435.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token 
length: 3585 [2024-06-11 04:48:48,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.47 | bwd_microstep: 1530.92 | bwd_inner_microstep: 1530.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-11 04:48:50,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.49 | bwd_microstep: 1275.83 | bwd_inner_microstep: 1275.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 04:48:52,406] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.14 | bwd_microstep: 1289.88 | bwd_inner_microstep: 1289.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2191 [2024-06-11 04:48:53,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.17 | bwd_microstep: 795.32 | bwd_inner_microstep: 795.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3493 [2024-06-11 04:48:55,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.23 | bwd_microstep: 1189.67 | bwd_inner_microstep: 1189.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-11 04:48:59,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.81 | optimizer_gradients: 4.07 | optimizer_step: 6.59 [2024-06-11 04:48:59,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.47 | bwd_microstep: 3455.40 | bwd_inner_microstep: 1433.32 | bwd_allreduce_microstep: 2022.02 | step_microstep: 37.73 [2024-06-11 04:48:59,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16103.50 | bwd: 45204.73 | bwd_inner: 43181.80 | bwd_allreduce: 2022.25 | 
step: 39.29 {'loss': 1.2202, 'learning_rate': 4.559753545037171e-07, 'epoch': 0.93} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 04:49:01,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1336.95 | bwd_inner_microstep: 1336.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3933 [2024-06-11 04:49:03,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.14 | bwd_microstep: 1592.63 | bwd_inner_microstep: 1592.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2317 [2024-06-11 04:49:04,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.20 | bwd_microstep: 883.17 | bwd_inner_microstep: 883.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-11 04:49:06,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.74 | bwd_microstep: 1247.15 | bwd_inner_microstep: 1247.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3699 [2024-06-11 04:49:08,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1524.35 | bwd_inner_microstep: 1524.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 04:49:10,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.82 | bwd_microstep: 1388.33 | bwd_inner_microstep: 1388.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4125 [2024-06-11 04:49:12,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 616.95 | bwd_microstep: 1637.43 | bwd_inner_microstep: 1637.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 04:49:14,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.51 | bwd_microstep: 1381.22 | bwd_inner_microstep: 1381.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-11 04:49:16,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1254.28 | bwd_inner_microstep: 1254.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1899 [2024-06-11 04:49:17,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.80 | bwd_microstep: 715.09 | bwd_inner_microstep: 715.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1922 [2024-06-11 04:49:18,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.65 | bwd_microstep: 741.05 | bwd_inner_microstep: 741.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3676 [2024-06-11 04:49:20,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.52 | bwd_microstep: 1515.67 | bwd_inner_microstep: 1515.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3636 [2024-06-11 04:49:22,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1410.63 | bwd_inner_microstep: 1410.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3491 [2024-06-11 04:49:24,184] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.27 | bwd_microstep: 1482.44 | bwd_inner_microstep: 1482.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3669 [2024-06-11 04:49:26,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.26 | bwd_microstep: 1513.26 | bwd_inner_microstep: 1513.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4034 [2024-06-11 04:49:28,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.38 | bwd_microstep: 1709.60 | bwd_inner_microstep: 1709.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3851 [2024-06-11 04:49:30,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.46 | bwd_microstep: 1651.61 | bwd_inner_microstep: 1651.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-11 04:49:32,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1348.59 | bwd_inner_microstep: 1348.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2015 [2024-06-11 04:49:33,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.71 | bwd_microstep: 897.90 | bwd_inner_microstep: 897.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2034 [2024-06-11 04:49:35,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.21 | bwd_microstep: 811.32 | bwd_inner_microstep: 811.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 
3817 [2024-06-11 04:49:37,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.14 | bwd_microstep: 1609.94 | bwd_inner_microstep: 1609.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3758 [2024-06-11 04:49:39,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.12 | bwd_microstep: 1449.31 | bwd_inner_microstep: 1449.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3845 [2024-06-11 04:49:41,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.01 | bwd_microstep: 1494.90 | bwd_inner_microstep: 1494.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006 [2024-06-11 04:49:42,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.09 | bwd_microstep: 830.53 | bwd_inner_microstep: 830.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2278 [2024-06-11 04:49:43,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.88 | bwd_microstep: 882.07 | bwd_inner_microstep: 882.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-11 04:49:45,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1451.98 | bwd_inner_microstep: 1451.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3581 [2024-06-11 04:49:47,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.11 | bwd_microstep: 1398.39 | bwd_inner_microstep: 1398.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per 
sample: 3.0, dynamic token length: 2438 [2024-06-11 04:49:48,905] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.34 | bwd_microstep: 851.70 | bwd_inner_microstep: 851.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-11 04:49:50,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.96 | bwd_microstep: 1403.72 | bwd_inner_microstep: 1403.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2511 [2024-06-11 04:49:52,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.80 | bwd_microstep: 959.37 | bwd_inner_microstep: 959.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-11 04:49:54,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.69 | bwd_microstep: 1459.37 | bwd_inner_microstep: 1459.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568 [2024-06-11 04:50:00,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-11 04:50:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.45 | bwd_microstep: 5613.62 | bwd_inner_microstep: 1457.93 | bwd_allreduce_microstep: 4155.63 | step_microstep: 38.28 [2024-06-11 04:50:00,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15421.62 | bwd: 45447.58 | bwd_inner: 41291.03 | bwd_allreduce: 4155.87 | step: 39.71 {'loss': 1.1107, 'learning_rate': 4.480407447605673e-07, 'epoch': 0.93} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3533 [2024-06-11 04:50:02,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.94 | 
bwd_microstep: 1479.16 | bwd_inner_microstep: 1479.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 04:50:04,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.59 | bwd_microstep: 1276.14 | bwd_inner_microstep: 1276.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3479 [2024-06-11 04:50:05,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.49 | bwd_microstep: 1239.57 | bwd_inner_microstep: 1239.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2033 [2024-06-11 04:50:07,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.47 | bwd_microstep: 808.13 | bwd_inner_microstep: 808.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2231 [2024-06-11 04:50:08,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.66 | bwd_microstep: 862.33 | bwd_inner_microstep: 862.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 04:50:10,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1380.54 | bwd_inner_microstep: 1380.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-11 04:50:11,711] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.25 | bwd_microstep: 1148.55 | bwd_inner_microstep: 1148.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-11 04:50:13,754] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 548.52 | bwd_microstep: 1485.04 | bwd_inner_microstep: 1485.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-11 04:50:15,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.02 | bwd_microstep: 1252.90 | bwd_inner_microstep: 1252.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-11 04:50:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.61 | bwd_microstep: 790.27 | bwd_inner_microstep: 790.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3505 [2024-06-11 04:50:18,713] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.37 | bwd_microstep: 1549.67 | bwd_inner_microstep: 1549.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2079 [2024-06-11 04:50:19,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.93 | bwd_microstep: 915.37 | bwd_inner_microstep: 915.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-11 04:50:21,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.81 | bwd_microstep: 794.70 | bwd_inner_microstep: 794.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1980 [2024-06-11 04:50:22,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.86 | bwd_microstep: 895.66 | bwd_inner_microstep: 895.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 04:50:24,245] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.44 | bwd_microstep: 1393.68 | bwd_inner_microstep: 1393.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3509 [2024-06-11 04:50:26,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.80 | bwd_microstep: 1483.89 | bwd_inner_microstep: 1483.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3626 [2024-06-11 04:50:28,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.65 | bwd_microstep: 1772.66 | bwd_inner_microstep: 1772.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3510 [2024-06-11 04:50:30,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.01 | bwd_microstep: 1578.24 | bwd_inner_microstep: 1578.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2112 [2024-06-11 04:50:32,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.42 | bwd_microstep: 857.47 | bwd_inner_microstep: 857.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292 [2024-06-11 04:50:33,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.38 | bwd_microstep: 878.62 | bwd_inner_microstep: 878.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658 [2024-06-11 04:50:35,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.04 | bwd_microstep: 1612.99 | bwd_inner_microstep: 1612.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
3518 [2024-06-11 04:50:37,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.25 | bwd_microstep: 1297.87 | bwd_inner_microstep: 1297.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2513 [2024-06-11 04:50:38,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.06 | bwd_microstep: 960.98 | bwd_inner_microstep: 960.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3456 [2024-06-11 04:50:40,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.15 | bwd_microstep: 1160.03 | bwd_inner_microstep: 1160.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3613 [2024-06-11 04:50:42,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.36 | bwd_microstep: 1342.91 | bwd_inner_microstep: 1342.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3771 [2024-06-11 04:50:44,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.50 | bwd_microstep: 1578.38 | bwd_inner_microstep: 1578.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3380 [2024-06-11 04:50:46,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.46 | bwd_microstep: 1273.75 | bwd_inner_microstep: 1273.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-11 04:50:48,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.07 | bwd_microstep: 1454.46 | bwd_inner_microstep: 1454.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3587 [2024-06-11 04:50:49,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.76 | bwd_microstep: 1411.31 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 04:50:51,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.05 | bwd_microstep: 1286.54 | bwd_inner_microstep: 1286.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3585 [2024-06-11 04:50:53,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.47 | bwd_microstep: 1402.44 | bwd_inner_microstep: 1402.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581 [2024-06-11 04:51:02,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.09 | optimizer_step: 6.59 [2024-06-11 04:51:02,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.06 | bwd_microstep: 8077.82 | bwd_inner_microstep: 1459.61 | bwd_allreduce_microstep: 6618.16 | step_microstep: 39.05 [2024-06-11 04:51:02,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14948.97 | bwd: 46702.12 | bwd_inner: 40083.06 | bwd_allreduce: 6618.39 | step: 40.58 {'loss': 1.1265, 'learning_rate': 4.4017499691627384e-07, 'epoch': 0.94} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3474 [2024-06-11 04:51:04,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.83 | bwd_microstep: 1569.50 | bwd_inner_microstep: 1569.42 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-11 04:51:06,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
551.54 | bwd_microstep: 1473.12 | bwd_inner_microstep: 1473.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3487 [2024-06-11 04:51:08,266] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.65 | bwd_microstep: 1242.08 | bwd_inner_microstep: 1242.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-11 04:51:10,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.65 | bwd_microstep: 1272.84 | bwd_inner_microstep: 1272.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2058 [2024-06-11 04:51:11,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.92 | bwd_microstep: 816.16 | bwd_inner_microstep: 816.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 04:51:12,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.01 | bwd_microstep: 1283.14 | bwd_inner_microstep: 1283.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796 [2024-06-11 04:51:15,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.50 | bwd_microstep: 1546.33 | bwd_inner_microstep: 1546.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3407 [2024-06-11 04:51:16,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.92 | bwd_microstep: 1151.24 | bwd_inner_microstep: 1151.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3510 [2024-06-11 04:51:18,494] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1320.03 | bwd_inner_microstep: 1320.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-11 04:51:19,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.73 | bwd_microstep: 792.32 | bwd_inner_microstep: 792.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 04:51:21,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.35 | bwd_microstep: 1387.99 | bwd_inner_microstep: 1387.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3447 [2024-06-11 04:51:23,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.06 | bwd_microstep: 1255.04 | bwd_inner_microstep: 1255.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 04:51:25,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.73 | bwd_microstep: 1393.67 | bwd_inner_microstep: 1393.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3674 [2024-06-11 04:51:27,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.64 | bwd_microstep: 1387.18 | bwd_inner_microstep: 1387.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 04:51:28,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1343.72 | bwd_inner_microstep: 1343.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3526 [2024-06-11 
04:51:31,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.51 | bwd_microstep: 1546.79 | bwd_inner_microstep: 1546.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3501 [2024-06-11 04:51:33,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.88 | bwd_microstep: 1510.43 | bwd_inner_microstep: 1510.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 04:51:35,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.22 | bwd_microstep: 1340.39 | bwd_inner_microstep: 1340.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2105 [2024-06-11 04:51:36,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.49 | bwd_microstep: 918.38 | bwd_inner_microstep: 918.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653 [2024-06-11 04:51:38,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.45 | bwd_microstep: 1620.46 | bwd_inner_microstep: 1620.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3647 [2024-06-11 04:51:40,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.63 | bwd_microstep: 1345.90 | bwd_inner_microstep: 1345.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-11 04:51:42,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1394.10 | bwd_inner_microstep: 1394.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, 
dynamic token length: 3608 [2024-06-11 04:51:44,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.14 | bwd_microstep: 1607.28 | bwd_inner_microstep: 1607.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2001 [2024-06-11 04:51:45,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.48 | bwd_microstep: 711.98 | bwd_inner_microstep: 711.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-11 04:51:47,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1510.54 | bwd_inner_microstep: 1510.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3815 [2024-06-11 04:51:49,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1654.29 | bwd_inner_microstep: 1654.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 04:51:51,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.85 | bwd_microstep: 1550.88 | bwd_inner_microstep: 1550.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-11 04:51:54,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.21 | bwd_microstep: 1551.25 | bwd_inner_microstep: 1551.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-11 04:51:56,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.61 | bwd_microstep: 1469.83 | bwd_inner_microstep: 1469.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 2042 [2024-06-11 04:51:57,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.04 | bwd_microstep: 907.67 | bwd_inner_microstep: 907.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-11 04:51:59,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.45 | bwd_microstep: 1510.09 | bwd_inner_microstep: 1510.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3476 [2024-06-11 04:52:04,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-11 04:52:04,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.16 | bwd_microstep: 4463.15 | bwd_inner_microstep: 1624.78 | bwd_allreduce_microstep: 2838.31 | step_microstep: 38.88 [2024-06-11 04:52:04,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16005.28 | bwd: 45847.81 | bwd_inner: 43008.52 | bwd_allreduce: 2838.58 | step: 40.48 |█████████▎| 1610/1726 [28:09:34<1:59:16, 61.70s/it] 93%|█████████▎| 1611/1726 [28:10:34<1:57:08, 61.12s/it] 93%|█████████▎| 1611/1726 [28:10:34<1:57:08, 61.12s/it] 93%|█████████▎| 1612/1726 [28:11:35<1:56:25, 61.28s/it] 93%|█████████▎| 1612/1726 [28:11:35<1:56:25, 61.28s/it] 93%|█████████▎| 1613/1726 [28:12:37<1:55:21, 61.25s/it] 93%|█████████▎| 1613/1726 [28:12:37<1:55:21, 61.25s/it] 94%|█████████▎| 1614/1726 [28:13:39<1:54:44, 61.47s/it] 94%|█████████▎| 1614/1726 [28:13:39<1:54:44, 61.47s/it] 94%|█████████▎| 1615/1726 [28:14:41<1:54:07, 61.69s/it] {'loss': 1.2352, 'learning_rate': 4.3237813867396117e-07, 'epoch': 0.94} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-11 04:52:06,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.72 
| bwd_microstep: 1468.32 | bwd_inner_microstep: 1468.18 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.23 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-11 04:52:08,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.10 | bwd_microstep: 1389.03 | bwd_inner_microstep: 1389.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3454 [2024-06-11 04:52:10,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.40 | bwd_microstep: 1216.35 | bwd_inner_microstep: 1216.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-11 04:52:12,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.30 | bwd_microstep: 1558.39 | bwd_inner_microstep: 1558.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-11 04:52:14,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1550.63 | bwd_inner_microstep: 1550.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3418 [2024-06-11 04:52:16,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.11 | bwd_microstep: 1157.62 | bwd_inner_microstep: 1157.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 04:52:17,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.89 | bwd_microstep: 1387.70 | bwd_inner_microstep: 1387.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 04:52:19,722] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1248.35 | bwd_inner_microstep: 1248.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3407 [2024-06-11 04:52:21,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.47 | bwd_microstep: 1281.82 | bwd_inner_microstep: 1281.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3406 [2024-06-11 04:52:23,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.66 | bwd_microstep: 1295.83 | bwd_inner_microstep: 1295.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 04:52:25,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1487.83 | bwd_inner_microstep: 1487.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1927 [2024-06-11 04:52:26,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.79 | bwd_microstep: 882.53 | bwd_inner_microstep: 882.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-11 04:52:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.52 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3421 [2024-06-11 04:52:30,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.62 | bwd_microstep: 1445.60 | bwd_inner_microstep: 1445.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 04:52:32,635] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.78 | bwd_microstep: 1481.05 | bwd_inner_microstep: 1481.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2098 [2024-06-11 04:52:33,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.70 | bwd_microstep: 920.13 | bwd_inner_microstep: 920.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424 [2024-06-11 04:52:35,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.81 | bwd_microstep: 1161.76 | bwd_inner_microstep: 1161.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3634 [2024-06-11 04:52:37,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.55 | bwd_microstep: 1379.72 | bwd_inner_microstep: 1379.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 04:52:39,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.32 | bwd_microstep: 1280.79 | bwd_inner_microstep: 1280.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-11 04:52:41,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.01 | bwd_microstep: 1613.00 | bwd_inner_microstep: 1612.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1968 [2024-06-11 04:52:42,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.54 | bwd_microstep: 733.72 | bwd_inner_microstep: 733.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3463 [2024-06-11 04:52:44,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.26 | bwd_microstep: 1192.79 | bwd_inner_microstep: 1192.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3545 [2024-06-11 04:52:45,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.90 | bwd_microstep: 1328.55 | bwd_inner_microstep: 1328.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-11 04:52:48,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.09 | bwd_microstep: 1509.40 | bwd_inner_microstep: 1509.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3565 [2024-06-11 04:52:49,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.07 | bwd_microstep: 1205.85 | bwd_inner_microstep: 1205.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 04:52:51,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1557.00 | bwd_inner_microstep: 1556.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3740 [2024-06-11 04:52:53,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.85 | bwd_microstep: 1444.97 | bwd_inner_microstep: 1444.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3727 [2024-06-11 04:52:55,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.80 | bwd_microstep: 1536.26 | bwd_inner_microstep: 1536.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 2056 [2024-06-11 04:52:57,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.80 | bwd_microstep: 914.77 | bwd_inner_microstep: 914.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2284 [2024-06-11 04:52:58,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.82 | bwd_microstep: 854.09 | bwd_inner_microstep: 854.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051 [2024-06-11 04:52:59,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.52 | bwd_microstep: 912.20 | bwd_inner_microstep: 912.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2979 [2024-06-11 04:53:20,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.32 | optimizer_step: 6.59 [2024-06-11 04:53:20,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.84 | bwd_microstep: 19920.01 | bwd_inner_microstep: 1379.47 | bwd_allreduce_microstep: 18540.46 | step_microstep: 40.44 [2024-06-11 04:53:20,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15424.75 | bwd: 59801.99 | bwd_inner: 41260.46 | bwd_allreduce: 18540.76 | step: 42.12 {'loss': 1.1617, 'learning_rate': 4.2465019749411864e-07, 'epoch': 0.94} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 04:53:22,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.37 | bwd_microstep: 1461.68 | bwd_inner_microstep: 1461.62 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-11 04:53:23,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 299.47 | bwd_microstep: 793.01 | bwd_inner_microstep: 792.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2383 [2024-06-11 04:53:24,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.99 | bwd_microstep: 996.45 | bwd_inner_microstep: 996.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-11 04:53:26,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.04 | bwd_microstep: 1338.36 | bwd_inner_microstep: 1338.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 04:53:28,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.44 | bwd_microstep: 1274.24 | bwd_inner_microstep: 1274.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3793 [2024-06-11 04:53:30,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.91 | bwd_microstep: 1541.20 | bwd_inner_microstep: 1541.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3863 [2024-06-11 04:53:32,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.09 | bwd_microstep: 1651.94 | bwd_inner_microstep: 1651.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-11 04:53:34,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.88 | bwd_microstep: 1478.95 | bwd_inner_microstep: 1478.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4081 [2024-06-11 04:53:36,767] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.98 | bwd_microstep: 1529.91 | bwd_inner_microstep: 1529.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708 [2024-06-11 04:54:25,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.17 | bwd_microstep: 1607.47 | bwd_inner_microstep: 1607.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3693 [2024-06-11 04:54:27,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1511.24 | bwd_inner_microstep: 1511.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-11 04:54:28,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.03 | bwd_microstep: 795.94 | bwd_inner_microstep: 795.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 04:54:30,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.42 | bwd_microstep: 1378.99 | bwd_inner_microstep: 1378.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3422 [2024-06-11 04:54:32,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.10 | bwd_microstep: 1270.79 | bwd_inner_microstep: 1270.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 04:54:34,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.79 | bwd_microstep: 1364.66 | bwd_inner_microstep: 1364.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3527 [2024-06-11 04:54:36,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.05 | bwd_microstep: 1394.89 | bwd_inner_microstep: 1394.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3444 [2024-06-11 04:54:38,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.67 | bwd_microstep: 1443.71 | bwd_inner_microstep: 1443.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3530 [2024-06-11 04:54:40,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.17 | bwd_microstep: 1648.95 | bwd_inner_microstep: 1648.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3729 [2024-06-11 04:54:42,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.52 | bwd_microstep: 1330.98 | bwd_inner_microstep: 1330.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2050 [2024-06-11 04:54:43,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 306.87 | bwd_microstep: 812.74 | bwd_inner_microstep: 812.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3823 [2024-06-11 04:54:45,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.02 | bwd_microstep: 1449.10 | bwd_inner_microstep: 1449.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3534 [2024-06-11 04:54:47,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1321.63 | bwd_inner_microstep: 1321.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3549 [2024-06-11 04:54:49,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.11 | bwd_microstep: 1395.05 | bwd_inner_microstep: 1395.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-11 04:54:50,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.99 | bwd_microstep: 1184.34 | bwd_inner_microstep: 1184.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2047 [2024-06-11 04:54:51,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.74 | bwd_microstep: 715.41 | bwd_inner_microstep: 715.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2277 [2024-06-11 04:54:52,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.71 | bwd_microstep: 876.09 | bwd_inner_microstep: 876.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3559 [2024-06-11 04:54:54,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.75 | bwd_microstep: 1263.60 | bwd_inner_microstep: 1263.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 04:54:56,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.85 | bwd_microstep: 1554.96 | bwd_inner_microstep: 1554.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820 [2024-06-11 04:54:59,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.22 | bwd_microstep: 1754.46 | bwd_inner_microstep: 1754.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3765 [2024-06-11 04:55:01,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.97 | bwd_microstep: 1636.39 | bwd_inner_microstep: 1636.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3665 [2024-06-11 04:55:03,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.18 | bwd_microstep: 1522.38 | bwd_inner_microstep: 1522.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3587 [2024-06-11 04:55:05,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.05 | optimizer_gradients: 4.14 | optimizer_step: 6.68 [2024-06-11 04:55:05,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.88 | bwd_microstep: 1605.82 | bwd_inner_microstep: 1597.74 | bwd_allreduce_microstep: 8.03 | step_microstep: 40.14 [2024-06-11 04:55:05,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16059.77 | bwd: 42905.36 | bwd_inner: 42896.37 | bwd_allreduce: 8.29 | step: 41.76 {'loss': 1.1581, 'learning_rate': 4.1699120059452093e-07, 'epoch': 0.94} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3406 [2024-06-11 04:55:07,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.55 | bwd_microstep: 1439.22 | bwd_inner_microstep: 1439.12 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3993 [2024-06-11 04:55:09,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.26 | bwd_microstep: 1507.40 | bwd_inner_microstep: 1507.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 04:55:11,767] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 504.03 | bwd_microstep: 1344.74 | bwd_inner_microstep: 1344.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-11 04:55:13,113] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.19 | bwd_microstep: 974.16 | bwd_inner_microstep: 974.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 04:55:14,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1252.94 | bwd_inner_microstep: 1252.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 04:55:16,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.33 | bwd_microstep: 1383.78 | bwd_inner_microstep: 1383.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-11 04:55:17,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.17 | bwd_microstep: 788.17 | bwd_inner_microstep: 788.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 04:55:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.98 | bwd_microstep: 1280.72 | bwd_inner_microstep: 1280.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 04:55:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.08 | bwd_microstep: 1279.54 | bwd_inner_microstep: 1279.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 04:55:23,194] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.28 | bwd_microstep: 1287.03 | bwd_inner_microstep: 1287.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077 [2024-06-11 04:55:24,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.84 | bwd_microstep: 820.71 | bwd_inner_microstep: 820.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-11 04:55:26,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1383.83 | bwd_inner_microstep: 1383.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3713 [2024-06-11 04:55:28,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.94 | bwd_microstep: 1633.42 | bwd_inner_microstep: 1633.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2141 [2024-06-11 04:55:29,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.71 | bwd_microstep: 962.88 | bwd_inner_microstep: 962.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2116 [2024-06-11 04:55:31,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.90 | bwd_microstep: 957.48 | bwd_inner_microstep: 957.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 04:55:33,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1342.45 | bwd_inner_microstep: 1342.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072 
[2024-06-11 04:55:34,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.63 | bwd_microstep: 915.76 | bwd_inner_microstep: 915.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2026 [2024-06-11 04:55:35,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.90 | bwd_microstep: 902.35 | bwd_inner_microstep: 902.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3395 [2024-06-11 04:55:37,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1342.46 | bwd_inner_microstep: 1342.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3422 [2024-06-11 04:55:39,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.35 | bwd_microstep: 1294.13 | bwd_inner_microstep: 1294.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3666 [2024-06-11 04:55:41,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.72 | bwd_microstep: 1721.46 | bwd_inner_microstep: 1721.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-11 04:55:43,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.50 | bwd_microstep: 1533.14 | bwd_inner_microstep: 1532.76 | bwd_allreduce_microstep: 0.20 | step_microstep: 0.31 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3542 [2024-06-11 04:55:45,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.35 | bwd_microstep: 1401.63 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 2178 [2024-06-11 04:55:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.36 | bwd_microstep: 1055.18 | bwd_inner_microstep: 1055.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3738 [2024-06-11 04:55:49,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.58 | bwd_microstep: 1539.88 | bwd_inner_microstep: 1539.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3593 [2024-06-11 04:55:51,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.88 | bwd_microstep: 1671.20 | bwd_inner_microstep: 1671.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806 [2024-06-11 04:55:53,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.17 | bwd_microstep: 1453.02 | bwd_inner_microstep: 1452.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-11 04:55:55,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.32 | bwd_microstep: 1503.11 | bwd_inner_microstep: 1503.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 04:55:57,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1413.82 | bwd_inner_microstep: 1413.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 04:55:59,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.49 | bwd_microstep: 1359.89 | bwd_inner_microstep: 1359.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599 [2024-06-11 04:56:01,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.02 | bwd_microstep: 1412.77 | bwd_inner_microstep: 1412.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-11 04:56:28,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.24 | optimizer_step: 6.60 [2024-06-11 04:56:28,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 26974.98 | bwd_inner_microstep: 1529.86 | bwd_allreduce_microstep: 25445.04 | step_microstep: 39.44 [2024-06-11 04:56:28,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15522.76 | bwd: 67133.32 | bwd_inner: 41686.88 | bwd_allreduce: 25445.61 | step: 41.74 {'loss': 1.1922, 'learning_rate': 4.094011749501103e-07, 'epoch': 0.94} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1942 [2024-06-11 04:56:29,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.46 | bwd_microstep: 781.18 | bwd_inner_microstep: 781.04 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 720 [2024-06-11 04:56:30,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 116.23 | bwd_microstep: 292.62 | bwd_inner_microstep: 292.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3483 [2024-06-11 04:56:32,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.80 | bwd_microstep: 1473.18 | bwd_inner_microstep: 1473.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-11 04:56:34,527] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 572.11 | bwd_microstep: 1541.96 | bwd_inner_microstep: 1541.80 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.27 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 04:56:36,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.55 | bwd_microstep: 1280.20 | bwd_inner_microstep: 1280.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-11 04:56:38,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.09 | bwd_microstep: 1244.80 | bwd_inner_microstep: 1244.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2211 [2024-06-11 04:56:39,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.74 | bwd_microstep: 952.45 | bwd_inner_microstep: 952.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-11 04:56:41,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.00 | bwd_microstep: 1406.29 | bwd_inner_microstep: 1406.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-11 04:57:11,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.77 | bwd_microstep: 1510.32 | bwd_inner_microstep: 1510.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3694 [2024-06-11 04:57:13,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.67 | bwd_microstep: 1674.62 | bwd_inner_microstep: 1674.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-11 04:57:15,697] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.51 | bwd_microstep: 1252.13 | bwd_inner_microstep: 1252.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-11 04:57:17,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.34 | bwd_microstep: 1343.87 | bwd_inner_microstep: 1343.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-11 04:57:19,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.21 | bwd_microstep: 1376.39 | bwd_inner_microstep: 1376.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 04:57:21,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.92 | bwd_microstep: 1349.94 | bwd_inner_microstep: 1349.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3457 [2024-06-11 04:57:23,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.50 | bwd_microstep: 1299.86 | bwd_inner_microstep: 1299.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2032 [2024-06-11 04:57:24,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.11 | bwd_microstep: 895.65 | bwd_inner_microstep: 895.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3661 [2024-06-11 04:57:26,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.19 | bwd_microstep: 1604.36 | bwd_inner_microstep: 1604.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3634 [2024-06-11 04:57:28,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.13 | bwd_microstep: 1503.22 | bwd_inner_microstep: 1503.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995 [2024-06-11 04:57:29,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.09 | bwd_microstep: 891.62 | bwd_inner_microstep: 891.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2158 [2024-06-11 04:57:31,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.57 | bwd_microstep: 851.24 | bwd_inner_microstep: 851.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3627 [2024-06-11 04:57:33,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.73 | bwd_microstep: 1438.26 | bwd_inner_microstep: 1438.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3543 [2024-06-11 04:57:34,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.42 | bwd_microstep: 1321.90 | bwd_inner_microstep: 1321.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1939 [2024-06-11 04:57:35,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.41 | bwd_microstep: 697.91 | bwd_inner_microstep: 697.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2273 [2024-06-11 04:57:37,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.04 | bwd_microstep: 905.39 | bwd_inner_microstep: 905.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3821 [2024-06-11 04:57:39,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.83 | bwd_microstep: 1649.93 | bwd_inner_microstep: 1649.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-11 04:57:40,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.49 | bwd_microstep: 800.31 | bwd_inner_microstep: 800.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-11 04:57:42,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.37 | bwd_microstep: 1275.98 | bwd_inner_microstep: 1275.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3774 [2024-06-11 04:57:44,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.96 | bwd_microstep: 1345.09 | bwd_inner_microstep: 1345.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3639 [2024-06-11 04:57:46,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.94 | bwd_microstep: 1568.51 | bwd_inner_microstep: 1568.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1857 [2024-06-11 04:57:47,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.14 | bwd_microstep: 770.96 | bwd_inner_microstep: 770.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804 [2024-06-11 04:57:49,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.92 | bwd_microstep: 1751.01 | bwd_inner_microstep: 1750.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3814 [2024-06-11 04:58:17,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-11 04:58:17,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.59 | bwd_microstep: 26921.40 | bwd_inner_microstep: 1917.95 | bwd_allreduce_microstep: 25003.39 | step_microstep: 39.38 [2024-06-11 04:58:17,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14876.47 | bwd: 64972.62 | bwd_inner: 39968.06 | bwd_allreduce: 25003.80 | step: 41.44 {'loss': 1.1368, 'learning_rate': 4.0188014729292125e-07, 'epoch': 0.94} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 04:58:19,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.86 | bwd_microstep: 1379.17 | bwd_inner_microstep: 1379.04 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-11 04:58:21,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.17 | bwd_microstep: 1476.67 | bwd_inner_microstep: 1476.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3822 [2024-06-11 04:58:23,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.11 | bwd_microstep: 1505.30 | bwd_inner_microstep: 1505.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1934 [2024-06-11 04:58:24,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.21 | bwd_microstep: 785.55 | bwd_inner_microstep: 785.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4209 [2024-06-11 04:58:26,847] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 651.00 | bwd_microstep: 1746.50 | bwd_inner_microstep: 1746.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3476 [2024-06-11 04:58:28,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.46 | bwd_microstep: 1305.42 | bwd_inner_microstep: 1305.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-11 04:58:30,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.54 | bwd_microstep: 1185.61 | bwd_inner_microstep: 1185.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1956 [2024-06-11 04:58:31,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.07 | bwd_microstep: 789.54 | bwd_inner_microstep: 789.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 04:58:33,182] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.96 | bwd_microstep: 1281.31 | bwd_inner_microstep: 1281.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-11 04:58:35,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.37 | bwd_microstep: 1392.95 | bwd_inner_microstep: 1392.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3436 [2024-06-11 04:58:37,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.88 | bwd_microstep: 1375.36 | bwd_inner_microstep: 1375.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3515 [2024-06-11 
04:58:38,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.17 | bwd_microstep: 1220.32 | bwd_inner_microstep: 1220.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3641 [2024-06-11 04:58:40,956] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.44 | bwd_microstep: 1644.47 | bwd_inner_microstep: 1644.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2456 [2024-06-11 04:58:42,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.14 | bwd_microstep: 948.33 | bwd_inner_microstep: 948.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3466 [2024-06-11 04:58:44,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.49 | bwd_microstep: 1436.13 | bwd_inner_microstep: 1436.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3486 [2024-06-11 04:58:46,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.82 | bwd_microstep: 1440.69 | bwd_inner_microstep: 1440.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3640 [2024-06-11 04:58:48,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.83 | bwd_microstep: 1570.64 | bwd_inner_microstep: 1570.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3425 [2024-06-11 04:58:50,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.03 | bwd_microstep: 1407.38 | bwd_inner_microstep: 1407.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3493 [2024-06-11 04:58:52,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.83 | bwd_microstep: 1381.08 | bwd_inner_microstep: 1381.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3522 [2024-06-11 04:58:54,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.25 | bwd_microstep: 1421.70 | bwd_inner_microstep: 1421.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025 [2024-06-11 04:58:55,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 903.64 | bwd_inner_microstep: 903.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-11 04:58:57,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.83 | bwd_microstep: 1656.75 | bwd_inner_microstep: 1656.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3667 [2024-06-11 04:58:59,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.78 | bwd_microstep: 1585.63 | bwd_inner_microstep: 1585.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 04:59:01,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.88 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-11 04:59:03,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.37 | bwd_microstep: 1256.03 | bwd_inner_microstep: 1256.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 38, images per sample: 9.5, dynamic token length: 3673 [2024-06-11 04:59:05,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.77 | bwd_microstep: 1591.33 | bwd_inner_microstep: 1591.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3570 [2024-06-11 04:59:07,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.91 | bwd_microstep: 1430.35 | bwd_inner_microstep: 1430.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2091 [2024-06-11 04:59:08,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.63 | bwd_microstep: 883.41 | bwd_inner_microstep: 883.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-11 04:59:10,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.50 | bwd_microstep: 1390.37 | bwd_inner_microstep: 1390.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616 [2024-06-11 04:59:12,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.85 | bwd_microstep: 1313.12 | bwd_inner_microstep: 1313.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3432 [2024-06-11 04:59:14,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.37 | bwd_microstep: 1153.97 | bwd_inner_microstep: 1153.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3804 [2024-06-11 04:59:17,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.14 | optimizer_step: 6.62 [2024-06-11 04:59:17,947] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.12 | bwd_microstep: 3184.46 | bwd_inner_microstep: 1634.08 | bwd_allreduce_microstep: 1550.32 | step_microstep: 38.72 [2024-06-11 04:59:17,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15965.17 | bwd: 44328.21 | bwd_inner: 42776.88 | bwd_allreduce: 1550.60 | step: 40.30 94%|█████████▎| 1615/1726 [28:14:41<1:54:07, 61.69s/it] 94%|█████████▎| 1616/1726 [28:15:56<2:00:43, 65.85s/it] 94%|█████████▎| 1617/1726 [28:17:42<2:21:22, 77.82s/it] 94%|█████████▎| 1618/1726 [28:19:05<2:22:53, 79.38s/it] 94%|█████████▎| 1619/1726 [28:20:54<2:37:06, 88.10s/it] 94%|█████████▍| 1620/1726 [28:21:54<2:21:05, 7{'loss': 1.1664, 'learning_rate': 3.9442814411197125e-07, 'epoch': 0.94} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-11 04:59:19,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.67 | bwd_microstep: 1470.78 | bwd_inner_microstep: 1470.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 04:59:21,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.17 | bwd_microstep: 1378.34 | bwd_inner_microstep: 1378.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2039 [2024-06-11 04:59:22,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.35 | bwd_microstep: 718.63 | bwd_inner_microstep: 718.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3780 [2024-06-11 04:59:25,030] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 573.66 | bwd_microstep: 1545.59 | bwd_inner_microstep: 1545.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3482 [2024-06-11 04:59:26,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.00 | bwd_microstep: 1245.34 | bwd_inner_microstep: 1245.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3401 [2024-06-11 04:59:28,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.26 | bwd_microstep: 1150.45 | bwd_inner_microstep: 1150.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-11 04:59:30,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.71 | bwd_microstep: 1187.75 | bwd_inner_microstep: 1187.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2992 [2024-06-11 04:59:31,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.69 | bwd_microstep: 1101.62 | bwd_inner_microstep: 1101.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3457 [2024-06-11 04:59:33,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.11 | bwd_microstep: 1278.54 | bwd_inner_microstep: 1278.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-11 04:59:34,408] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.17 | bwd_microstep: 792.59 | bwd_inner_microstep: 792.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1912 [2024-06-11 04:59:35,366] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.22 | bwd_microstep: 687.14 | bwd_inner_microstep: 687.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-11 04:59:37,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.52 | bwd_microstep: 1379.64 | bwd_inner_microstep: 1379.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2007 [2024-06-11 04:59:38,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.30 | bwd_microstep: 899.10 | bwd_inner_microstep: 899.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3664 [2024-06-11 04:59:40,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.33 | bwd_microstep: 1717.63 | bwd_inner_microstep: 1717.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 04:59:42,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 04:59:44,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.95 | bwd_microstep: 1393.79 | bwd_inner_microstep: 1393.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3507 [2024-06-11 04:59:46,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.40 | bwd_microstep: 1288.76 | bwd_inner_microstep: 1288.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3430 [2024-06-11 04:59:48,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.08 | bwd_microstep: 1250.52 | bwd_inner_microstep: 1250.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3531 [2024-06-11 04:59:49,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.91 | bwd_microstep: 1328.11 | bwd_inner_microstep: 1328.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2122 [2024-06-11 04:59:51,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.27 | bwd_microstep: 828.99 | bwd_inner_microstep: 828.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3507 [2024-06-11 04:59:52,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.45 | bwd_microstep: 1318.19 | bwd_inner_microstep: 1318.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3687 [2024-06-11 04:59:55,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.53 | bwd_microstep: 1530.46 | bwd_inner_microstep: 1530.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3449 [2024-06-11 04:59:56,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1386.36 | bwd_inner_microstep: 1386.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3609 [2024-06-11 04:59:58,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.38 | bwd_microstep: 1430.78 | bwd_inner_microstep: 1430.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, 
images per sample: 9.5, dynamic token length: 3565 [2024-06-11 05:00:01,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.63 | bwd_microstep: 1562.36 | bwd_inner_microstep: 1562.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2266 [2024-06-11 05:00:02,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.94 | bwd_microstep: 970.97 | bwd_inner_microstep: 970.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-11 05:00:04,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.30 | bwd_microstep: 1650.73 | bwd_inner_microstep: 1650.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3562 [2024-06-11 05:00:06,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1398.05 | bwd_inner_microstep: 1398.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 05:00:08,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.87 | bwd_microstep: 1384.17 | bwd_inner_microstep: 1384.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3562 [2024-06-11 05:00:10,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.05 | bwd_microstep: 1523.88 | bwd_inner_microstep: 1523.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3414 [2024-06-11 05:00:12,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.08 | bwd_microstep: 1310.11 | bwd_inner_microstep: 1310.08 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-11 05:00:20,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.52 | optimizer_step: 6.61 [2024-06-11 05:00:20,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.64 | bwd_microstep: 7916.82 | bwd_inner_microstep: 1407.66 | bwd_allreduce_microstep: 6509.06 | step_microstep: 41.02 [2024-06-11 05:00:20,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15282.85 | bwd: 47304.55 | bwd_inner: 40794.47 | bwd_allreduce: 6509.32 | step: 42.61 {'loss': 1.0814, 'learning_rate': 3.8704519165317923e-07, 'epoch': 0.94} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3478 [2024-06-11 05:00:23,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1569.46 | bwd_inner_microstep: 1569.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3402 [2024-06-11 05:00:24,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.66 | bwd_microstep: 1398.59 | bwd_inner_microstep: 1398.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3477 [2024-06-11 05:00:26,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.48 | bwd_microstep: 1216.49 | bwd_inner_microstep: 1216.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 05:00:28,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.31 | bwd_microstep: 1286.21 | bwd_inner_microstep: 1286.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584 [2024-06-11 05:00:30,262] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1299.69 | bwd_inner_microstep: 1299.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-11 05:00:31,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.42 | bwd_microstep: 795.24 | bwd_inner_microstep: 795.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 05:00:33,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.54 | bwd_microstep: 1289.81 | bwd_inner_microstep: 1289.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1956 [2024-06-11 05:00:34,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.73 | bwd_microstep: 855.50 | bwd_inner_microstep: 855.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2901 [2024-06-11 05:00:35,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.60 | bwd_microstep: 1091.92 | bwd_inner_microstep: 1091.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 05:00:37,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.85 | bwd_microstep: 1250.26 | bwd_inner_microstep: 1250.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3507 [2024-06-11 05:00:39,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.65 | bwd_microstep: 1382.18 | bwd_inner_microstep: 1382.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 
[2024-06-11 05:00:41,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.44 | bwd_microstep: 1490.57 | bwd_inner_microstep: 1490.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2329 [2024-06-11 05:00:42,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 367.93 | bwd_microstep: 989.72 | bwd_inner_microstep: 989.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2945 [2024-06-11 05:00:44,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 414.78 | bwd_microstep: 1095.96 | bwd_inner_microstep: 1095.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 05:00:46,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.07 | bwd_microstep: 1404.26 | bwd_inner_microstep: 1404.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3819 [2024-06-11 05:00:48,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.75 | bwd_microstep: 1691.09 | bwd_inner_microstep: 1691.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1949 [2024-06-11 05:00:49,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.58 | bwd_microstep: 825.84 | bwd_inner_microstep: 825.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3615 [2024-06-11 05:00:51,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1408.74 | bwd_inner_microstep: 1408.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3538 [2024-06-11 05:00:53,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.31 | bwd_microstep: 1398.63 | bwd_inner_microstep: 1398.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 05:00:55,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1283.50 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-11 05:00:57,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1391.30 | bwd_inner_microstep: 1391.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 05:00:59,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1382.97 | bwd_inner_microstep: 1382.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-11 05:01:01,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1497.52 | bwd_inner_microstep: 1497.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-11 05:01:03,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.43 | bwd_microstep: 1286.83 | bwd_inner_microstep: 1286.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-11 05:01:04,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.42 | bwd_microstep: 1250.21 | bwd_inner_microstep: 1250.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3609 [2024-06-11 05:01:06,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1343.00 | bwd_inner_microstep: 1342.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2027 [2024-06-11 05:01:07,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.60 | bwd_microstep: 809.07 | bwd_inner_microstep: 809.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-11 05:01:09,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.40 | bwd_microstep: 1404.29 | bwd_inner_microstep: 1404.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 05:01:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.65 | bwd_microstep: 1280.68 | bwd_inner_microstep: 1280.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 05:01:13,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.99 | bwd_microstep: 1560.62 | bwd_inner_microstep: 1560.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2224 [2024-06-11 05:01:15,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.80 | bwd_microstep: 960.67 | bwd_inner_microstep: 960.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3568 [2024-06-11 05:01:19,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.13 | optimizer_step: 6.62 [2024-06-11 
05:01:19,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.67 | bwd_microstep: 4143.41 | bwd_inner_microstep: 1522.09 | bwd_allreduce_microstep: 2621.26 | step_microstep: 38.49 [2024-06-11 05:01:19,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15197.08 | bwd: 43334.26 | bwd_inner: 40712.08 | bwd_allreduce: 2621.49 | step: 40.21 {'loss': 1.1835, 'learning_rate': 3.797313159192628e-07, 'epoch': 0.94} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 05:01:21,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.63 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4695 [2024-06-11 05:01:24,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.17 | bwd_microstep: 1876.12 | bwd_inner_microstep: 1876.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3857 [2024-06-11 05:01:25,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1362.63 | bwd_inner_microstep: 1362.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 05:01:27,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.15 | bwd_microstep: 1393.76 | bwd_inner_microstep: 1393.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2215 [2024-06-11 05:01:29,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.23 | bwd_microstep: 955.15 | bwd_inner_microstep: 955.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2782 [2024-06-11 
05:01:30,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.38 | bwd_microstep: 1053.54 | bwd_inner_microstep: 1053.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 05:01:32,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.25 | bwd_microstep: 1285.58 | bwd_inner_microstep: 1285.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3477 [2024-06-11 05:01:34,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.72 | bwd_microstep: 1185.46 | bwd_inner_microstep: 1185.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 05:01:35,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.71 | bwd_microstep: 1247.95 | bwd_inner_microstep: 1247.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2138 [2024-06-11 05:01:37,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.84 | bwd_microstep: 1022.61 | bwd_inner_microstep: 1022.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 05:01:38,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.90 | bwd_microstep: 1246.11 | bwd_inner_microstep: 1246.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 05:01:40,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.00 | bwd_microstep: 1340.65 | bwd_inner_microstep: 1340.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3462 [2024-06-11 05:01:42,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.10 | bwd_microstep: 1372.24 | bwd_inner_microstep: 1372.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-11 05:01:44,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.38 | bwd_microstep: 1329.03 | bwd_inner_microstep: 1329.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-11 05:01:46,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.54 | bwd_microstep: 1484.17 | bwd_inner_microstep: 1484.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3658 [2024-06-11 05:01:48,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1616.60 | bwd_inner_microstep: 1616.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 05:01:50,731] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.30 | bwd_microstep: 1390.18 | bwd_inner_microstep: 1390.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-11 05:01:52,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.46 | bwd_microstep: 1290.09 | bwd_inner_microstep: 1290.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3531 [2024-06-11 05:01:54,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.52 | bwd_microstep: 1356.93 | bwd_inner_microstep: 1356.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 2282 [2024-06-11 05:01:55,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.81 | bwd_microstep: 877.19 | bwd_inner_microstep: 877.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1927 [2024-06-11 05:01:56,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.34 | bwd_microstep: 697.67 | bwd_inner_microstep: 697.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3814 [2024-06-11 05:01:58,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.58 | bwd_microstep: 1452.97 | bwd_inner_microstep: 1452.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 05:02:00,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.57 | bwd_microstep: 1284.71 | bwd_inner_microstep: 1284.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1931 [2024-06-11 05:02:01,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.18 | bwd_microstep: 760.23 | bwd_inner_microstep: 760.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2078 [2024-06-11 05:02:02,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.05 | bwd_microstep: 855.88 | bwd_inner_microstep: 855.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3716 [2024-06-11 05:02:04,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1336.13 | bwd_inner_microstep: 1336.10 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-11 05:02:06,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1396.73 | bwd_inner_microstep: 1396.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3536 [2024-06-11 05:02:08,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.97 | bwd_microstep: 1398.86 | bwd_inner_microstep: 1398.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3589 [2024-06-11 05:02:10,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.51 | bwd_microstep: 1368.25 | bwd_inner_microstep: 1368.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3808 [2024-06-11 05:02:12,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.47 | bwd_microstep: 1418.27 | bwd_inner_microstep: 1418.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275 [2024-06-11 05:02:13,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.15 | bwd_microstep: 973.39 | bwd_inner_microstep: 973.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3047 [2024-06-11 05:03:49,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.69 | optimizer_step: 6.60 [2024-06-11 05:03:49,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.93 | bwd_microstep: 95267.24 | bwd_inner_microstep: 1495.03 | bwd_allreduce_microstep: 93772.13 | step_microstep: 41.07 [2024-06-11 05:03:49,346] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd: 15129.37 | bwd: 134141.56 | bwd_inner: 40368.49 | bwd_allreduce: 93772.39 | step: 42.55 {'loss': 1.1262, 'learning_rate': 3.7248654266965665e-07, 'epoch': 0.94} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-11 05:03:51,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.25 | bwd_microstep: 1464.84 | bwd_inner_microstep: 1464.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 05:03:53,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.48 | bwd_microstep: 1372.38 | bwd_inner_microstep: 1372.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-11 05:03:54,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 437.70 | bwd_microstep: 1143.93 | bwd_inner_microstep: 1143.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1992 [2024-06-11 05:03:55,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.65 | bwd_microstep: 796.91 | bwd_inner_microstep: 796.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3796 [2024-06-11 05:03:57,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.31 | bwd_microstep: 1346.29 | bwd_inner_microstep: 1346.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3499 [2024-06-11 05:03:59,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.92 | bwd_microstep: 1476.27 | bwd_inner_microstep: 1476.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3404 [2024-06-11 05:04:01,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.54 | bwd_microstep: 1338.05 | bwd_inner_microstep: 1338.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-11 05:04:03,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.89 | bwd_microstep: 1280.84 | bwd_inner_microstep: 1280.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 05:04:05,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.98 | bwd_microstep: 1382.95 | bwd_inner_microstep: 1382.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-11 05:04:53,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.01 | bwd_microstep: 1284.99 | bwd_inner_microstep: 1284.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2565 [2024-06-11 05:04:55,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 375.21 | bwd_microstep: 995.98 | bwd_inner_microstep: 995.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-11 05:04:57,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.73 | bwd_microstep: 1488.89 | bwd_inner_microstep: 1488.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-11 05:04:59,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.84 | bwd_microstep: 1478.81 | bwd_inner_microstep: 1478.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 29, images per 
sample: 7.25, dynamic token length: 2965 [2024-06-11 05:05:01,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.91 | bwd_microstep: 1239.64 | bwd_inner_microstep: 1239.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2137 [2024-06-11 05:05:02,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.14 | bwd_microstep: 921.60 | bwd_inner_microstep: 921.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599 [2024-06-11 05:05:04,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.40 | bwd_microstep: 1496.13 | bwd_inner_microstep: 1496.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-11 05:05:06,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.88 | bwd_microstep: 1565.17 | bwd_inner_microstep: 1565.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1898 [2024-06-11 05:05:07,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.28 | bwd_microstep: 679.82 | bwd_inner_microstep: 679.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181 [2024-06-11 05:05:08,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.53 | bwd_microstep: 855.79 | bwd_inner_microstep: 855.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3828 [2024-06-11 05:05:10,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.25 | bwd_microstep: 1547.81 | bwd_inner_microstep: 1547.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3628 [2024-06-11 05:05:12,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.73 | bwd_microstep: 1271.62 | bwd_inner_microstep: 1271.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446 [2024-06-11 05:05:14,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.09 | bwd_microstep: 1152.25 | bwd_inner_microstep: 1152.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-11 05:05:16,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.05 | bwd_microstep: 1402.64 | bwd_inner_microstep: 1402.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 05:05:18,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.54 | bwd_microstep: 1283.68 | bwd_inner_microstep: 1283.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 653 [2024-06-11 05:05:18,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 109.94 | bwd_microstep: 276.69 | bwd_inner_microstep: 276.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 05:05:20,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.43 | bwd_microstep: 1374.89 | bwd_inner_microstep: 1374.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2290 [2024-06-11 05:05:21,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.28 | bwd_microstep: 905.47 | bwd_inner_microstep: 905.44 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3610 [2024-06-11 05:05:23,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1408.01 | bwd_inner_microstep: 1407.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 5438 [2024-06-11 05:05:26,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 776.33 | bwd_microstep: 2089.72 | bwd_inner_microstep: 2089.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3805 [2024-06-11 05:05:28,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1385.50 | bwd_inner_microstep: 1385.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3814 [2024-06-11 05:05:30,578] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.50 | bwd_microstep: 1644.16 | bwd_inner_microstep: 1644.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3805 [2024-06-11 05:05:36,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.12 | optimizer_step: 6.65 [2024-06-11 05:05:36,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.64 | bwd_microstep: 5537.23 | bwd_inner_microstep: 1741.09 | bwd_allreduce_microstep: 3796.08 | step_microstep: 38.17 [2024-06-11 05:05:36,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15367.69 | bwd: 44888.98 | bwd_inner: 41091.99 | bwd_allreduce: 3796.31 | step: 39.66 {'loss': 1.174, 'learning_rate': 3.653108974204145e-07, 'epoch': 0.94} dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3406 
[2024-06-11 05:05:38,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.02 | bwd_microstep: 1280.67 | bwd_inner_microstep: 1280.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3851 [2024-06-11 05:05:40,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.55 | bwd_microstep: 1454.47 | bwd_inner_microstep: 1454.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900 [2024-06-11 05:05:42,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.48 | bwd_microstep: 1581.47 | bwd_inner_microstep: 1581.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2303 [2024-06-11 05:05:43,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.11 | bwd_microstep: 908.99 | bwd_inner_microstep: 908.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 05:05:45,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.60 | bwd_microstep: 1377.99 | bwd_inner_microstep: 1377.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1938 [2024-06-11 05:05:46,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.78 | bwd_microstep: 696.61 | bwd_inner_microstep: 696.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-11 05:05:48,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1413.45 | bwd_inner_microstep: 1413.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3493 [2024-06-11 05:05:50,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.11 | bwd_microstep: 1384.69 | bwd_inner_microstep: 1384.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 05:05:52,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.00 | bwd_microstep: 1386.66 | bwd_inner_microstep: 1386.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 05:05:54,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1387.45 | bwd_inner_microstep: 1387.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3423 [2024-06-11 05:05:56,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.33 | bwd_microstep: 1184.22 | bwd_inner_microstep: 1184.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 05:05:58,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1346.31 | bwd_inner_microstep: 1346.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494 [2024-06-11 05:05:59,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.28 | bwd_microstep: 1315.85 | bwd_inner_microstep: 1315.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3633 [2024-06-11 05:06:01,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.16 | bwd_microstep: 1537.35 | bwd_inner_microstep: 1537.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-11 05:06:03,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 1450.84 | bwd_inner_microstep: 1450.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3638 [2024-06-11 05:06:06,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.09 | bwd_microstep: 1741.04 | bwd_inner_microstep: 1741.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3829 [2024-06-11 05:06:08,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.40 | bwd_microstep: 1643.69 | bwd_inner_microstep: 1643.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3692 [2024-06-11 05:06:10,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.37 | bwd_microstep: 1484.03 | bwd_inner_microstep: 1484.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673 [2024-06-11 05:06:12,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.21 | bwd_microstep: 1430.32 | bwd_inner_microstep: 1430.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3829 [2024-06-11 05:06:14,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.97 | bwd_microstep: 1264.05 | bwd_inner_microstep: 1264.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3804 [2024-06-11 05:06:16,788] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.86 | bwd_microstep: 1747.53 | bwd_inner_microstep: 1747.50 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-11 05:06:18,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.37 | bwd_microstep: 1394.75 | bwd_inner_microstep: 1394.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3474 [2024-06-11 05:06:20,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1478.04 | bwd_inner_microstep: 1478.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 05:06:22,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.41 | bwd_microstep: 1553.13 | bwd_inner_microstep: 1553.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 05:06:24,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1283.10 | bwd_inner_microstep: 1283.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 05:06:26,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.55 | bwd_microstep: 1554.31 | bwd_inner_microstep: 1554.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2066 [2024-06-11 05:06:28,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.62 | bwd_microstep: 918.36 | bwd_inner_microstep: 918.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3707 [2024-06-11 05:06:29,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.44 | bwd_microstep: 
1294.57 | bwd_inner_microstep: 1294.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2020 [2024-06-11 05:06:31,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.20 | bwd_microstep: 805.91 | bwd_inner_microstep: 805.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-11 05:06:33,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.59 | bwd_microstep: 1642.50 | bwd_inner_microstep: 1642.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3812 [2024-06-11 05:06:35,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 641.72 | bwd_microstep: 1754.83 | bwd_inner_microstep: 1754.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3413 [2024-06-11 05:07:21,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.25 | optimizer_step: 6.60 [2024-06-11 05:07:21,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.51 | bwd_microstep: 45793.74 | bwd_inner_microstep: 1447.80 | bwd_allreduce_microstep: 44345.87 | step_microstep: 39.67 [2024-06-11 05:07:21,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16425.86 | bwd: 88490.92 | bwd_inner: 44144.13 | bwd_allreduce: 44346.11 | step: 41.09 94%|█████████▍| 1620/1726 [28:21:54<2:21:05, 79.86s/it] 94%|█████████▍| 1621/1726 [28:22:57<2:10:52, 74.78s/it] 94%|█████████▍| 1622/1726 [28:23:56<2:01:20, 70.01s/it] 94%|█████████▍| 1623/1726 [28:26:26<2:41:10, 93.89s/it]
94%|█████████▍| 1624/1726 [28:28:13<2:46:29, 97.94s/it] 94%|█████████▍| 1624/1726 [28:28:13<2:46:29, 97.94s/it] 94%|████████�{'loss': 1.1827, 'learning_rate': 3.5820440544411807e-07, 'epoch': 0.94} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3546 [2024-06-11 05:07:24,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.82 | bwd_microstep: 1572.08 | bwd_inner_microstep: 1572.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3944 [2024-06-11 05:07:26,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.44 | bwd_microstep: 1684.89 | bwd_inner_microstep: 1684.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3900 [2024-06-11 05:07:28,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.95 | bwd_microstep: 1574.61 | bwd_inner_microstep: 1574.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2275 [2024-06-11 05:07:29,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.89 | bwd_microstep: 964.68 | bwd_inner_microstep: 964.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2049 [2024-06-11 05:07:31,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.25 | bwd_microstep: 810.03 | bwd_inner_microstep: 810.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-11 05:07:32,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.75 | bwd_microstep: 1241.97 | bwd_inner_microstep: 1241.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3482 [2024-06-11 05:07:34,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.63 | bwd_microstep: 1379.86 | bwd_inner_microstep: 1379.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3745 [2024-06-11 05:07:36,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.72 | bwd_microstep: 1631.22 | bwd_inner_microstep: 1631.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1889 [2024-06-11 05:07:54,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.48 | bwd_microstep: 683.83 | bwd_inner_microstep: 683.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 05:07:56,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.17 | bwd_microstep: 1279.87 | bwd_inner_microstep: 1279.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-11 05:07:59,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.34 | bwd_microstep: 1287.04 | bwd_inner_microstep: 1287.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-11 05:08:02,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.20 | bwd_microstep: 1500.61 | bwd_inner_microstep: 1500.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3491 [2024-06-11 05:08:04,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.99 | bwd_microstep: 1504.63 | bwd_inner_microstep: 1504.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 2984 [2024-06-11 05:08:06,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.68 | bwd_microstep: 1036.55 | bwd_inner_microstep: 1036.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 05:08:07,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.73 | bwd_microstep: 1374.63 | bwd_inner_microstep: 1374.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 05:08:09,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1345.32 | bwd_inner_microstep: 1345.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-11 05:08:11,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.09 | bwd_microstep: 1249.32 | bwd_inner_microstep: 1249.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-11 05:08:13,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.39 | bwd_microstep: 1604.90 | bwd_inner_microstep: 1604.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3681 [2024-06-11 05:08:15,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.05 | bwd_microstep: 1521.28 | bwd_inner_microstep: 1521.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3637 [2024-06-11 05:08:17,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1418.69 | bwd_inner_microstep: 1418.66 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 05:08:19,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.48 | bwd_microstep: 1254.04 | bwd_inner_microstep: 1254.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-11 05:08:21,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.32 | bwd_microstep: 1493.58 | bwd_inner_microstep: 1493.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3701 [2024-06-11 05:08:23,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.35 | bwd_microstep: 1359.42 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964 [2024-06-11 05:08:24,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.33 | bwd_microstep: 703.03 | bwd_inner_microstep: 703.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-11 05:08:26,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.08 | bwd_microstep: 1452.42 | bwd_inner_microstep: 1452.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-11 05:08:28,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.02 | bwd_microstep: 1295.28 | bwd_inner_microstep: 1295.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3430 [2024-06-11 05:08:30,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.50 | bwd_microstep: 1308.35 | bwd_inner_microstep: 
1308.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3572 [2024-06-11 05:08:31,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.10 | bwd_microstep: 1299.90 | bwd_inner_microstep: 1299.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2126 [2024-06-11 05:08:33,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 344.68 | bwd_microstep: 921.24 | bwd_inner_microstep: 921.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3577 [2024-06-11 05:08:35,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.73 | bwd_microstep: 1541.54 | bwd_inner_microstep: 1541.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3590 [2024-06-11 05:08:37,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.12 | bwd_microstep: 1424.91 | bwd_inner_microstep: 1424.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3607 [2024-06-11 05:08:40,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.03 | optimizer_step: 6.59 [2024-06-11 05:08:40,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.98 | bwd_microstep: 2164.43 | bwd_inner_microstep: 1827.31 | bwd_allreduce_microstep: 337.07 | step_microstep: 37.31 [2024-06-11 05:08:40,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15884.97 | bwd: 42884.16 | bwd_inner: 42546.20 | bwd_allreduce: 337.29 | step: 38.84 {'loss': 1.2129, 'learning_rate': 3.511670917698018e-07, 'epoch': 0.94} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1868 
[2024-06-11 05:08:41,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 288.52 | bwd_microstep: 763.72 | bwd_inner_microstep: 763.65 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.07 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3983 [2024-06-11 05:08:43,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.90 | bwd_microstep: 1601.41 | bwd_inner_microstep: 1601.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3867 [2024-06-11 05:08:45,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.70 | bwd_microstep: 1561.49 | bwd_inner_microstep: 1561.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3871 [2024-06-11 05:08:47,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.15 | bwd_microstep: 1467.05 | bwd_inner_microstep: 1467.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3494 [2024-06-11 05:08:49,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.74 | bwd_microstep: 1219.46 | bwd_inner_microstep: 1219.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2252 [2024-06-11 05:08:50,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.51 | bwd_microstep: 964.00 | bwd_inner_microstep: 963.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3498 [2024-06-11 05:08:52,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.13 | bwd_microstep: 1284.48 | bwd_inner_microstep: 1284.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3431 [2024-06-11 05:08:54,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.16 | bwd_microstep: 1251.98 | bwd_inner_microstep: 1251.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3521 [2024-06-11 05:08:55,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.20 | bwd_microstep: 1226.52 | bwd_inner_microstep: 1226.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3427 [2024-06-11 05:08:57,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.03 | bwd_microstep: 1250.35 | bwd_inner_microstep: 1250.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3494 [2024-06-11 05:08:59,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.73 | bwd_microstep: 1315.13 | bwd_inner_microstep: 1315.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3527 [2024-06-11 05:09:01,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.71 | bwd_microstep: 1453.56 | bwd_inner_microstep: 1453.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-11 05:09:02,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.44 | bwd_microstep: 808.94 | bwd_inner_microstep: 808.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3696 [2024-06-11 05:09:04,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.21 | bwd_microstep: 1583.45 | bwd_inner_microstep: 1583.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2390 [2024-06-11 05:09:05,924] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.95 | bwd_microstep: 930.08 | bwd_inner_microstep: 930.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-11 05:09:07,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1486.99 | bwd_inner_microstep: 1486.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3677 [2024-06-11 05:09:09,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1421.08 | bwd_inner_microstep: 1421.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-11 05:09:10,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.77 | bwd_microstep: 675.95 | bwd_inner_microstep: 675.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707 [2024-06-11 05:09:12,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.00 | bwd_microstep: 1528.12 | bwd_inner_microstep: 1528.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-11 05:09:14,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1450.56 | bwd_inner_microstep: 1450.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-11 05:09:17,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.94 | bwd_microstep: 1490.24 | bwd_inner_microstep: 1490.21 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-11 05:09:19,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.69 | bwd_microstep: 1493.87 | bwd_inner_microstep: 1493.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418 [2024-06-11 05:09:20,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.04 | bwd_microstep: 1184.07 | bwd_inner_microstep: 1184.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2024 [2024-06-11 05:09:21,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.58 | bwd_microstep: 714.58 | bwd_inner_microstep: 714.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 05:09:23,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.23 | bwd_microstep: 1556.84 | bwd_inner_microstep: 1556.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3629 [2024-06-11 05:09:25,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.33 | bwd_microstep: 1473.91 | bwd_inner_microstep: 1473.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-11 05:09:27,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.52 | bwd_microstep: 1440.71 | bwd_inner_microstep: 1440.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2017 [2024-06-11 05:09:28,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.93 | bwd_microstep: 
742.81 | bwd_inner_microstep: 742.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1950 [2024-06-11 05:09:29,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.84 | bwd_microstep: 701.80 | bwd_inner_microstep: 701.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2279 [2024-06-11 05:09:31,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.02 | bwd_microstep: 907.86 | bwd_inner_microstep: 907.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3596 [2024-06-11 05:09:32,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1310.55 | bwd_inner_microstep: 1310.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602 [2024-06-11 05:09:41,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.22 | optimizer_step: 6.63 [2024-06-11 05:09:41,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.56 | bwd_microstep: 8208.89 | bwd_inner_microstep: 1617.55 | bwd_allreduce_microstep: 6591.30 | step_microstep: 37.93 [2024-06-11 05:09:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14917.29 | bwd: 46470.45 | bwd_inner: 39878.20 | bwd_allreduce: 6591.55 | step: 39.39 {'loss': 1.1475, 'learning_rate': 3.441989811828417e-07, 'epoch': 0.94} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3416 [2024-06-11 05:09:43,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1437.52 | bwd_inner_microstep: 1437.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3569 [2024-06-11 05:09:45,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.48 | bwd_microstep: 1394.34 | bwd_inner_microstep: 1394.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 05:09:47,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.45 | bwd_microstep: 1346.81 | bwd_inner_microstep: 1346.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 05:09:49,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.04 | bwd_microstep: 1279.52 | bwd_inner_microstep: 1279.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3402 [2024-06-11 05:09:50,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.90 | bwd_microstep: 1149.08 | bwd_inner_microstep: 1149.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-11 05:09:52,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.52 | bwd_microstep: 1386.66 | bwd_inner_microstep: 1386.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3603 [2024-06-11 05:09:54,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.53 | bwd_microstep: 1341.53 | bwd_inner_microstep: 1341.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171 [2024-06-11 05:09:55,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.34 | bwd_microstep: 853.09 | bwd_inner_microstep: 853.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 26, images per sample: 6.5, dynamic token length: 3455 [2024-06-11 05:09:57,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.31 | bwd_microstep: 1322.60 | bwd_inner_microstep: 1322.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694 [2024-06-11 05:09:59,802] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.74 | bwd_microstep: 1523.15 | bwd_inner_microstep: 1523.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-11 05:10:01,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.22 | bwd_microstep: 1483.08 | bwd_inner_microstep: 1483.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3577 [2024-06-11 05:10:03,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.22 | bwd_microstep: 1490.94 | bwd_inner_microstep: 1490.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-11 05:10:05,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.36 | bwd_microstep: 1477.40 | bwd_inner_microstep: 1477.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-11 05:10:07,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.99 | bwd_microstep: 1474.74 | bwd_inner_microstep: 1474.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673 [2024-06-11 05:10:09,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.76 | bwd_microstep: 1425.00 | bwd_inner_microstep: 1424.97 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-11 05:10:12,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.70 | bwd_microstep: 1522.32 | bwd_inner_microstep: 1522.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3520 [2024-06-11 05:10:13,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.35 | bwd_microstep: 1366.88 | bwd_inner_microstep: 1366.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-11 05:10:15,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.62 | bwd_microstep: 1294.48 | bwd_inner_microstep: 1294.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3930 [2024-06-11 05:10:17,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.31 | bwd_microstep: 1527.25 | bwd_inner_microstep: 1527.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527 [2024-06-11 05:10:19,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.16 | bwd_microstep: 1423.86 | bwd_inner_microstep: 1423.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1944 [2024-06-11 05:10:20,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.97 | bwd_microstep: 698.29 | bwd_inner_microstep: 698.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-11 05:10:22,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.58 | bwd_microstep: 1508.69 | 
bwd_inner_microstep: 1508.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3723 [2024-06-11 05:10:24,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1436.73 | bwd_inner_microstep: 1436.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2075 [2024-06-11 05:10:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.22 | bwd_microstep: 880.66 | bwd_inner_microstep: 880.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3538 [2024-06-11 05:10:27,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.39 | bwd_microstep: 1398.35 | bwd_inner_microstep: 1398.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 05:10:30,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.45 | bwd_microstep: 1562.85 | bwd_inner_microstep: 1562.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3820 [2024-06-11 05:10:32,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.42 | bwd_microstep: 1388.45 | bwd_inner_microstep: 1388.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3441 [2024-06-11 05:10:33,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.71 | bwd_microstep: 1282.46 | bwd_inner_microstep: 1282.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-11 05:10:35,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 504.74 | bwd_microstep: 1351.53 | bwd_inner_microstep: 1351.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778 [2024-06-11 05:10:37,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.45 | bwd_microstep: 1447.56 | bwd_inner_microstep: 1447.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3820 [2024-06-11 05:10:40,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.45 | bwd_microstep: 1822.59 | bwd_inner_microstep: 1822.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584 [2024-06-11 05:10:42,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-11 05:10:42,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.47 | bwd_microstep: 1973.52 | bwd_inner_microstep: 1409.14 | bwd_allreduce_microstep: 564.34 | step_microstep: 37.75 [2024-06-11 05:10:42,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16339.01 | bwd: 44271.96 | bwd_inner: 43706.72 | bwd_allreduce: 564.56 | step: 39.22 {'loss': 1.1476, 'learning_rate': 3.3730009822488864e-07, 'epoch': 0.94} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389 [2024-06-11 05:10:44,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.41 | bwd_microstep: 1332.54 | bwd_inner_microstep: 1332.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3852 [2024-06-11 05:10:46,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.08 | bwd_microstep: 1458.77 | bwd_inner_microstep: 1458.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-11 05:10:48,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.22 | bwd_microstep: 1447.81 | bwd_inner_microstep: 1447.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3750 [2024-06-11 05:10:50,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.56 | bwd_microstep: 1436.54 | bwd_inner_microstep: 1436.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-11 05:10:52,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.72 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3753 [2024-06-11 05:10:54,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.90 | bwd_microstep: 1538.16 | bwd_inner_microstep: 1538.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2037 [2024-06-11 05:10:55,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.30 | bwd_microstep: 716.48 | bwd_inner_microstep: 716.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3758 [2024-06-11 05:10:57,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.07 | bwd_microstep: 1543.08 | bwd_inner_microstep: 1543.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-11 05:10:59,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.23 | bwd_microstep: 1251.18 | bwd_inner_microstep: 1251.15 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3700 [2024-06-11 05:11:01,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.75 | bwd_microstep: 1423.81 | bwd_inner_microstep: 1423.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3483 [2024-06-11 05:11:03,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.55 | bwd_microstep: 1405.02 | bwd_inner_microstep: 1405.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-11 05:11:05,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1514.93 | bwd_inner_microstep: 1514.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3616 [2024-06-11 05:11:07,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.98 | bwd_microstep: 1424.71 | bwd_inner_microstep: 1424.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1974 [2024-06-11 05:11:08,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.76 | bwd_microstep: 800.36 | bwd_inner_microstep: 800.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3431 [2024-06-11 05:11:10,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.10 | bwd_microstep: 1215.22 | bwd_inner_microstep: 1215.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3487 [2024-06-11 05:11:12,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.81 | bwd_microstep: 1582.10 | 
bwd_inner_microstep: 1582.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3139 [2024-06-11 05:11:13,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.38 | bwd_microstep: 1253.61 | bwd_inner_microstep: 1253.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-11 05:11:15,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.13 | bwd_microstep: 1483.28 | bwd_inner_microstep: 1483.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-11 05:11:17,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.71 | bwd_microstep: 793.86 | bwd_inner_microstep: 793.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3503 [2024-06-11 05:11:19,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1415.07 | bwd_inner_microstep: 1415.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619 [2024-06-11 05:11:21,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 1608.82 | bwd_inner_microstep: 1608.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 05:11:23,151] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.29 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3433 [2024-06-11 05:11:24,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 441.89 | bwd_microstep: 1152.87 | bwd_inner_microstep: 1152.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3811 [2024-06-11 05:11:27,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1654.31 | bwd_inner_microstep: 1654.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 05:11:28,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.63 | bwd_microstep: 1397.50 | bwd_inner_microstep: 1397.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3703 [2024-06-11 05:11:30,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1330.07 | bwd_inner_microstep: 1330.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3811 [2024-06-11 05:11:33,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.40 | bwd_microstep: 1693.30 | bwd_inner_microstep: 1693.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3576 [2024-06-11 05:11:35,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.89 | bwd_microstep: 1632.10 | bwd_inner_microstep: 1632.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3578 [2024-06-11 05:11:37,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.65 | bwd_microstep: 1401.66 | bwd_inner_microstep: 1401.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3727 [2024-06-11 05:11:39,292] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1432.84 | bwd_inner_microstep: 1432.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-11 05:11:41,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.15 | bwd_microstep: 1498.58 | bwd_inner_microstep: 1498.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797 [2024-06-11 05:11:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.93 | optimizer_gradients: 4.17 | optimizer_step: 6.63 [2024-06-11 05:11:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.84 | bwd_microstep: 1489.01 | bwd_inner_microstep: 1481.30 | bwd_allreduce_microstep: 7.67 | step_microstep: 37.78 [2024-06-11 05:11:43,442] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16428.04 | bwd: 43955.26 | bwd_inner: 43946.69 | bwd_allreduce: 7.90 | step: 39.33 {'loss': 1.1369, 'learning_rate': 3.3047046719377305e-07, 'epoch': 0.94} 94%|█████████▍| 1625/1726 [28:29:58<2:48:33, 100.13s/it] 94%|█████████▍| 1626/1726 [28:31:16<2:35:51, 93.52s/it] 94%|█████████▍| 1627/1726 [28:32:18<2:18:33, 83.97s/it] 94%|█████████▍| 1628/1726 [28:33:19<2:05:52, 77.06s/it] 94%|█████████▍| 1629/1726 [28:34:20<1:56:39, 72.16s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-11 05:11:45,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.05 | bwd_microstep: 1343.59 | bwd_inner_microstep: 1343.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-11 05:11:47,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.46 | bwd_microstep: 1344.31 | bwd_inner_microstep: 1344.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 05:11:49,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.20 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-11 05:11:50,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.69 | bwd_microstep: 1303.32 | bwd_inner_microstep: 1303.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-11 05:11:51,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.45 | bwd_microstep: 791.63 | bwd_inner_microstep: 791.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 05:11:53,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.88 | bwd_microstep: 1286.94 | bwd_inner_microstep: 1286.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-11 05:11:55,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.36 | bwd_microstep: 1253.63 | bwd_inner_microstep: 1253.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1919 [2024-06-11 05:11:56,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.68 | bwd_microstep: 717.80 | bwd_inner_microstep: 717.77 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-11 05:11:58,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.63 | bwd_microstep: 1484.30 | bwd_inner_microstep: 1484.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3755 [2024-06-11 05:12:00,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.32 | bwd_microstep: 1462.91 | bwd_inner_microstep: 1462.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3450 [2024-06-11 05:12:02,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.15 | bwd_microstep: 1299.80 | bwd_inner_microstep: 1299.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989 [2024-06-11 05:12:03,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.20 | bwd_microstep: 896.11 | bwd_inner_microstep: 896.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3623 [2024-06-11 05:12:05,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.54 | bwd_microstep: 1708.98 | bwd_inner_microstep: 1708.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 05:12:07,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1344.75 | bwd_inner_microstep: 1344.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-11 05:12:09,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1342.43 | 
bwd_inner_microstep: 1342.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3529 [2024-06-11 05:12:11,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.42 | bwd_microstep: 1414.45 | bwd_inner_microstep: 1414.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3567 [2024-06-11 05:12:13,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.93 | bwd_microstep: 1298.29 | bwd_inner_microstep: 1298.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2174 [2024-06-11 05:12:14,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.26 | bwd_microstep: 887.89 | bwd_inner_microstep: 887.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3691 [2024-06-11 05:12:16,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.82 | bwd_microstep: 1617.68 | bwd_inner_microstep: 1617.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3610 [2024-06-11 05:12:19,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.10 | bwd_microstep: 1601.53 | bwd_inner_microstep: 1601.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3626 [2024-06-11 05:12:21,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.88 | bwd_microstep: 1709.53 | bwd_inner_microstep: 1709.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3747 [2024-06-11 05:12:23,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 517.46 | bwd_microstep: 1373.18 | bwd_inner_microstep: 1373.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3737 [2024-06-11 05:12:25,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1433.44 | bwd_inner_microstep: 1433.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 05:12:27,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.58 | bwd_microstep: 1298.05 | bwd_inner_microstep: 1298.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-11 05:12:29,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.78 | bwd_microstep: 1499.89 | bwd_inner_microstep: 1499.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 05:12:31,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.96 | bwd_microstep: 1556.24 | bwd_inner_microstep: 1556.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-11 05:12:33,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.68 | bwd_microstep: 1454.16 | bwd_inner_microstep: 1454.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-11 05:12:35,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.36 | bwd_microstep: 1512.95 | bwd_inner_microstep: 1512.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3809 [2024-06-11 05:12:37,670] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.85 | bwd_microstep: 1655.30 | bwd_inner_microstep: 1655.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-11 05:12:39,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.35 | bwd_microstep: 1560.09 | bwd_inner_microstep: 1560.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602 [2024-06-11 05:12:41,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.14 | bwd_microstep: 1534.21 | bwd_inner_microstep: 1534.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3593 [2024-06-11 05:12:44,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.07 | optimizer_step: 6.62 [2024-06-11 05:12:44,068] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.66 | bwd_microstep: 1535.15 | bwd_inner_microstep: 1527.21 | bwd_allreduce_microstep: 7.89 | step_microstep: 37.51 [2024-06-11 05:12:44,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16378.12 | bwd: 43903.89 | bwd_inner: 43895.10 | bwd_allreduce: 8.12 | step: 38.97 {'loss': 1.1472, 'learning_rate': 3.2371011214342053e-07, 'epoch': 0.94} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-11 05:12:46,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.61 | bwd_microstep: 1405.57 | bwd_inner_microstep: 1405.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 05:12:47,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.70 | bwd_microstep: 1382.55 | bwd_inner_microstep: 1382.52 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3845 [2024-06-11 05:12:50,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.56 | bwd_microstep: 1560.34 | bwd_inner_microstep: 1560.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 05:12:51,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.07 | bwd_microstep: 1342.22 | bwd_inner_microstep: 1342.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-11 05:12:53,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.34 | bwd_microstep: 1484.94 | bwd_inner_microstep: 1484.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3732 [2024-06-11 05:12:55,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1335.07 | bwd_inner_microstep: 1335.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3685 [2024-06-11 05:12:58,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.30 | bwd_microstep: 1626.05 | bwd_inner_microstep: 1626.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3694 [2024-06-11 05:13:00,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.59 | bwd_microstep: 1529.64 | bwd_inner_microstep: 1529.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-11 05:13:02,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.25 | bwd_microstep: 
1345.92 | bwd_inner_microstep: 1345.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1957 [2024-06-11 05:13:03,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.38 | bwd_microstep: 892.19 | bwd_inner_microstep: 892.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3387 [2024-06-11 05:13:05,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.55 | bwd_microstep: 1336.89 | bwd_inner_microstep: 1336.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 05:13:06,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 1344.75 | bwd_inner_microstep: 1344.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3512 [2024-06-11 05:13:08,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.51 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3602 [2024-06-11 05:13:11,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1567.02 | bwd_inner_microstep: 1566.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1919 [2024-06-11 05:13:12,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.46 | bwd_microstep: 782.54 | bwd_inner_microstep: 782.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-11 05:13:13,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 299.86 | bwd_microstep: 794.24 | bwd_inner_microstep: 794.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2076 [2024-06-11 05:13:14,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.68 | bwd_microstep: 822.29 | bwd_inner_microstep: 822.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 05:13:16,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.02 | bwd_microstep: 1288.81 | bwd_inner_microstep: 1288.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3588 [2024-06-11 05:13:18,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.94 | bwd_microstep: 1340.93 | bwd_inner_microstep: 1340.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2276 [2024-06-11 05:13:19,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.87 | bwd_microstep: 974.23 | bwd_inner_microstep: 974.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2067 [2024-06-11 05:13:20,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.17 | bwd_microstep: 816.45 | bwd_inner_microstep: 816.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-11 05:13:21,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.33 | bwd_microstep: 975.84 | bwd_inner_microstep: 975.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1949 [2024-06-11 05:13:22,815] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.37 | bwd_microstep: 700.35 | bwd_inner_microstep: 700.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-11 05:13:24,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.36 | bwd_microstep: 1457.94 | bwd_inner_microstep: 1457.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2272 [2024-06-11 05:13:26,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.43 | bwd_microstep: 878.31 | bwd_inner_microstep: 878.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3565 [2024-06-11 05:13:27,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.10 | bwd_microstep: 1304.21 | bwd_inner_microstep: 1304.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3825 [2024-06-11 05:13:29,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.74 | bwd_microstep: 1391.94 | bwd_inner_microstep: 1391.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3585 [2024-06-11 05:13:31,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.44 | bwd_microstep: 1425.39 | bwd_inner_microstep: 1425.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3574 [2024-06-11 05:13:33,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.23 | bwd_microstep: 1444.68 | bwd_inner_microstep: 1444.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
2223 [2024-06-11 05:13:35,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.69 | bwd_microstep: 1059.41 | bwd_inner_microstep: 1059.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3563 [2024-06-11 05:13:37,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.25 | bwd_microstep: 1471.90 | bwd_inner_microstep: 1471.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3455 [2024-06-11 05:13:44,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.22 | optimizer_step: 6.62 [2024-06-11 05:13:44,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 6887.62 | bwd_inner_microstep: 1532.14 | bwd_allreduce_microstep: 5355.43 | step_microstep: 37.98 [2024-06-11 05:13:44,677] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14924.87 | bwd: 45351.63 | bwd_inner: 39995.26 | bwd_allreduce: 5355.67 | step: 39.47 {'loss': 1.1699, 'learning_rate': 3.17019056883765e-07, 'epoch': 0.94} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3472 [2024-06-11 05:13:46,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.79 | bwd_microstep: 1563.99 | bwd_inner_microstep: 1563.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2399 [2024-06-11 05:13:48,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.14 | bwd_microstep: 998.50 | bwd_inner_microstep: 998.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3501 [2024-06-11 05:13:50,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.03 | bwd_microstep: 1445.03 | 
bwd_inner_microstep: 1445.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 05:13:52,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.65 | bwd_microstep: 1380.30 | bwd_inner_microstep: 1380.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 05:13:54,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.65 | bwd_microstep: 1390.38 | bwd_inner_microstep: 1390.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 05:13:55,821] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.62 | bwd_microstep: 1284.15 | bwd_inner_microstep: 1284.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 05:13:57,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1286.81 | bwd_inner_microstep: 1286.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3506 [2024-06-11 05:13:59,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.83 | bwd_microstep: 1287.69 | bwd_inner_microstep: 1287.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 05:14:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1384.26 | bwd_inner_microstep: 1384.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3510 [2024-06-11 05:14:02,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 464.69 | bwd_microstep: 1222.70 | bwd_inner_microstep: 1222.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3684 [2024-06-11 05:14:04,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.39 | bwd_microstep: 1417.02 | bwd_inner_microstep: 1416.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3495 [2024-06-11 05:14:07,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.53 | bwd_microstep: 1579.65 | bwd_inner_microstep: 1579.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3834 [2024-06-11 05:14:09,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.41 | bwd_microstep: 1614.90 | bwd_inner_microstep: 1614.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3673 [2024-06-11 05:14:11,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.43 | bwd_microstep: 1719.65 | bwd_inner_microstep: 1719.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3542 [2024-06-11 05:14:14,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.10 | bwd_microstep: 1692.99 | bwd_inner_microstep: 1692.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3492 [2024-06-11 05:14:15,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.78 | bwd_microstep: 1205.62 | bwd_inner_microstep: 1205.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1986 [2024-06-11 05:14:16,850] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.94 | bwd_microstep: 834.21 | bwd_inner_microstep: 834.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 05:14:18,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.00 | bwd_microstep: 1397.79 | bwd_inner_microstep: 1397.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-11 05:14:20,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.85 | bwd_microstep: 1514.31 | bwd_inner_microstep: 1514.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3472 [2024-06-11 05:14:22,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 478.92 | bwd_microstep: 1247.93 | bwd_inner_microstep: 1247.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712 [2024-06-11 05:14:24,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.55 | bwd_microstep: 1435.42 | bwd_inner_microstep: 1435.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3619 [2024-06-11 05:14:26,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.29 | bwd_microstep: 1614.31 | bwd_inner_microstep: 1614.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3697 [2024-06-11 05:14:28,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.40 | bwd_microstep: 1383.80 | bwd_inner_microstep: 1383.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2292 [2024-06-11 05:14:30,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.75 | bwd_microstep: 981.88 | bwd_inner_microstep: 981.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-11 05:14:32,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.53 | bwd_microstep: 1417.20 | bwd_inner_microstep: 1417.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715 [2024-06-11 05:14:34,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.49 | bwd_microstep: 1436.65 | bwd_inner_microstep: 1436.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2018 [2024-06-11 05:14:35,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.14 | bwd_microstep: 863.74 | bwd_inner_microstep: 863.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3698 [2024-06-11 05:14:37,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.99 | bwd_microstep: 1587.02 | bwd_inner_microstep: 1587.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3017 [2024-06-11 05:14:39,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1326.35 | bwd_inner_microstep: 1325.21 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.09 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3786 [2024-06-11 05:14:41,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.43 | bwd_microstep: 1696.76 | bwd_inner_microstep: 1696.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per 
sample: 10.5, dynamic token length: 3826 [2024-06-11 05:14:43,888] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.29 | bwd_microstep: 1690.99 | bwd_inner_microstep: 1690.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492 [2024-06-11 05:14:46,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.17 | optimizer_step: 6.65 [2024-06-11 05:14:46,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.84 | bwd_microstep: 1620.28 | bwd_inner_microstep: 1611.96 | bwd_allreduce_microstep: 8.27 | step_microstep: 37.87 [2024-06-11 05:14:46,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16592.30 | bwd: 44522.34 | bwd_inner: 44512.95 | bwd_allreduce: 8.63 | step: 39.62 {'loss': 1.1481, 'learning_rate': 3.103973249806691e-07, 'epoch': 0.95} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 05:14:48,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.38 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3911 [2024-06-11 05:14:50,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.41 | bwd_microstep: 1690.07 | bwd_inner_microstep: 1690.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3480 [2024-06-11 05:14:52,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.30 | bwd_microstep: 1442.26 | bwd_inner_microstep: 1442.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1937 [2024-06-11 05:14:53,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.29 | 
bwd_microstep: 792.38 | bwd_inner_microstep: 792.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2095 [2024-06-11 05:14:54,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.88 | bwd_microstep: 774.58 | bwd_inner_microstep: 774.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927 [2024-06-11 05:14:55,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.95 | bwd_microstep: 791.08 | bwd_inner_microstep: 791.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-11 05:14:56,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.99 | bwd_microstep: 711.32 | bwd_inner_microstep: 711.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3786 [2024-06-11 05:14:58,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.66 | bwd_microstep: 1549.10 | bwd_inner_microstep: 1549.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 05:15:00,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.44 | bwd_microstep: 1392.47 | bwd_inner_microstep: 1392.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406 [2024-06-11 05:15:02,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.24 | bwd_microstep: 1276.47 | bwd_inner_microstep: 1276.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3717 [2024-06-11 05:15:04,800] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 622.41 | bwd_microstep: 1702.36 | bwd_inner_microstep: 1702.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2307 [2024-06-11 05:15:06,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.40 | bwd_microstep: 981.65 | bwd_inner_microstep: 981.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 05:15:08,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.52 | bwd_microstep: 1341.84 | bwd_inner_microstep: 1341.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3457 [2024-06-11 05:15:09,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.78 | bwd_microstep: 1424.99 | bwd_inner_microstep: 1424.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3827 [2024-06-11 05:15:12,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.46 | bwd_microstep: 1618.92 | bwd_inner_microstep: 1618.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3688 [2024-06-11 05:15:14,311] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.99 | bwd_microstep: 1526.67 | bwd_inner_microstep: 1526.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-11 05:15:16,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.48 | bwd_microstep: 1394.45 | bwd_inner_microstep: 1394.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3826 [2024-06-11 05:15:18,519] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.10 | bwd_microstep: 1659.09 | bwd_inner_microstep: 1659.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3507 [2024-06-11 05:15:20,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.21 | bwd_microstep: 1223.41 | bwd_inner_microstep: 1223.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 05:15:22,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.33 | bwd_microstep: 1558.99 | bwd_inner_microstep: 1558.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438 [2024-06-11 05:15:23,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.64 | bwd_microstep: 1158.27 | bwd_inner_microstep: 1158.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3822 [2024-06-11 05:15:26,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.21 | bwd_microstep: 1487.11 | bwd_inner_microstep: 1487.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 05:15:27,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.46 | bwd_microstep: 1352.53 | bwd_inner_microstep: 1352.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3638 [2024-06-11 05:15:29,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1504.15 | bwd_inner_microstep: 1504.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 
3607 [2024-06-11 05:15:32,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.97 | bwd_microstep: 1703.51 | bwd_inner_microstep: 1703.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3477 [2024-06-11 05:15:34,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1314.27 | bwd_inner_microstep: 1314.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2299 [2024-06-11 05:15:35,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.54 | bwd_microstep: 1022.25 | bwd_inner_microstep: 1022.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3587 [2024-06-11 05:15:37,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.33 | bwd_microstep: 1435.39 | bwd_inner_microstep: 1435.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591 [2024-06-11 05:15:39,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.21 | bwd_microstep: 1599.32 | bwd_inner_microstep: 1599.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3602 [2024-06-11 05:15:41,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.40 | bwd_microstep: 1439.11 | bwd_inner_microstep: 1439.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774 [2024-06-11 05:15:43,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.23 | bwd_microstep: 1450.12 | bwd_inner_microstep: 1450.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3566 [2024-06-11 05:15:47,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.25 | optimizer_step: 6.62 [2024-06-11 05:15:47,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.34 | bwd_microstep: 3679.41 | bwd_inner_microstep: 1578.29 | bwd_allreduce_microstep: 2101.04 | step_microstep: 39.40 [2024-06-11 05:15:47,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16098.37 | bwd: 45375.57 | bwd_inner: 43273.45 | bwd_allreduce: 2101.36 | step: 41.11 {'loss': 1.1526, 'learning_rate': 3.038449397558396e-07, 'epoch': 0.95} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509 [2024-06-11 05:15:50,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.66 | bwd_microstep: 1577.03 | bwd_inner_microstep: 1577.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4025 [2024-06-11 05:15:52,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.12 | bwd_microstep: 1542.85 | bwd_inner_microstep: 1540.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3581 [2024-06-11 05:15:54,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.82 | bwd_microstep: 1504.77 | bwd_inner_microstep: 1504.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-11 05:15:56,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.58 | bwd_microstep: 1650.43 | bwd_inner_microstep: 1650.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 05:15:58,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
521.39 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3424 [2024-06-11 05:16:00,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.13 | bwd_microstep: 1151.63 | bwd_inner_microstep: 1151.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2190 [2024-06-11 05:16:01,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.92 | bwd_microstep: 951.62 | bwd_inner_microstep: 951.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3495 [2024-06-11 05:16:03,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.96 | bwd_microstep: 1284.87 | bwd_inner_microstep: 1284.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1883 [2024-06-11 05:16:04,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.71 | bwd_microstep: 682.04 | bwd_inner_microstep: 682.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3680 [2024-06-11 05:16:06,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.10 | bwd_microstep: 1568.81 | bwd_inner_microstep: 1568.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3440 [2024-06-11 05:16:08,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.35 | bwd_microstep: 1345.27 | bwd_inner_microstep: 1345.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-11 05:16:10,419] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 595.16 | bwd_microstep: 1615.88 | bwd_inner_microstep: 1615.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3482 [2024-06-11 05:16:12,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.01 | bwd_microstep: 1429.49 | bwd_inner_microstep: 1429.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3714 [2024-06-11 05:16:14,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.77 | bwd_microstep: 1624.35 | bwd_inner_microstep: 1624.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3670 [2024-06-11 05:16:16,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.65 | bwd_microstep: 1511.90 | bwd_inner_microstep: 1511.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1941 [2024-06-11 05:16:17,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.34 | bwd_microstep: 730.13 | bwd_inner_microstep: 730.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-11 05:16:19,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1514.21 | bwd_inner_microstep: 1514.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3619 [2024-06-11 05:16:21,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.87 | bwd_microstep: 1510.33 | bwd_inner_microstep: 1510.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3580 [2024-06-11 
05:16:23,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.22 | bwd_microstep: 1399.26 | bwd_inner_microstep: 1399.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2284 [2024-06-11 05:16:25,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.34 | bwd_microstep: 908.89 | bwd_inner_microstep: 908.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3505 [2024-06-11 05:16:27,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.34 | bwd_microstep: 1390.23 | bwd_inner_microstep: 1390.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659 [2024-06-11 05:16:29,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.82 | bwd_microstep: 1525.85 | bwd_inner_microstep: 1525.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2283 [2024-06-11 05:16:30,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.78 | bwd_microstep: 879.91 | bwd_inner_microstep: 879.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3614 [2024-06-11 05:16:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.69 | bwd_microstep: 1610.92 | bwd_inner_microstep: 1610.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2995 [2024-06-11 05:16:34,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 419.84 | bwd_microstep: 1109.41 | bwd_inner_microstep: 1109.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, 
dynamic token length: 3818 [2024-06-11 05:16:36,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.36 | bwd_microstep: 1552.16 | bwd_inner_microstep: 1552.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 2883 [2024-06-11 05:16:37,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.53 | bwd_microstep: 1209.73 | bwd_inner_microstep: 1209.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 29, images per sample: 7.25, dynamic token length: 3558 [2024-06-11 05:16:39,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.64 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3522 [2024-06-11 05:16:41,641] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1294.47 | bwd_inner_microstep: 1294.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805 [2024-06-11 05:16:43,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.40 | bwd_microstep: 1449.94 | bwd_inner_microstep: 1449.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3426 [2024-06-11 05:16:45,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.78 | bwd_microstep: 1447.43 | bwd_inner_microstep: 1447.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3377 [2024-06-11 05:16:50,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-11 05:16:50,418] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 438.56 | bwd_microstep: 4295.96 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 3001.17 | step_microstep: 39.32 [2024-06-11 05:16:50,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16048.15 | bwd: 46059.35 | bwd_inner: 43054.73 | bwd_allreduce: 3001.41 | step: 40.89 {'loss': 1.2055, 'learning_rate': 2.9736192428674093e-07, 'epoch': 0.95} 94%|█████████▍| 1630/1726 [28:35:20<1:49:55, 68.70s/it] 94%|█████████▍| 1631/1726 [28:36:21<1:44:55, 66.27s/it] 95%|█████████▍| 1632/1726 [28:37:22<1:41:33, 64.83s/it] 95%|█████████▍| 1633/1726 [28:38:24<1:39:05, 63.93s/it] 95%|█████████▍| 1634/1726 [28:39:27<1:37:20, 63.49s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 05:16:52,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.49 | bwd_microstep: 1375.20 | bwd_inner_microstep: 1375.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403 [2024-06-11 05:16:53,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.89 | bwd_microstep: 1180.62 | bwd_inner_microstep: 1180.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865 [2024-06-11 05:16:55,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.61 | bwd_microstep: 741.31 | bwd_inner_microstep: 741.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 05:16:56,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.90 | bwd_microstep: 1386.03 |
bwd_inner_microstep: 1386.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3883 [2024-06-11 05:16:59,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.34 | bwd_microstep: 1517.20 | bwd_inner_microstep: 1517.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3421 [2024-06-11 05:17:00,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1251.61 | bwd_inner_microstep: 1251.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3769 [2024-06-11 05:17:02,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.33 | bwd_microstep: 1437.65 | bwd_inner_microstep: 1437.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-11 05:17:04,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.47 | bwd_microstep: 1345.95 | bwd_inner_microstep: 1345.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2693 [2024-06-11 05:17:06,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.89 | bwd_microstep: 1020.38 | bwd_inner_microstep: 1020.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3722 [2024-06-11 05:17:08,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.85 | bwd_microstep: 1490.29 | bwd_inner_microstep: 1490.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3670 [2024-06-11 05:17:09,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 501.01 | bwd_microstep: 1326.33 | bwd_inner_microstep: 1326.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 749 [2024-06-11 05:17:10,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 119.00 | bwd_microstep: 302.07 | bwd_inner_microstep: 302.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1886 [2024-06-11 05:17:11,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.65 | bwd_microstep: 682.72 | bwd_inner_microstep: 682.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3674 [2024-06-11 05:17:13,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.35 | bwd_microstep: 1570.56 | bwd_inner_microstep: 1570.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3475 [2024-06-11 05:17:15,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.23 | bwd_microstep: 1344.12 | bwd_inner_microstep: 1344.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-11 05:17:17,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.89 | bwd_microstep: 1489.09 | bwd_inner_microstep: 1489.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1901 [2024-06-11 05:17:18,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.27 | bwd_microstep: 779.35 | bwd_inner_microstep: 779.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-11 05:17:20,296] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.44 | bwd_microstep: 1354.82 | bwd_inner_microstep: 1354.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3840 [2024-06-11 05:17:22,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 675.64 | bwd_microstep: 1859.38 | bwd_inner_microstep: 1859.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3602 [2024-06-11 05:17:24,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.73 | bwd_microstep: 1538.20 | bwd_inner_microstep: 1538.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 911 [2024-06-11 05:17:25,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.97 | bwd_microstep: 371.94 | bwd_inner_microstep: 371.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3619 [2024-06-11 05:17:27,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.51 | bwd_microstep: 1417.73 | bwd_inner_microstep: 1417.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 957 [2024-06-11 05:17:27,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.95 | bwd_microstep: 381.37 | bwd_inner_microstep: 381.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-11 05:17:30,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.06 | bwd_microstep: 1637.75 | bwd_inner_microstep: 1637.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 
[2024-06-11 05:17:32,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.66 | bwd_microstep: 1499.85 | bwd_inner_microstep: 1499.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3557 [2024-06-11 05:17:34,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.48 | bwd_microstep: 1333.20 | bwd_inner_microstep: 1333.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 698 [2024-06-11 05:17:34,551] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.65 | bwd_microstep: 286.27 | bwd_inner_microstep: 286.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584 [2024-06-11 05:17:36,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.05 | bwd_microstep: 1308.77 | bwd_inner_microstep: 1308.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3805 [2024-06-11 05:17:38,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.44 | bwd_microstep: 1583.63 | bwd_inner_microstep: 1583.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3461 [2024-06-11 05:17:40,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.47 | bwd_microstep: 1473.76 | bwd_inner_microstep: 1473.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3802 [2024-06-11 05:17:42,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.80 | bwd_microstep: 1716.50 | bwd_inner_microstep: 1716.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per 
sample: 10.0, dynamic token length: 3774 [2024-06-11 05:17:51,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.00 | optimizer_gradients: 4.28 | optimizer_step: 6.58 [2024-06-11 05:17:51,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.82 | bwd_microstep: 7600.38 | bwd_inner_microstep: 1867.00 | bwd_allreduce_microstep: 5733.30 | step_microstep: 41.59 [2024-06-11 05:17:51,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14842.44 | bwd: 45604.05 | bwd_inner: 39869.82 | bwd_allreduce: 5733.55 | step: 43.11 {'loss': 1.1403, 'learning_rate': 2.909483014065195e-07, 'epoch': 0.95} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2011 [2024-06-11 05:17:52,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.74 | bwd_microstep: 893.48 | bwd_inner_microstep: 893.37 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2514 [2024-06-11 05:17:53,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.87 | bwd_microstep: 940.46 | bwd_inner_microstep: 940.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3537 [2024-06-11 05:17:55,544] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.93 | bwd_microstep: 1289.51 | bwd_inner_microstep: 1289.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-11 05:17:57,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.82 | bwd_microstep: 1474.49 | bwd_inner_microstep: 1474.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3533 [2024-06-11 05:17:59,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | 
bwd_microstep: 1229.75 | bwd_inner_microstep: 1229.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1387 [2024-06-11 05:18:00,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 202.20 | bwd_microstep: 525.00 | bwd_inner_microstep: 524.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1949 [2024-06-11 05:18:01,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.25 | bwd_microstep: 792.78 | bwd_inner_microstep: 792.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1957 [2024-06-11 05:18:02,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.36 | bwd_microstep: 795.73 | bwd_inner_microstep: 795.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2476 [2024-06-11 05:18:03,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.85 | bwd_microstep: 954.41 | bwd_inner_microstep: 954.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4048 [2024-06-11 05:18:06,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 663.78 | bwd_microstep: 1814.00 | bwd_inner_microstep: 1813.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520 [2024-06-11 05:18:08,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.84 | bwd_microstep: 1584.40 | bwd_inner_microstep: 1584.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 05:18:10,263] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 550.82 | bwd_microstep: 1480.43 | bwd_inner_microstep: 1480.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 05:18:12,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.75 | bwd_microstep: 1377.23 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1912 [2024-06-11 05:18:13,246] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.34 | bwd_microstep: 778.60 | bwd_inner_microstep: 778.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-11 05:18:15,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.47 | bwd_microstep: 1611.79 | bwd_inner_microstep: 1611.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-11 05:18:17,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.03 | bwd_microstep: 1487.73 | bwd_inner_microstep: 1487.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3651 [2024-06-11 05:18:19,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.39 | bwd_microstep: 1614.75 | bwd_inner_microstep: 1614.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-11 05:18:21,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.32 | bwd_microstep: 1353.38 | bwd_inner_microstep: 1353.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2172 [2024-06-11 05:18:23,004] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.85 | bwd_microstep: 1012.93 | bwd_inner_microstep: 1012.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-11 05:18:25,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1492.80 | bwd_inner_microstep: 1492.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 05:18:26,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.21 | bwd_microstep: 1287.39 | bwd_inner_microstep: 1287.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-11 05:18:28,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1419.04 | bwd_inner_microstep: 1419.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 941 [2024-06-11 05:18:29,338] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 146.79 | bwd_microstep: 378.84 | bwd_inner_microstep: 378.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3828 [2024-06-11 05:18:31,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.87 | bwd_microstep: 1452.78 | bwd_inner_microstep: 1452.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-11 05:18:33,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1514.84 | bwd_inner_microstep: 1514.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3765 [2024-06-11 05:18:35,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.16 | bwd_microstep: 1644.57 | bwd_inner_microstep: 1644.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.13 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3564 [2024-06-11 05:18:37,445] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.47 | bwd_microstep: 1267.50 | bwd_inner_microstep: 1267.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 05:18:39,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.32 | bwd_microstep: 1288.18 | bwd_inner_microstep: 1288.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3806 [2024-06-11 05:18:41,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.85 | bwd_microstep: 1459.31 | bwd_inner_microstep: 1459.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3437 [2024-06-11 05:18:42,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.42 | bwd_microstep: 1161.31 | bwd_inner_microstep: 1161.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1936 [2024-06-11 05:18:43,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.56 | bwd_microstep: 789.22 | bwd_inner_microstep: 789.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3810 [2024-06-11 05:18:52,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.40 | optimizer_step: 6.60 [2024-06-11 05:18:52,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
542.27 | bwd_microstep: 7877.13 | bwd_inner_microstep: 1876.47 | bwd_allreduce_microstep: 6000.57 | step_microstep: 40.67 [2024-06-11 05:18:52,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14840.38 | bwd: 46043.81 | bwd_inner: 40042.21 | bwd_allreduce: 6000.86 | step: 42.32 {'loss': 1.129, 'learning_rate': 2.8460409370392405e-07, 'epoch': 0.95} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3553 [2024-06-11 05:18:54,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.71 | bwd_microstep: 1355.47 | bwd_inner_microstep: 1355.36 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3459 [2024-06-11 05:18:56,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.58 | bwd_microstep: 1381.71 | bwd_inner_microstep: 1381.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 05:18:58,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1277.43 | bwd_inner_microstep: 1277.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-11 05:18:59,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.63 | bwd_microstep: 1251.01 | bwd_inner_microstep: 1250.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3797 [2024-06-11 05:19:01,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.16 | bwd_microstep: 1452.62 | bwd_inner_microstep: 1452.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-11 05:19:03,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 
| bwd_microstep: 1387.38 | bwd_inner_microstep: 1387.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-11 05:19:05,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.31 | bwd_microstep: 1347.48 | bwd_inner_microstep: 1347.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2245 [2024-06-11 05:19:06,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 361.65 | bwd_microstep: 967.31 | bwd_inner_microstep: 967.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-11 05:19:08,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.99 | bwd_microstep: 1251.62 | bwd_inner_microstep: 1251.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 846 [2024-06-11 05:19:09,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 133.19 | bwd_microstep: 350.32 | bwd_inner_microstep: 350.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1950 [2024-06-11 05:19:10,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.16 | bwd_microstep: 792.58 | bwd_inner_microstep: 792.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509 [2024-06-11 05:19:12,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.87 | bwd_microstep: 1438.89 | bwd_inner_microstep: 1438.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.20 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3506 [2024-06-11 05:19:14,170] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 537.61 | bwd_microstep: 1448.87 | bwd_inner_microstep: 1448.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2016 [2024-06-11 05:19:15,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.92 | bwd_microstep: 902.20 | bwd_inner_microstep: 902.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3461 [2024-06-11 05:19:17,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1408.49 | bwd_inner_microstep: 1408.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3535 [2024-06-11 05:19:19,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.45 | bwd_microstep: 1456.44 | bwd_inner_microstep: 1456.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3666 [2024-06-11 05:19:21,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.24 | bwd_microstep: 1424.68 | bwd_inner_microstep: 1424.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3492 [2024-06-11 05:19:23,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.44 | bwd_microstep: 1434.29 | bwd_inner_microstep: 1434.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3523 [2024-06-11 05:19:25,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1555.96 | bwd_inner_microstep: 1555.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630 [2024-06-11 05:19:27,680] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.29 | bwd_microstep: 1606.02 | bwd_inner_microstep: 1606.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3897 [2024-06-11 05:19:29,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.56 | bwd_microstep: 1588.34 | bwd_inner_microstep: 1588.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-11 05:19:31,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.37 | bwd_microstep: 1384.11 | bwd_inner_microstep: 1384.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3573 [2024-06-11 05:19:33,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.17 | bwd_microstep: 1301.77 | bwd_inner_microstep: 1301.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-11 05:19:35,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.19 | bwd_microstep: 1281.84 | bwd_inner_microstep: 1281.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3810 [2024-06-11 05:19:37,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.41 | bwd_microstep: 1752.77 | bwd_inner_microstep: 1752.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3829 [2024-06-11 05:19:39,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.90 | bwd_microstep: 1405.71 | bwd_inner_microstep: 1405.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3818 [2024-06-11 05:19:41,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.28 | bwd_microstep: 1556.97 | bwd_inner_microstep: 1556.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3513 [2024-06-11 05:19:43,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.25 | bwd_microstep: 1290.93 | bwd_inner_microstep: 1290.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-11 05:19:45,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.00 | bwd_microstep: 1658.65 | bwd_inner_microstep: 1658.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3586 [2024-06-11 05:19:48,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.31 | bwd_microstep: 1533.71 | bwd_inner_microstep: 1533.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-11 05:19:49,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.80 | bwd_microstep: 1399.33 | bwd_inner_microstep: 1399.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 05:19:54,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.12 | optimizer_step: 6.59 [2024-06-11 05:19:54,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 3681.15 | bwd_inner_microstep: 1422.77 | bwd_allreduce_microstep: 2258.32 | step_microstep: 38.58 [2024-06-11 05:19:54,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16057.76 | bwd: 45326.08 | bwd_inner: 43066.77 | bwd_allreduce: 2258.61 | 
step: 40.63 {'loss': 1.1928, 'learning_rate': 2.7832932352322094e-07, 'epoch': 0.95} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3391 [2024-06-11 05:19:55,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.97 | bwd_microstep: 1238.28 | bwd_inner_microstep: 1238.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3878 [2024-06-11 05:19:57,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1485.66 | bwd_inner_microstep: 1485.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3901 [2024-06-11 05:20:00,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.17 | bwd_microstep: 1585.24 | bwd_inner_microstep: 1585.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-11 05:20:02,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.97 | bwd_microstep: 1481.63 | bwd_inner_microstep: 1481.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3786 [2024-06-11 05:20:04,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.67 | bwd_microstep: 1448.08 | bwd_inner_microstep: 1448.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3742 [2024-06-11 05:20:06,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.28 | bwd_microstep: 1633.63 | bwd_inner_microstep: 1633.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 05:20:08,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 523.67 | bwd_microstep: 1386.50 | bwd_inner_microstep: 1386.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 05:20:10,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.35 | bwd_microstep: 1387.35 | bwd_inner_microstep: 1387.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-11 05:20:12,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.78 | bwd_microstep: 1290.90 | bwd_inner_microstep: 1290.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 05:20:13,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.95 | bwd_microstep: 1287.78 | bwd_inner_microstep: 1287.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3680 [2024-06-11 05:20:15,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.84 | bwd_microstep: 1485.66 | bwd_inner_microstep: 1485.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 05:20:17,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.15 | bwd_microstep: 1257.36 | bwd_inner_microstep: 1257.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2114 [2024-06-11 05:20:18,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.75 | bwd_microstep: 922.93 | bwd_inner_microstep: 922.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2209 [2024-06-11 05:20:20,365] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 387.83 | bwd_microstep: 1055.66 | bwd_inner_microstep: 1055.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 05:20:22,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.62 | bwd_microstep: 1363.71 | bwd_inner_microstep: 1363.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3543 [2024-06-11 05:20:23,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.05 | bwd_microstep: 1201.77 | bwd_inner_microstep: 1201.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3836 [2024-06-11 05:20:26,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.43 | bwd_microstep: 1692.42 | bwd_inner_microstep: 1692.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-11 05:20:28,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.47 | bwd_microstep: 1523.39 | bwd_inner_microstep: 1523.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3827 [2024-06-11 05:20:30,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1391.28 | bwd_inner_microstep: 1391.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-11 05:20:32,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.42 | bwd_microstep: 1395.63 | bwd_inner_microstep: 1395.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 
3569 [2024-06-11 05:20:33,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.62 | bwd_microstep: 1236.08 | bwd_inner_microstep: 1236.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2297 [2024-06-11 05:20:35,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.94 | bwd_microstep: 983.74 | bwd_inner_microstep: 983.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3831 [2024-06-11 05:20:37,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.59 | bwd_microstep: 1663.43 | bwd_inner_microstep: 1663.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-11 05:20:39,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.91 | bwd_microstep: 1198.26 | bwd_inner_microstep: 1198.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-11 05:20:41,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.33 | bwd_microstep: 1409.46 | bwd_inner_microstep: 1409.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482 [2024-06-11 05:20:42,845] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.72 | bwd_microstep: 1189.59 | bwd_inner_microstep: 1189.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3775 [2024-06-11 05:20:45,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.08 | bwd_microstep: 1574.38 | bwd_inner_microstep: 1574.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images 
per sample: 8.0, dynamic token length: 3812 [2024-06-11 05:20:47,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.01 | bwd_microstep: 1514.84 | bwd_inner_microstep: 1514.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3444 [2024-06-11 05:20:49,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.59 | bwd_microstep: 1382.39 | bwd_inner_microstep: 1382.24 | bwd_allreduce_microstep: 0.11 | step_microstep: 0.15 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574 [2024-06-11 05:20:51,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.09 | bwd_microstep: 1496.46 | bwd_inner_microstep: 1496.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 05:20:53,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.63 | bwd_microstep: 1551.43 | bwd_inner_microstep: 1551.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935 [2024-06-11 05:20:56,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.12 | optimizer_step: 6.59 [2024-06-11 05:20:56,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.08 | bwd_microstep: 2570.21 | bwd_inner_microstep: 824.55 | bwd_allreduce_microstep: 1745.60 | step_microstep: 39.07 [2024-06-11 05:20:56,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16279.92 | bwd: 45285.18 | bwd_inner: 43538.39 | bwd_allreduce: 1745.97 | step: 41.31 {'loss': 1.2094, 'learning_rate': 2.7212401296411675e-07, 'epoch': 0.95} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1887 [2024-06-11 05:20:57,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
322.03 | bwd_microstep: 866.87 | bwd_inner_microstep: 866.74 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3908 [2024-06-11 05:20:59,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.13 | bwd_microstep: 1485.89 | bwd_inner_microstep: 1485.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-11 05:21:01,292] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.21 | bwd_microstep: 1379.27 | bwd_inner_microstep: 1379.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 05:21:03,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 1382.40 | bwd_inner_microstep: 1382.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-11 05:21:04,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.80 | bwd_microstep: 679.05 | bwd_inner_microstep: 679.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-11 05:21:05,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.36 | bwd_microstep: 1276.28 | bwd_inner_microstep: 1276.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1913 [2024-06-11 05:21:06,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.44 | bwd_microstep: 686.90 | bwd_inner_microstep: 686.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3409 [2024-06-11 05:21:08,526] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 452.47 | bwd_microstep: 1181.83 | bwd_inner_microstep: 1181.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1377 [2024-06-11 05:21:09,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 214.58 | bwd_microstep: 556.40 | bwd_inner_microstep: 556.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3424 [2024-06-11 05:21:11,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1345.46 | bwd_inner_microstep: 1345.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3489 [2024-06-11 05:21:13,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1408.24 | bwd_inner_microstep: 1408.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2110 [2024-06-11 05:21:14,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.70 | bwd_microstep: 1017.77 | bwd_inner_microstep: 1017.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3465 [2024-06-11 05:21:16,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.43 | bwd_microstep: 1340.72 | bwd_inner_microstep: 1340.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3496 [2024-06-11 05:21:18,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.68 | bwd_microstep: 1553.88 | bwd_inner_microstep: 1553.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3665 [2024-06-11 
05:21:20,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.06 | bwd_microstep: 1465.93 | bwd_inner_microstep: 1465.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-11 05:21:22,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.11 | bwd_microstep: 1284.17 | bwd_inner_microstep: 1284.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3587 [2024-06-11 05:21:24,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1311.99 | bwd_inner_microstep: 1311.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3828 [2024-06-11 05:21:26,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.32 | bwd_microstep: 1390.74 | bwd_inner_microstep: 1390.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3472 [2024-06-11 05:21:27,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.11 | bwd_microstep: 1184.09 | bwd_inner_microstep: 1184.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2242 [2024-06-11 05:21:28,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.98 | bwd_microstep: 902.56 | bwd_inner_microstep: 902.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2924 [2024-06-11 05:21:30,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.97 | bwd_microstep: 1191.04 | bwd_inner_microstep: 1191.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3534 [2024-06-11 05:21:32,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.37 | bwd_microstep: 1399.07 | bwd_inner_microstep: 1399.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1384 [2024-06-11 05:21:33,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 204.14 | bwd_microstep: 527.25 | bwd_inner_microstep: 527.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3812 [2024-06-11 05:21:35,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.19 | bwd_microstep: 1575.21 | bwd_inner_microstep: 1575.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3389 [2024-06-11 05:21:37,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.56 | bwd_microstep: 1366.01 | bwd_inner_microstep: 1365.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3817 [2024-06-11 05:21:39,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.94 | bwd_microstep: 1748.93 | bwd_inner_microstep: 1748.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-11 05:21:41,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.62 | bwd_microstep: 1314.85 | bwd_inner_microstep: 1314.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3770 [2024-06-11 05:21:43,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.55 | bwd_microstep: 1375.91 | bwd_inner_microstep: 1375.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 30, images per sample: 7.5, dynamic token length: 3762 [2024-06-11 05:21:45,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.77 | bwd_microstep: 1470.56 | bwd_inner_microstep: 1470.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3561 [2024-06-11 05:21:47,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.36 | bwd_microstep: 1381.85 | bwd_inner_microstep: 1381.41 | bwd_allreduce_microstep: 0.21 | step_microstep: 0.36 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3583 [2024-06-11 05:21:49,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.52 | bwd_microstep: 1305.31 | bwd_inner_microstep: 1305.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-11 05:21:57,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.28 | optimizer_step: 6.62 [2024-06-11 05:21:57,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.08 | bwd_microstep: 7424.96 | bwd_inner_microstep: 1804.23 | bwd_allreduce_microstep: 5620.66 | step_microstep: 42.50 [2024-06-11 05:21:57,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15030.20 | bwd: 45781.43 | bwd_inner: 40159.40 | bwd_allreduce: 5621.18 | step: 44.77 ▍| 1634/1726 [28:39:27<1:37:20, 63.49s/it] 95%|█████████▍| 1635/1726 [28:40:27<1:35:03, 62.68s/it] 95%|█████████▍| 1636/1726 [28:41:29<1:33:21, 62.24s/it] 95%|█████████▍| 1637/1726 [28:42:30<1:32:06, 62.09s/it] 95%|█████████▍| 1638/1726 [28:43:32<1:31:00, 62.05s/it] 95%|█████████▍| 
1639/1726 [28:44:34<1:29:35, 61.79s/it] {'loss': 1.1314, 'learning_rate': 2.6598818388168246e-07, 'epoch': 0.95} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-11 05:21:59,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.82 | bwd_microstep: 1481.57 | bwd_inner_microstep: 1481.44 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2005 [2024-06-11 05:22:00,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.90 | bwd_microstep: 798.22 | bwd_inner_microstep: 798.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-11 05:22:02,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.18 | bwd_microstep: 1343.61 | bwd_inner_microstep: 1343.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3792 [2024-06-11 05:22:04,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1547.97 | bwd_inner_microstep: 1547.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850 [2024-06-11 05:22:06,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1459.19 | bwd_inner_microstep: 1459.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 05:22:08,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.54 | bwd_microstep: 1278.54 | bwd_inner_microstep: 1278.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 05:22:09,990] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 472.06 | bwd_microstep: 1247.21 | bwd_inner_microstep: 1247.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 05:22:11,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.19 | bwd_microstep: 1245.53 | bwd_inner_microstep: 1245.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3767 [2024-06-11 05:22:13,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.92 | bwd_microstep: 1437.93 | bwd_inner_microstep: 1437.62 | bwd_allreduce_microstep: 0.21 | step_microstep: 0.33 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3430 [2024-06-11 05:22:15,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.92 | bwd_microstep: 1187.40 | bwd_inner_microstep: 1187.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 05:22:17,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1287.53 | bwd_inner_microstep: 1287.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 05:22:19,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1385.76 | bwd_inner_microstep: 1385.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3484 [2024-06-11 05:22:21,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.30 | bwd_microstep: 1508.84 | bwd_inner_microstep: 1508.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3449 [2024-06-11 05:22:22,957] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.49 | bwd_microstep: 1317.57 | bwd_inner_microstep: 1317.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3656 [2024-06-11 05:22:24,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.94 | bwd_microstep: 1423.21 | bwd_inner_microstep: 1423.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3490 [2024-06-11 05:22:26,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.15 | bwd_microstep: 1285.97 | bwd_inner_microstep: 1285.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436 [2024-06-11 05:22:28,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 445.32 | bwd_microstep: 1167.22 | bwd_inner_microstep: 1167.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3679 [2024-06-11 05:22:30,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.24 | bwd_microstep: 1356.48 | bwd_inner_microstep: 1356.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 05:22:32,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.15 | bwd_microstep: 1377.35 | bwd_inner_microstep: 1377.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3436 [2024-06-11 05:22:33,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.06 | bwd_microstep: 1156.16 | bwd_inner_microstep: 1156.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token 
length: 3452 [2024-06-11 05:22:35,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.86 | bwd_microstep: 1160.59 | bwd_inner_microstep: 1160.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-11 05:22:37,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.35 | bwd_microstep: 1661.95 | bwd_inner_microstep: 1661.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3820 [2024-06-11 05:22:39,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.14 | bwd_microstep: 1658.99 | bwd_inner_microstep: 1658.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-11 05:22:41,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.31 | bwd_microstep: 978.77 | bwd_inner_microstep: 978.50 | bwd_allreduce_microstep: 0.15 | step_microstep: 0.18 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3670 [2024-06-11 05:22:43,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.52 | bwd_microstep: 1660.41 | bwd_inner_microstep: 1660.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3593 [2024-06-11 05:22:45,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.21 | bwd_microstep: 1345.14 | bwd_inner_microstep: 1345.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2292 [2024-06-11 05:22:46,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.32 | bwd_microstep: 882.20 | bwd_inner_microstep: 882.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3491 [2024-06-11 05:22:48,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1477.67 | bwd_inner_microstep: 1477.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 05:22:50,566] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.00 | bwd_microstep: 1384.78 | bwd_inner_microstep: 1384.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3807 [2024-06-11 05:22:52,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.02 | bwd_microstep: 1449.36 | bwd_inner_microstep: 1449.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434 [2024-06-11 05:22:54,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.75 | bwd_microstep: 1313.18 | bwd_inner_microstep: 1313.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3729 [2024-06-11 05:22:58,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.36 | optimizer_step: 6.63 [2024-06-11 05:22:58,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.05 | bwd_microstep: 3224.37 | bwd_inner_microstep: 1747.51 | bwd_allreduce_microstep: 1476.79 | step_microstep: 39.59 [2024-06-11 05:22:58,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16059.50 | bwd: 44490.75 | bwd_inner: 43012.35 | bwd_allreduce: 1477.44 | step: 42.03 {'loss': 1.1699, 'learning_rate': 2.5992185788627834e-07, 'epoch': 0.95} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3423 [2024-06-11 05:23:00,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 534.72 | bwd_microstep: 1442.40 | bwd_inner_microstep: 1442.29 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.15 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945 [2024-06-11 05:23:01,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.24 | bwd_microstep: 697.95 | bwd_inner_microstep: 697.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3860 [2024-06-11 05:23:03,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.12 | bwd_microstep: 1458.63 | bwd_inner_microstep: 1458.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3775 [2024-06-11 05:23:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1438.30 | bwd_inner_microstep: 1438.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3782 [2024-06-11 05:23:07,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.63 | bwd_microstep: 1350.85 | bwd_inner_microstep: 1350.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 05:23:08,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.71 | bwd_microstep: 1299.68 | bwd_inner_microstep: 1299.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3518 [2024-06-11 05:23:10,581] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.11 | bwd_microstep: 1224.51 | bwd_inner_microstep: 1224.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3417 [2024-06-11 05:23:12,439] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.07 | bwd_microstep: 1345.42 | bwd_inner_microstep: 1345.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3862 [2024-06-11 05:23:14,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.41 | bwd_microstep: 1463.53 | bwd_inner_microstep: 1463.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3512 [2024-06-11 05:23:16,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.22 | bwd_microstep: 1192.13 | bwd_inner_microstep: 1192.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2901 [2024-06-11 05:23:17,633] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.47 | bwd_microstep: 1092.02 | bwd_inner_microstep: 1091.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3487 [2024-06-11 05:23:19,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.30 | bwd_microstep: 1322.33 | bwd_inner_microstep: 1322.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3983 [2024-06-11 05:23:22,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 695.22 | bwd_microstep: 1904.13 | bwd_inner_microstep: 1904.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1911 [2024-06-11 05:23:23,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.28 | bwd_microstep: 686.79 | bwd_inner_microstep: 686.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 
3644 [2024-06-11 05:23:25,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.81 | bwd_microstep: 1815.78 | bwd_inner_microstep: 1815.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 05:23:27,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.86 | bwd_microstep: 1245.86 | bwd_inner_microstep: 1245.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3423 [2024-06-11 05:23:28,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.26 | bwd_microstep: 1151.97 | bwd_inner_microstep: 1151.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 05:23:30,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.74 | bwd_microstep: 1284.50 | bwd_inner_microstep: 1284.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3819 [2024-06-11 05:23:32,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 1546.28 | bwd_inner_microstep: 1546.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3717 [2024-06-11 05:23:34,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.12 | bwd_microstep: 1366.71 | bwd_inner_microstep: 1366.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3539 [2024-06-11 05:23:36,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.88 | bwd_microstep: 1467.16 | bwd_inner_microstep: 1466.95 | bwd_allreduce_microstep: 0.13 | step_microstep: 0.25 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3610 [2024-06-11 05:23:38,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.76 | bwd_microstep: 1407.83 | bwd_inner_microstep: 1407.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 05:23:40,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.49 | bwd_microstep: 1258.04 | bwd_inner_microstep: 1258.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3605 [2024-06-11 05:23:42,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.96 | bwd_microstep: 1413.87 | bwd_inner_microstep: 1413.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1999 [2024-06-11 05:23:43,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.43 | bwd_microstep: 741.36 | bwd_inner_microstep: 741.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2286 [2024-06-11 05:23:44,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.20 | bwd_microstep: 975.10 | bwd_inner_microstep: 975.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 05:23:46,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.53 | bwd_microstep: 1277.71 | bwd_inner_microstep: 1277.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 05:23:48,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.76 | bwd_microstep: 1482.03 | bwd_inner_microstep: 1482.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1986 [2024-06-11 05:23:49,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.01 | bwd_microstep: 706.67 | bwd_inner_microstep: 706.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-11 05:23:51,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.94 | bwd_microstep: 1498.55 | bwd_inner_microstep: 1498.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3589 [2024-06-11 05:23:53,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.13 | bwd_microstep: 1506.37 | bwd_inner_microstep: 1506.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-11 05:23:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.57 [2024-06-11 05:23:58,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.83 | bwd_microstep: 3877.76 | bwd_inner_microstep: 2098.77 | bwd_allreduce_microstep: 1778.91 | step_microstep: 40.17 [2024-06-11 05:23:58,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15662.07 | bwd: 43942.24 | bwd_inner: 42162.11 | bwd_allreduce: 1779.37 | step: 42.05 {'loss': 1.161, 'learning_rate': 2.539250563434736e-07, 'epoch': 0.95} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-11 05:24:00,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.39 | bwd_microstep: 1335.55 | bwd_inner_microstep: 1335.45 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-11 05:24:01,777] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 471.53 | bwd_microstep: 1250.46 | bwd_inner_microstep: 1250.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3888 [2024-06-11 05:24:03,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.52 | bwd_microstep: 1584.53 | bwd_inner_microstep: 1584.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3744 [2024-06-11 05:24:06,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.07 | bwd_microstep: 1631.90 | bwd_inner_microstep: 1631.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1939 [2024-06-11 05:24:07,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.84 | bwd_microstep: 822.97 | bwd_inner_microstep: 822.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3492 [2024-06-11 05:24:09,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.10 | bwd_microstep: 1221.52 | bwd_inner_microstep: 1221.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-11 05:24:10,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.57 | bwd_microstep: 1151.23 | bwd_inner_microstep: 1151.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1894 [2024-06-11 05:24:11,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.54 | bwd_microstep: 684.09 | bwd_inner_microstep: 684.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3421 [2024-06-11 05:24:13,205] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.60 | bwd_microstep: 1153.85 | bwd_inner_microstep: 1153.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3696 [2024-06-11 05:24:15,313] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1529.27 | bwd_inner_microstep: 1529.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3499 [2024-06-11 05:24:17,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.72 | bwd_microstep: 1623.46 | bwd_inner_microstep: 1623.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659 [2024-06-11 05:24:19,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.86 | bwd_microstep: 1715.38 | bwd_inner_microstep: 1715.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2068 [2024-06-11 05:24:21,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.50 | bwd_microstep: 914.81 | bwd_inner_microstep: 914.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3569 [2024-06-11 05:24:22,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.03 | bwd_microstep: 1209.01 | bwd_inner_microstep: 1208.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3490 [2024-06-11 05:24:24,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.60 | bwd_microstep: 1221.44 | bwd_inner_microstep: 1221.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3633 [2024-06-11 05:24:26,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.61 | bwd_microstep: 1408.94 | bwd_inner_microstep: 1408.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-11 05:24:28,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.42 | bwd_microstep: 1660.06 | bwd_inner_microstep: 1660.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3511 [2024-06-11 05:24:30,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1350.26 | bwd_inner_microstep: 1350.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-11 05:24:32,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.54 | bwd_microstep: 1656.08 | bwd_inner_microstep: 1656.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.31 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3822 [2024-06-11 05:24:34,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.17 | bwd_microstep: 1263.09 | bwd_inner_microstep: 1263.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 05:24:36,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.12 | bwd_microstep: 1352.61 | bwd_inner_microstep: 1352.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3778 [2024-06-11 05:24:38,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1456.28 | bwd_inner_microstep: 1456.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
37, images per sample: 9.25, dynamic token length: 3601 [2024-06-11 05:24:40,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.10 | bwd_microstep: 1554.93 | bwd_inner_microstep: 1554.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 05:24:42,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.89 | bwd_microstep: 1383.74 | bwd_inner_microstep: 1383.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1409 [2024-06-11 05:24:43,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 207.98 | bwd_microstep: 532.00 | bwd_inner_microstep: 531.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 05:24:45,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.84 | bwd_microstep: 1404.32 | bwd_inner_microstep: 1404.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-11 05:24:47,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.28 | bwd_microstep: 1406.72 | bwd_inner_microstep: 1406.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3430 [2024-06-11 05:24:49,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.88 | bwd_microstep: 1473.47 | bwd_inner_microstep: 1473.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650 [2024-06-11 05:24:51,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.66 | bwd_microstep: 1585.15 | bwd_inner_microstep: 1585.12 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3592 [2024-06-11 05:24:53,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.77 | bwd_microstep: 1705.11 | bwd_inner_microstep: 1705.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-11 05:24:55,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.07 | bwd_microstep: 1340.38 | bwd_inner_microstep: 1340.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3389 [2024-06-11 05:25:02,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.26 | optimizer_step: 6.64 [2024-06-11 05:25:02,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.78 | bwd_microstep: 6105.02 | bwd_inner_microstep: 1444.64 | bwd_allreduce_microstep: 4660.30 | step_microstep: 40.52 [2024-06-11 05:25:02,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16026.92 | bwd: 47687.72 | bwd_inner: 43026.36 | bwd_allreduce: 4660.60 | step: 42.70 {'loss': 1.1667, 'learning_rate': 2.479978003739669e-07, 'epoch': 0.95} dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3429 [2024-06-11 05:25:04,220] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.36 | bwd_microstep: 1384.83 | bwd_inner_microstep: 1384.63 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.15 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2459 [2024-06-11 05:25:05,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.59 | bwd_microstep: 919.74 | bwd_inner_microstep: 919.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 05:25:07,270] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.45 | bwd_microstep: 1271.99 | bwd_inner_microstep: 1271.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1928 [2024-06-11 05:25:08,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.27 | bwd_microstep: 786.28 | bwd_inner_microstep: 786.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755 [2024-06-11 05:25:10,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.41 | bwd_microstep: 1635.33 | bwd_inner_microstep: 1635.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-11 05:25:12,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.00 | bwd_microstep: 1286.19 | bwd_inner_microstep: 1286.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 05:25:14,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1387.82 | bwd_inner_microstep: 1387.41 | bwd_allreduce_microstep: 0.20 | step_microstep: 0.39 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3748 [2024-06-11 05:25:16,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.94 | bwd_microstep: 1540.75 | bwd_inner_microstep: 1540.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 05:25:18,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.18 | bwd_microstep: 1391.47 | bwd_inner_microstep: 1391.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
3500 [2024-06-11 05:25:20,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.26 | bwd_microstep: 1317.76 | bwd_inner_microstep: 1317.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1945 [2024-06-11 05:25:21,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 329.19 | bwd_microstep: 885.92 | bwd_inner_microstep: 885.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3617 [2024-06-11 05:25:23,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.43 | bwd_microstep: 1316.09 | bwd_inner_microstep: 1316.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.17 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2065 [2024-06-11 05:25:24,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.31 | bwd_microstep: 824.28 | bwd_inner_microstep: 824.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1982 [2024-06-11 05:25:25,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.13 | bwd_microstep: 735.15 | bwd_inner_microstep: 735.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1963 [2024-06-11 05:25:26,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.13 | bwd_microstep: 826.08 | bwd_inner_microstep: 826.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 05:25:28,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.74 | bwd_microstep: 1376.71 | bwd_inner_microstep: 1376.44 | bwd_allreduce_microstep: 0.14 | step_microstep: 0.22 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2070 [2024-06-11 05:25:29,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.09 | bwd_microstep: 918.54 | bwd_inner_microstep: 918.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3523 [2024-06-11 05:25:31,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.58 | bwd_microstep: 1294.45 | bwd_inner_microstep: 1294.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3501 [2024-06-11 05:25:33,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.44 | bwd_microstep: 1322.18 | bwd_inner_microstep: 1322.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3545 [2024-06-11 05:25:35,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.78 | bwd_microstep: 1200.90 | bwd_inner_microstep: 1200.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971 [2024-06-11 05:25:36,025] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.95 | bwd_microstep: 704.64 | bwd_inner_microstep: 704.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-11 05:25:38,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.23 | bwd_microstep: 1654.85 | bwd_inner_microstep: 1654.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 05:25:40,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.33 | bwd_microstep: 1549.85 | bwd_inner_microstep: 1549.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-11 05:25:42,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1511.70 | bwd_inner_microstep: 1511.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818 [2024-06-11 05:25:44,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.63 | bwd_microstep: 1390.02 | bwd_inner_microstep: 1389.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3523 [2024-06-11 05:25:46,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.20 | bwd_microstep: 1489.06 | bwd_inner_microstep: 1489.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3730 [2024-06-11 05:25:48,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.03 | bwd_microstep: 1636.44 | bwd_inner_microstep: 1636.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3810 [2024-06-11 05:25:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.06 | bwd_microstep: 1500.47 | bwd_inner_microstep: 1500.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2192 [2024-06-11 05:25:52,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.14 | bwd_microstep: 957.02 | bwd_inner_microstep: 956.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 05:25:54,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.90 | bwd_microstep: 1393.85 | bwd_inner_microstep: 1393.82 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3030 [2024-06-11 05:25:55,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.18 | bwd_microstep: 1329.03 | bwd_inner_microstep: 1329.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3860 [2024-06-11 05:26:05,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.00 | optimizer_gradients: 4.27 | optimizer_step: 6.64 [2024-06-11 05:26:05,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.57 | bwd_microstep: 8555.43 | bwd_inner_microstep: 1705.44 | bwd_allreduce_microstep: 6849.91 | step_microstep: 40.38 [2024-06-11 05:26:05,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15114.87 | bwd: 47294.88 | bwd_inner: 40443.34 | bwd_allreduce: 6850.56 | step: 42.92 {'loss': 1.234, 'learning_rate': 2.4214011085352815e-07, 'epoch': 0.95} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3466 [2024-06-11 05:26:07,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.81 | bwd_microstep: 1464.56 | bwd_inner_microstep: 1464.41 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3907 [2024-06-11 05:26:09,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.06 | bwd_microstep: 1482.51 | bwd_inner_microstep: 1482.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3427 [2024-06-11 05:26:11,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.89 | bwd_microstep: 1443.63 | bwd_inner_microstep: 1443.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1233 
[2024-06-11 05:26:11,831] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 185.29 | bwd_microstep: 483.06 | bwd_inner_microstep: 483.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3778 [2024-06-11 05:26:13,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.97 | bwd_microstep: 1544.86 | bwd_inner_microstep: 1544.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764 [2024-06-11 05:26:16,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.52 | bwd_microstep: 1636.10 | bwd_inner_microstep: 1635.96 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.21 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3419 [2024-06-11 05:26:17,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.78 | bwd_microstep: 1180.79 | bwd_inner_microstep: 1180.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3440 [2024-06-11 05:26:19,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.14 | bwd_microstep: 1214.54 | bwd_inner_microstep: 1214.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3446 [2024-06-11 05:26:21,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.65 | bwd_microstep: 1219.31 | bwd_inner_microstep: 1219.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2181 [2024-06-11 05:26:22,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.07 | bwd_microstep: 856.72 | bwd_inner_microstep: 856.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 26, images per 
sample: 6.5, dynamic token length: 3448 [2024-06-11 05:26:24,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.01 | bwd_microstep: 1315.28 | bwd_inner_microstep: 1315.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3492 [2024-06-11 05:26:26,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.81 | bwd_microstep: 1482.02 | bwd_inner_microstep: 1481.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-11 05:26:28,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.87 | bwd_microstep: 1344.76 | bwd_inner_microstep: 1344.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3663 [2024-06-11 05:26:30,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.99 | bwd_microstep: 1443.29 | bwd_inner_microstep: 1443.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3656 [2024-06-11 05:26:32,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.11 | bwd_microstep: 1515.07 | bwd_inner_microstep: 1515.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686 [2024-06-11 05:26:34,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.16 | bwd_microstep: 1512.44 | bwd_inner_microstep: 1512.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3509 [2024-06-11 05:26:36,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.68 | bwd_microstep: 1443.05 | bwd_inner_microstep: 1443.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3472 [2024-06-11 05:26:38,368] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.04 | bwd_microstep: 1505.80 | bwd_inner_microstep: 1505.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672 [2024-06-11 05:26:40,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.68 | bwd_microstep: 1617.05 | bwd_inner_microstep: 1616.92 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.28 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3465 [2024-06-11 05:26:42,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.76 | bwd_microstep: 1436.79 | bwd_inner_microstep: 1436.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3414 [2024-06-11 05:26:44,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.00 | bwd_microstep: 1280.65 | bwd_inner_microstep: 1280.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3431 [2024-06-11 05:26:46,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.05 | bwd_microstep: 1347.21 | bwd_inner_microstep: 1347.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3843 [2024-06-11 05:26:48,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.37 | bwd_microstep: 1698.12 | bwd_inner_microstep: 1698.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3587 [2024-06-11 05:27:24,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.99 | bwd_microstep: 1426.91 | bwd_inner_microstep: 1426.88 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3652 [2024-06-11 05:27:26,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.84 | bwd_microstep: 1513.42 | bwd_inner_microstep: 1513.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3811 [2024-06-11 05:27:28,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.29 | bwd_microstep: 1589.96 | bwd_inner_microstep: 1589.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-11 05:27:31,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.38 | bwd_microstep: 1551.47 | bwd_inner_microstep: 1551.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-11 05:27:33,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.22 | bwd_microstep: 1549.35 | bwd_inner_microstep: 1549.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456 [2024-06-11 05:27:35,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.81 | bwd_microstep: 1282.59 | bwd_inner_microstep: 1282.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-11 05:27:37,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.24 | bwd_microstep: 1548.29 | bwd_inner_microstep: 1548.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3759 [2024-06-11 05:27:39,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.31 | bwd_microstep: 
1447.18 | bwd_inner_microstep: 1447.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3573 [2024-06-11 05:27:48,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-11 05:27:48,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.28 | bwd_microstep: 8621.18 | bwd_inner_microstep: 1575.27 | bwd_allreduce_microstep: 7045.85 | step_microstep: 38.93 [2024-06-11 05:27:48,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16739.67 | bwd: 51998.00 | bwd_inner: 44950.88 | bwd_allreduce: 7046.32 | step: 41.27 95%|█████████▍| 1639/1726 [28:44:34<1:29:35, 61.79s/it] 95%|█████████▌| 1640/1726 [28:45:34<1:28:11, 61.53s/it] 95%|█████████▌| 1641/1726 [28:46:34<1:26:30, 61.06s/it] 95%|█████████▌| 1642/1726 [28:47:39<1:26:45, 61.97s/it] 95%|█████████▌| 1643/1726 [28:48:41<1:26:04, 62.22s/it] 95%|█████████▌| 1644/1726 [28:50:25<1:41:51, 74.53s/it] {'loss': 1.1751, 'learning_rate': 2.3635200841290784e-07, 'epoch': 0.95} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2060 [2024-06-11 05:28:27,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.28 | bwd_microstep: 866.00 | bwd_inner_microstep: 865.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.20 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1344 [2024-06-11 05:28:28,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 197.24 | bwd_microstep: 515.18 | bwd_inner_microstep: 515.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length:
3872 [2024-06-11 05:28:30,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.01 | bwd_microstep: 1482.94 | bwd_inner_microstep: 1482.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-11 05:28:32,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1276.48 | bwd_inner_microstep: 1276.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755 [2024-06-11 05:28:34,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.94 | bwd_microstep: 1630.06 | bwd_inner_microstep: 1630.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 05:28:36,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.72 | bwd_microstep: 1243.60 | bwd_inner_microstep: 1243.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-11 05:28:37,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.58 | bwd_microstep: 1346.28 | bwd_inner_microstep: 1346.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1880 [2024-06-11 05:28:38,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.80 | bwd_microstep: 708.92 | bwd_inner_microstep: 708.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3766 [2024-06-11 05:28:40,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.37 | bwd_microstep: 1305.89 | bwd_inner_microstep: 1305.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3402 [2024-06-11 05:28:42,523] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.59 | bwd_microstep: 1245.52 | bwd_inner_microstep: 1245.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2003 [2024-06-11 05:28:43,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.51 | bwd_microstep: 800.33 | bwd_inner_microstep: 800.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3534 [2024-06-11 05:28:45,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.08 | bwd_microstep: 1294.94 | bwd_inner_microstep: 1294.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1984 [2024-06-11 05:28:46,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.79 | bwd_microstep: 734.88 | bwd_inner_microstep: 734.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3452 [2024-06-11 05:28:48,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1316.77 | bwd_inner_microstep: 1316.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3655 [2024-06-11 05:28:50,491] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.34 | bwd_microstep: 1611.68 | bwd_inner_microstep: 1611.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3679 [2024-06-11 05:28:52,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.94 | bwd_microstep: 1618.17 | bwd_inner_microstep: 1618.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3643 [2024-06-11 05:28:54,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1406.96 | bwd_inner_microstep: 1406.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-11 05:28:56,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.87 | bwd_microstep: 1341.71 | bwd_inner_microstep: 1341.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3831 [2024-06-11 05:28:58,400] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.49 | bwd_microstep: 1360.85 | bwd_inner_microstep: 1360.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3495 [2024-06-11 05:29:00,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.95 | bwd_microstep: 1319.55 | bwd_inner_microstep: 1319.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3605 [2024-06-11 05:29:02,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 1569.71 | bwd_inner_microstep: 1569.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3687 [2024-06-11 05:29:04,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.94 | bwd_microstep: 1491.10 | bwd_inner_microstep: 1490.65 | bwd_allreduce_microstep: 0.24 | step_microstep: 0.29 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3378 [2024-06-11 05:29:06,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.57 | bwd_microstep: 1241.04 | bwd_inner_microstep: 1241.02 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2024 [2024-06-11 05:29:07,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.60 | bwd_microstep: 841.04 | bwd_inner_microstep: 841.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3445 [2024-06-11 05:29:08,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.75 | bwd_microstep: 1158.33 | bwd_inner_microstep: 1158.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-11 05:29:10,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.11 | bwd_microstep: 1349.19 | bwd_inner_microstep: 1349.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2231 [2024-06-11 05:29:12,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.52 | bwd_microstep: 867.94 | bwd_inner_microstep: 867.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006 [2024-06-11 05:29:13,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.61 | bwd_microstep: 833.83 | bwd_inner_microstep: 833.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-11 05:29:15,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.56 | bwd_microstep: 1397.74 | bwd_inner_microstep: 1397.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-11 05:29:17,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.00 | bwd_microstep: 1651.13 
| bwd_inner_microstep: 1651.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3731 [2024-06-11 05:29:19,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.26 | bwd_microstep: 1535.12 | bwd_inner_microstep: 1535.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-11 05:29:25,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.28 | optimizer_step: 6.60 [2024-06-11 05:29:25,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.01 | bwd_microstep: 5413.65 | bwd_inner_microstep: 1575.70 | bwd_allreduce_microstep: 3837.87 | step_microstep: 39.94 [2024-06-11 05:29:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14936.01 | bwd: 43776.59 | bwd_inner: 39937.28 | bwd_allreduce: 3838.39 | step: 42.09 {'loss': 1.183, 'learning_rate': 2.3063351343777241e-07, 'epoch': 0.95} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3409 [2024-06-11 05:29:27,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.79 | bwd_microstep: 1336.42 | bwd_inner_microstep: 1336.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3925 [2024-06-11 05:29:29,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.19 | bwd_microstep: 1589.56 | bwd_inner_microstep: 1589.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3410 [2024-06-11 05:29:31,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.04 | bwd_microstep: 1326.03 | bwd_inner_microstep: 1326.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3415 [2024-06-11 05:29:33,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.23 | bwd_microstep: 1247.06 | bwd_inner_microstep: 1247.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2613 [2024-06-11 05:29:34,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 398.20 | bwd_microstep: 1063.35 | bwd_inner_microstep: 1063.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 05:29:36,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1297.10 | bwd_inner_microstep: 1297.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3448 [2024-06-11 05:29:38,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.30 | bwd_microstep: 1252.49 | bwd_inner_microstep: 1252.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3403 [2024-06-11 05:29:39,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.07 | bwd_microstep: 1147.32 | bwd_inner_microstep: 1147.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600 [2024-06-11 05:29:41,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.10 | bwd_microstep: 1307.40 | bwd_inner_microstep: 1307.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-11 05:29:43,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.04 | bwd_microstep: 1376.95 | bwd_inner_microstep: 1376.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 38, images per sample: 9.5, dynamic token length: 3406 [2024-06-11 05:29:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.78 | bwd_microstep: 1507.31 | bwd_inner_microstep: 1507.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2356 [2024-06-11 05:29:46,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.71 | bwd_microstep: 950.85 | bwd_inner_microstep: 950.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3375 [2024-06-11 05:29:48,519] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.14 | bwd_microstep: 1239.71 | bwd_inner_microstep: 1239.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3450 [2024-06-11 05:29:50,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.46 | bwd_microstep: 1444.12 | bwd_inner_microstep: 1444.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-11 05:29:52,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.28 | bwd_microstep: 1375.90 | bwd_inner_microstep: 1375.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3426 [2024-06-11 05:29:54,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.18 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 05:29:56,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.52 | bwd_microstep: 1387.16 | bwd_inner_microstep: 1387.13 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3626 [2024-06-11 05:29:58,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.55 | bwd_microstep: 1521.25 | bwd_inner_microstep: 1521.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2073 [2024-06-11 05:29:59,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.21 | bwd_microstep: 917.09 | bwd_inner_microstep: 917.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-11 05:30:01,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.18 | bwd_microstep: 1514.81 | bwd_inner_microstep: 1514.64 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.19 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 05:30:03,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.32 | bwd_microstep: 1389.95 | bwd_inner_microstep: 1389.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3452 [2024-06-11 05:30:05,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.64 | bwd_microstep: 1192.31 | bwd_inner_microstep: 1192.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3513 [2024-06-11 05:30:07,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.46 | bwd_microstep: 1591.22 | bwd_inner_microstep: 1591.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2276 [2024-06-11 05:30:08,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.68 | bwd_microstep: 788.77 | bwd_inner_microstep: 
788.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-11 05:30:10,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.61 | bwd_microstep: 1454.18 | bwd_inner_microstep: 1454.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 05:30:12,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.58 | bwd_microstep: 1349.47 | bwd_inner_microstep: 1349.02 | bwd_allreduce_microstep: 0.22 | step_microstep: 0.35 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3603 [2024-06-11 05:30:14,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.01 | bwd_microstep: 1537.81 | bwd_inner_microstep: 1537.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3829 [2024-06-11 05:30:16,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.58 | bwd_microstep: 1588.19 | bwd_inner_microstep: 1588.09 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.18 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3576 [2024-06-11 05:30:18,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.43 | bwd_microstep: 1464.16 | bwd_inner_microstep: 1464.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774 [2024-06-11 05:30:20,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.03 | bwd_microstep: 1546.35 | bwd_inner_microstep: 1546.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3461 [2024-06-11 05:30:22,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.47 | 
bwd_microstep: 1439.47 | bwd_inner_microstep: 1439.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759 [2024-06-11 05:30:27,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.63 [2024-06-11 05:30:27,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.21 | bwd_microstep: 4293.02 | bwd_inner_microstep: 1741.85 | bwd_allreduce_microstep: 2551.09 | step_microstep: 40.86 [2024-06-11 05:30:27,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16094.02 | bwd: 45722.91 | bwd_inner: 43170.26 | bwd_allreduce: 2551.77 | step: 43.19 {'loss': 1.2201, 'learning_rate': 2.2498464606863334e-07, 'epoch': 0.95} dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2640 [2024-06-11 05:30:29,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 412.93 | bwd_microstep: 1103.30 | bwd_inner_microstep: 1103.15 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3462 [2024-06-11 05:30:30,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.12 | bwd_microstep: 1274.60 | bwd_inner_microstep: 1274.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3808 [2024-06-11 05:30:33,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.38 | bwd_microstep: 1598.00 | bwd_inner_microstep: 1597.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3560 [2024-06-11 05:30:35,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.41 | bwd_microstep: 1498.79 | bwd_inner_microstep: 1498.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, 
images per sample: 5.5, dynamic token length: 3434 [2024-06-11 05:30:36,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.75 | bwd_microstep: 1251.55 | bwd_inner_microstep: 1251.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 05:30:38,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.46 | bwd_microstep: 1250.72 | bwd_inner_microstep: 1250.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 [2024-06-11 05:30:40,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.63 | bwd_microstep: 1429.63 | bwd_inner_microstep: 1429.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 05:30:42,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.80 | bwd_microstep: 1382.80 | bwd_inner_microstep: 1382.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3560 [2024-06-11 05:30:44,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.39 | bwd_microstep: 1350.03 | bwd_inner_microstep: 1350.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-11 05:30:46,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.00 | bwd_microstep: 1474.84 | bwd_inner_microstep: 1474.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3673 [2024-06-11 05:30:48,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.04 | bwd_microstep: 1610.17 | bwd_inner_microstep: 1610.14 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3569 [2024-06-11 05:30:50,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.61 | bwd_microstep: 1496.76 | bwd_inner_microstep: 1496.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509 [2024-06-11 05:30:52,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.12 | bwd_microstep: 1419.93 | bwd_inner_microstep: 1419.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3494 [2024-06-11 05:30:54,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.88 | bwd_microstep: 1579.57 | bwd_inner_microstep: 1579.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982 [2024-06-11 05:30:56,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.85 | bwd_microstep: 891.34 | bwd_inner_microstep: 891.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3678 [2024-06-11 05:30:58,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.66 | bwd_microstep: 1422.72 | bwd_inner_microstep: 1422.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-11 05:31:00,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.33 | bwd_microstep: 1411.05 | bwd_inner_microstep: 1411.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3831 [2024-06-11 05:31:02,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.30 | bwd_microstep: 1757.65 | bwd_inner_microstep: 
1757.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3567 [2024-06-11 05:31:05,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.89 | bwd_microstep: 2422.99 | bwd_inner_microstep: 2422.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3514 [2024-06-11 05:31:07,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.58 | bwd_microstep: 1289.44 | bwd_inner_microstep: 1289.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3822 [2024-06-11 05:31:09,180] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.58 | bwd_microstep: 1415.12 | bwd_inner_microstep: 1415.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536 [2024-06-11 05:31:10,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.19 | bwd_microstep: 1293.04 | bwd_inner_microstep: 1293.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-11 05:31:12,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.04 | bwd_microstep: 1180.73 | bwd_inner_microstep: 1180.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3543 [2024-06-11 05:31:14,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.64 | bwd_microstep: 1230.53 | bwd_inner_microstep: 1230.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3465 [2024-06-11 05:31:16,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.80 | 
bwd_microstep: 1374.67 | bwd_inner_microstep: 1374.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3815 [2024-06-11 05:31:18,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1551.25 | bwd_inner_microstep: 1551.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2045 [2024-06-11 05:31:19,377] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.80 | bwd_microstep: 716.69 | bwd_inner_microstep: 716.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-11 05:31:21,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.94 | bwd_microstep: 1345.53 | bwd_inner_microstep: 1345.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3775 [2024-06-11 05:31:23,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.73 | bwd_microstep: 1745.39 | bwd_inner_microstep: 1745.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3540 [2024-06-11 05:31:25,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.39 | bwd_microstep: 1435.82 | bwd_inner_microstep: 1435.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2406 [2024-06-11 05:31:27,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.10 | bwd_microstep: 1064.45 | bwd_inner_microstep: 1064.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3583 [2024-06-11 05:31:29,173] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.18 | optimizer_step: 6.63 [2024-06-11 05:31:29,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1498.67 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 8.23 | step_microstep: 38.80 [2024-06-11 05:31:29,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16360.83 | bwd: 44767.81 | bwd_inner: 44758.57 | bwd_allreduce: 8.51 | step: 40.92 {'loss': 1.1501, 'learning_rate': 2.1940542620076723e-07, 'epoch': 0.95} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 05:31:31,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.10 | bwd_microstep: 1343.51 | bwd_inner_microstep: 1343.44 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3454 [2024-06-11 05:31:32,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.49 | bwd_microstep: 1414.91 | bwd_inner_microstep: 1414.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1343 [2024-06-11 05:31:33,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 208.84 | bwd_microstep: 544.80 | bwd_inner_microstep: 544.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-11 05:31:35,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.64 | bwd_microstep: 1271.05 | bwd_inner_microstep: 1271.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2039 [2024-06-11 05:31:36,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.60 | bwd_microstep: 808.88 | bwd_inner_microstep: 808.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-11 05:31:38,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.98 | bwd_microstep: 1477.82 | bwd_inner_microstep: 1477.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 05:31:40,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.36 | bwd_microstep: 1282.23 | bwd_inner_microstep: 1282.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-11 05:31:42,356] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.20 | bwd_microstep: 1371.07 | bwd_inner_microstep: 1371.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 05:31:44,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.20 | bwd_microstep: 1248.75 | bwd_inner_microstep: 1248.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2046 [2024-06-11 05:31:45,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.70 | bwd_microstep: 780.41 | bwd_inner_microstep: 780.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 05:31:47,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.37 | bwd_microstep: 1397.24 | bwd_inner_microstep: 1397.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3440 [2024-06-11 05:31:49,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.72 | bwd_microstep: 1401.40 | bwd_inner_microstep: 1401.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3441 [2024-06-11 05:31:50,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.38 | bwd_microstep: 1409.74 | bwd_inner_microstep: 1409.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 05:31:53,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.00 | bwd_microstep: 1479.83 | bwd_inner_microstep: 1479.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-11 05:31:55,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.47 | bwd_microstep: 1482.12 | bwd_inner_microstep: 1482.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3516 [2024-06-11 05:31:56,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.11 | bwd_microstep: 1194.96 | bwd_inner_microstep: 1194.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2444 [2024-06-11 05:31:57,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.71 | bwd_microstep: 887.25 | bwd_inner_microstep: 887.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-11 05:32:00,234] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.17 | bwd_microstep: 1660.44 | bwd_inner_microstep: 1660.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-11 05:32:02,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.48 | bwd_microstep: 
1511.25 | bwd_inner_microstep: 1511.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-11 05:32:04,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.85 | bwd_microstep: 1415.36 | bwd_inner_microstep: 1415.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2284 [2024-06-11 05:32:05,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.10 | bwd_microstep: 972.82 | bwd_inner_microstep: 972.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-11 05:32:07,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 1303.23 | bwd_inner_microstep: 1303.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3644 [2024-06-11 05:32:09,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.80 | bwd_microstep: 1443.51 | bwd_inner_microstep: 1443.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2993 [2024-06-11 05:32:11,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.53 | bwd_microstep: 1299.86 | bwd_inner_microstep: 1299.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3809 [2024-06-11 05:32:13,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.41 | bwd_microstep: 1415.76 | bwd_inner_microstep: 1415.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1989 [2024-06-11 05:32:14,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 304.32 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-11 05:32:16,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.66 | bwd_microstep: 1253.54 | bwd_inner_microstep: 1253.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2241 [2024-06-11 05:32:17,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.02 | bwd_microstep: 871.37 | bwd_inner_microstep: 871.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3588 [2024-06-11 05:32:19,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.13 | bwd_microstep: 1306.79 | bwd_inner_microstep: 1306.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3585 [2024-06-11 05:32:21,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.72 | bwd_microstep: 1569.24 | bwd_inner_microstep: 1569.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3574 [2024-06-11 05:32:23,192] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.04 | bwd_microstep: 1451.16 | bwd_inner_microstep: 1451.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3773 [2024-06-11 05:32:30,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.09 | optimizer_step: 6.60 [2024-06-11 05:32:30,087] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.20 | bwd_microstep: 6338.07 | bwd_inner_microstep: 1515.26 | 
bwd_allreduce_microstep: 4822.75 | step_microstep: 37.95 [2024-06-11 05:32:30,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15172.63 | bwd: 45407.94 | bwd_inner: 40584.22 | bwd_allreduce: 4823.01 | step: 39.50 {'loss': 1.1617, 'learning_rate': 2.138958734841623e-07, 'epoch': 0.95} dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2403 [2024-06-11 05:32:31,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.57 | bwd_microstep: 957.50 | bwd_inner_microstep: 957.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-11 05:32:33,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.08 | bwd_microstep: 1247.14 | bwd_inner_microstep: 1247.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 05:32:34,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.70 | bwd_microstep: 1244.13 | bwd_inner_microstep: 1244.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 05:32:36,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.30 | bwd_microstep: 1370.95 | bwd_inner_microstep: 1370.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-11 05:32:38,810] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.06 | bwd_microstep: 1481.50 | bwd_inner_microstep: 1481.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3539 [2024-06-11 05:32:40,480] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.70 | bwd_microstep: 1199.66 | bwd_inner_microstep: 1199.64 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 05:32:42,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.23 | bwd_microstep: 1383.05 | bwd_inner_microstep: 1383.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3754 [2024-06-11 05:32:44,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.64 | bwd_microstep: 1638.49 | bwd_inner_microstep: 1638.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455 [2024-06-11 05:32:46,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.04 | bwd_microstep: 1318.75 | bwd_inner_microstep: 1318.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-11 05:32:48,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.62 | bwd_microstep: 1478.71 | bwd_inner_microstep: 1478.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3635 [2024-06-11 05:32:50,817] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 616.97 | bwd_microstep: 1682.71 | bwd_inner_microstep: 1682.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 05:32:52,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.26 | bwd_microstep: 1250.08 | bwd_inner_microstep: 1250.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3429 [2024-06-11 05:32:54,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.49 | bwd_microstep: 
1440.55 | bwd_inner_microstep: 1440.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1943 [2024-06-11 05:32:55,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.84 | bwd_microstep: 697.45 | bwd_inner_microstep: 697.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3516 [2024-06-11 05:32:57,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.81 | bwd_microstep: 1228.68 | bwd_inner_microstep: 1228.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 05:32:58,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.43 | bwd_microstep: 1255.50 | bwd_inner_microstep: 1255.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3639 [2024-06-11 05:33:01,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.34 | bwd_microstep: 1513.12 | bwd_inner_microstep: 1513.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-11 05:33:02,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-11 05:33:04,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.30 | bwd_microstep: 1289.18 | bwd_inner_microstep: 1289.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1404 [2024-06-11 05:33:05,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 203.64 | bwd_microstep: 528.59 | bwd_inner_microstep: 528.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3822 [2024-06-11 05:33:07,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.91 | bwd_microstep: 1587.54 | bwd_inner_microstep: 1587.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3543 [2024-06-11 05:33:09,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1296.62 | bwd_inner_microstep: 1296.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3450 [2024-06-11 05:33:11,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.39 | bwd_microstep: 1354.40 | bwd_inner_microstep: 1354.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2140 [2024-06-11 05:33:12,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.30 | bwd_microstep: 834.18 | bwd_inner_microstep: 834.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3741 [2024-06-11 05:33:14,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.01 | bwd_microstep: 1538.06 | bwd_inner_microstep: 1538.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3730 [2024-06-11 05:33:16,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.17 | bwd_microstep: 1538.54 | bwd_inner_microstep: 1538.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3808 [2024-06-11 05:33:18,963] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.27 | bwd_microstep: 1618.23 | bwd_inner_microstep: 1618.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3698 [2024-06-11 05:33:20,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.87 | bwd_microstep: 1455.87 | bwd_inner_microstep: 1455.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2288 [2024-06-11 05:33:22,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.06 | bwd_microstep: 974.87 | bwd_inner_microstep: 974.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-11 05:33:24,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.43 | bwd_microstep: 1478.33 | bwd_inner_microstep: 1478.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3803 [2024-06-11 05:33:26,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.03 | bwd_microstep: 1752.87 | bwd_inner_microstep: 1752.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-11 05:33:30,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.17 | optimizer_step: 6.63 [2024-06-11 05:33:30,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.10 | bwd_microstep: 3280.23 | bwd_inner_microstep: 1867.89 | bwd_allreduce_microstep: 1412.26 | step_microstep: 38.90 [2024-06-11 05:33:30,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15960.00 | bwd: 44312.39 | bwd_inner: 42899.19 | bwd_allreduce: 1412.50 | step: 40.36 95%|█████████▌| 1644/1726 
[28:50:25<1:41:51, 74.53s/it] 95%|█████████▌| 1645/1726 [28:52:02<1:49:46, 81.31s/it] 95%|█████████▌| 1646/1726 [28:53:04<1:40:46, 75.58s/it] 95%|█████████▌| 1647/1726 [28:54:05<1:33:56, 71.35s/it] 95%|█████████▌| 1648/1726 [28:55:06<1:28:41, 68.22s/it] 96%|█████████▌| 1649/1726 [28:56:07<1:{'loss': 1.1828, 'learning_rate': 2.0845600732342987e-07, 'epoch': 0.96} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-11 05:33:32,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.45 | bwd_microstep: 1449.59 | bwd_inner_microstep: 1449.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4343 [2024-06-11 05:33:34,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.22 | bwd_microstep: 1601.21 | bwd_inner_microstep: 1601.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 05:33:36,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.16 | bwd_microstep: 1286.41 | bwd_inner_microstep: 1286.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3469 [2024-06-11 05:33:38,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.57 | bwd_microstep: 1238.27 | bwd_inner_microstep: 1238.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 05:33:40,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1382.78 | bwd_inner_microstep: 1382.75 |
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3746 [2024-06-11 05:33:42,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.72 | bwd_microstep: 1469.25 | bwd_inner_microstep: 1469.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-11 05:33:43,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.84 | bwd_microstep: 1152.42 | bwd_inner_microstep: 1152.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2464 [2024-06-11 05:33:45,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.61 | bwd_microstep: 952.04 | bwd_inner_microstep: 952.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-11 05:33:47,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.62 | bwd_microstep: 1521.91 | bwd_inner_microstep: 1521.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3748 [2024-06-11 05:33:49,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.98 | bwd_microstep: 1640.08 | bwd_inner_microstep: 1640.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 05:33:51,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.04 | bwd_microstep: 1246.08 | bwd_inner_microstep: 1246.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3493 [2024-06-11 05:33:53,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.61 | bwd_microstep: 
1284.95 | bwd_inner_microstep: 1284.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3478 [2024-06-11 05:33:54,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.91 | bwd_microstep: 1312.46 | bwd_inner_microstep: 1312.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3510 [2024-06-11 05:33:56,876] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.14 | bwd_microstep: 1394.18 | bwd_inner_microstep: 1394.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3486 [2024-06-11 05:33:58,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.09 | bwd_microstep: 1266.95 | bwd_inner_microstep: 1266.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3502 [2024-06-11 05:34:00,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.65 | bwd_microstep: 1549.99 | bwd_inner_microstep: 1549.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3616 [2024-06-11 05:34:03,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.37 | bwd_microstep: 1708.90 | bwd_inner_microstep: 1708.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526 [2024-06-11 05:34:05,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.27 | bwd_microstep: 1455.30 | bwd_inner_microstep: 1455.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3679 [2024-06-11 05:34:07,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 630.51 | bwd_microstep: 1728.72 | bwd_inner_microstep: 1728.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3830 [2024-06-11 05:34:09,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.80 | bwd_microstep: 1661.87 | bwd_inner_microstep: 1661.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2062 [2024-06-11 05:34:11,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 350.81 | bwd_microstep: 946.08 | bwd_inner_microstep: 946.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2069 [2024-06-11 05:34:12,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 359.31 | bwd_microstep: 976.95 | bwd_inner_microstep: 976.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 05:34:14,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.88 | bwd_microstep: 1385.26 | bwd_inner_microstep: 1385.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-11 05:34:15,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.23 | bwd_microstep: 1161.42 | bwd_inner_microstep: 1161.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-11 05:34:17,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.37 | bwd_microstep: 1457.26 | bwd_inner_microstep: 1457.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-11 05:34:20,100] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.43 | bwd_microstep: 1555.09 | bwd_inner_microstep: 1555.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-11 05:34:22,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1402.94 | bwd_inner_microstep: 1402.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 05:34:23,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.54 | bwd_microstep: 1399.05 | bwd_inner_microstep: 1399.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3768 [2024-06-11 05:34:26,365] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.19 | bwd_microstep: 1741.08 | bwd_inner_microstep: 1741.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-11 05:34:28,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.36 | bwd_microstep: 1400.19 | bwd_inner_microstep: 1400.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-11 05:34:30,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.61 | bwd_microstep: 1400.13 | bwd_inner_microstep: 1400.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3443 [2024-06-11 05:34:32,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.98 | optimizer_gradients: 4.18 | optimizer_step: 6.59 [2024-06-11 05:34:32,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.58 | bwd_microstep: 1352.94 | 
bwd_inner_microstep: 1219.64 | bwd_allreduce_microstep: 133.25 | step_microstep: 37.91 [2024-06-11 05:34:32,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16572.20 | bwd: 44481.77 | bwd_inner: 44347.61 | bwd_allreduce: 133.48 | step: 39.45 {'loss': 1.1761, 'learning_rate': 2.0308584687775745e-07, 'epoch': 0.96} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1989 [2024-06-11 05:34:33,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.79 | bwd_microstep: 889.30 | bwd_inner_microstep: 889.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-11 05:34:35,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.51 | bwd_microstep: 1279.08 | bwd_inner_microstep: 1279.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853 [2024-06-11 05:34:37,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.19 | bwd_microstep: 1560.86 | bwd_inner_microstep: 1560.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1426 [2024-06-11 05:34:38,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 217.45 | bwd_microstep: 567.88 | bwd_inner_microstep: 567.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3763 [2024-06-11 05:34:40,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.07 | bwd_microstep: 1541.47 | bwd_inner_microstep: 1541.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3481 [2024-06-11 05:34:41,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.37 | bwd_microstep: 1284.12 | 
bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-11 05:34:43,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.41 | bwd_microstep: 1150.85 | bwd_inner_microstep: 1150.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1884 [2024-06-11 05:34:44,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.22 | bwd_microstep: 682.46 | bwd_inner_microstep: 682.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3412 [2024-06-11 05:34:46,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.74 | bwd_microstep: 1332.76 | bwd_inner_microstep: 1332.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-11 05:34:48,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.84 | bwd_microstep: 1486.76 | bwd_inner_microstep: 1486.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2139 [2024-06-11 05:34:49,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.98 | bwd_microstep: 1025.37 | bwd_inner_microstep: 1025.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3702 [2024-06-11 05:34:52,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.69 | bwd_microstep: 1724.51 | bwd_inner_microstep: 1724.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3449 [2024-06-11 05:34:54,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 527.31 | bwd_microstep: 1419.89 | bwd_inner_microstep: 1419.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1967 [2024-06-11 05:34:55,148] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.49 | bwd_microstep: 734.18 | bwd_inner_microstep: 734.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3648 [2024-06-11 05:34:57,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.34 | bwd_microstep: 1626.98 | bwd_inner_microstep: 1626.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3454 [2024-06-11 05:34:59,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.03 | bwd_microstep: 1192.54 | bwd_inner_microstep: 1192.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-11 05:35:01,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.96 | bwd_microstep: 1485.70 | bwd_inner_microstep: 1485.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3427 [2024-06-11 05:35:02,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.10 | bwd_microstep: 1346.22 | bwd_inner_microstep: 1346.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3490 [2024-06-11 05:35:04,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1417.53 | bwd_inner_microstep: 1417.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948 [2024-06-11 05:35:05,873] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 697.79 | bwd_inner_microstep: 697.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2290 [2024-06-11 05:35:07,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.02 | bwd_microstep: 879.16 | bwd_inner_microstep: 879.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3509 [2024-06-11 05:35:08,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.80 | bwd_microstep: 1292.84 | bwd_inner_microstep: 1292.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3438 [2024-06-11 05:35:10,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.00 | bwd_microstep: 1187.34 | bwd_inner_microstep: 1187.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3608 [2024-06-11 05:35:12,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1414.94 | bwd_inner_microstep: 1414.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605 [2024-06-11 05:35:14,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.56 | bwd_microstep: 1430.23 | bwd_inner_microstep: 1430.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3791 [2024-06-11 05:35:16,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.38 | bwd_microstep: 1659.71 | bwd_inner_microstep: 1659.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 
2411 [2024-06-11 05:35:18,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.11 | bwd_microstep: 969.71 | bwd_inner_microstep: 969.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 05:35:19,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.51 | bwd_microstep: 1254.94 | bwd_inner_microstep: 1254.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3565 [2024-06-11 05:35:21,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.19 | bwd_microstep: 1498.89 | bwd_inner_microstep: 1498.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3768 [2024-06-11 05:35:24,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.70 | bwd_microstep: 1543.42 | bwd_inner_microstep: 1543.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-11 05:35:26,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.63 | bwd_microstep: 1647.19 | bwd_inner_microstep: 1647.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3793 [2024-06-11 05:35:30,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-11 05:35:30,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.63 | bwd_microstep: 3506.06 | bwd_inner_microstep: 2236.97 | bwd_allreduce_microstep: 1269.03 | step_microstep: 38.93 [2024-06-11 05:35:30,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15286.33 | bwd: 42730.72 | bwd_inner: 41460.75 | bwd_allreduce: 1269.27 | step: 
40.49 {'loss': 1.1595, 'learning_rate': 1.9778541106081572e-07, 'epoch': 0.96} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 05:35:32,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.54 | bwd_microstep: 1241.06 | bwd_inner_microstep: 1241.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3926 [2024-06-11 05:35:34,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.92 | bwd_microstep: 1687.12 | bwd_inner_microstep: 1687.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 05:35:36,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.13 | bwd_microstep: 1242.83 | bwd_inner_microstep: 1242.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 05:35:38,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.03 | bwd_microstep: 1378.84 | bwd_inner_microstep: 1378.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 05:35:39,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.98 | bwd_microstep: 1283.74 | bwd_inner_microstep: 1283.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3757 [2024-06-11 05:35:41,935] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.00 | bwd_microstep: 1471.52 | bwd_inner_microstep: 1471.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 05:35:43,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
515.50 | bwd_microstep: 1380.24 | bwd_inner_microstep: 1380.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3779 [2024-06-11 05:35:46,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.75 | bwd_microstep: 1653.55 | bwd_inner_microstep: 1653.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3499 [2024-06-11 05:35:48,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.85 | bwd_microstep: 1390.95 | bwd_inner_microstep: 1390.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3505 [2024-06-11 05:35:50,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.83 | bwd_microstep: 1484.84 | bwd_inner_microstep: 1484.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3509 [2024-06-11 05:35:52,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.23 | bwd_microstep: 1416.99 | bwd_inner_microstep: 1416.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3510 [2024-06-11 05:35:53,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.37 | bwd_microstep: 1333.82 | bwd_inner_microstep: 1333.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3462 [2024-06-11 05:35:55,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.80 | bwd_microstep: 1213.30 | bwd_inner_microstep: 1213.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-11 05:35:57,624] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.10 | bwd_microstep: 1489.69 | bwd_inner_microstep: 1489.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687 [2024-06-11 05:35:59,846] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1616.94 | bwd_inner_microstep: 1616.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3381 [2024-06-11 05:36:01,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1336.44 | bwd_inner_microstep: 1336.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-11 05:36:03,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.42 | bwd_microstep: 1491.93 | bwd_inner_microstep: 1491.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3517 [2024-06-11 05:36:05,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1615.99 | bwd_inner_microstep: 1615.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3533 [2024-06-11 05:36:08,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1519.99 | bwd_inner_microstep: 1519.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3581 [2024-06-11 05:36:09,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.93 | bwd_microstep: 1308.46 | bwd_inner_microstep: 1308.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3419 [2024-06-11 05:36:11,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.20 | bwd_microstep: 1344.49 | bwd_inner_microstep: 1344.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 05:36:13,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.53 | bwd_microstep: 1401.61 | bwd_inner_microstep: 1401.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558 [2024-06-11 05:36:15,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.44 | bwd_microstep: 1302.07 | bwd_inner_microstep: 1302.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3451 [2024-06-11 05:36:17,493] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1447.40 | bwd_inner_microstep: 1447.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 526 [2024-06-11 05:36:17,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 97.33 | bwd_microstep: 240.40 | bwd_inner_microstep: 240.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2025 [2024-06-11 05:36:19,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.10 | bwd_microstep: 901.12 | bwd_inner_microstep: 901.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3532 [2024-06-11 05:36:21,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.02 | bwd_microstep: 1492.67 | bwd_inner_microstep: 1492.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images 
per sample: 6.0, dynamic token length: 2425 [2024-06-11 05:36:22,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 386.23 | bwd_microstep: 1035.05 | bwd_inner_microstep: 1035.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3813 [2024-06-11 05:36:24,906] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.45 | bwd_microstep: 1698.96 | bwd_inner_microstep: 1698.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3570 [2024-06-11 05:36:27,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.65 | bwd_microstep: 1526.40 | bwd_inner_microstep: 1526.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-11 05:36:29,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.85 | bwd_microstep: 1608.33 | bwd_inner_microstep: 1608.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 05:36:31,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.30 | optimizer_step: 6.61 [2024-06-11 05:36:31,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.91 | bwd_microstep: 1420.56 | bwd_inner_microstep: 1412.12 | bwd_allreduce_microstep: 8.37 | step_microstep: 38.92 [2024-06-11 05:36:31,215] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16417.13 | bwd: 43977.31 | bwd_inner: 43968.02 | bwd_allreduce: 8.61 | step: 40.59 {'loss': 1.1985, 'learning_rate': 1.9255471854071616e-07, 'epoch': 0.96} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3467 [2024-06-11 05:36:33,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
544.85 | bwd_microstep: 1441.44 | bwd_inner_microstep: 1441.28 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2386 [2024-06-11 05:36:34,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.80 | bwd_microstep: 904.71 | bwd_inner_microstep: 904.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3404 [2024-06-11 05:36:36,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.30 | bwd_microstep: 1279.50 | bwd_inner_microstep: 1279.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-11 05:36:38,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.06 | bwd_microstep: 1642.41 | bwd_inner_microstep: 1642.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-11 05:36:40,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1398.12 | bwd_inner_microstep: 1398.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 05:36:42,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.13 | bwd_microstep: 1376.39 | bwd_inner_microstep: 1376.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3760 [2024-06-11 05:36:44,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.08 | bwd_microstep: 1436.18 | bwd_inner_microstep: 1436.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2220 [2024-06-11 05:36:45,652] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 356.31 | bwd_microstep: 958.13 | bwd_inner_microstep: 958.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 05:36:47,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.56 | bwd_microstep: 1390.75 | bwd_inner_microstep: 1390.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 05:36:49,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.90 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3542 [2024-06-11 05:36:51,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.84 | bwd_microstep: 1260.62 | bwd_inner_microstep: 1260.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-11 05:36:52,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1383.48 | bwd_inner_microstep: 1383.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3445 [2024-06-11 05:36:54,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.71 | bwd_microstep: 1217.95 | bwd_inner_microstep: 1217.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-11 05:36:56,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.86 | bwd_microstep: 1290.10 | bwd_inner_microstep: 1290.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3659 [2024-06-11 
05:36:58,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.10 | bwd_microstep: 1418.06 | bwd_inner_microstep: 1418.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2991 [2024-06-11 05:37:00,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.16 | bwd_microstep: 1204.64 | bwd_inner_microstep: 1204.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3632 [2024-06-11 05:37:01,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.01 | bwd_microstep: 1396.34 | bwd_inner_microstep: 1396.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 05:37:03,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.05 | bwd_microstep: 1279.08 | bwd_inner_microstep: 1279.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 05:37:05,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.21 | bwd_microstep: 1396.45 | bwd_inner_microstep: 1396.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-11 05:37:07,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1391.91 | bwd_inner_microstep: 1391.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2125 [2024-06-11 05:37:08,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.93 | bwd_microstep: 991.61 | bwd_inner_microstep: 991.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, 
dynamic token length: 3463 [2024-06-11 05:37:10,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.53 | bwd_microstep: 1407.89 | bwd_inner_microstep: 1407.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3867 [2024-06-11 05:37:12,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.35 | bwd_microstep: 1272.65 | bwd_inner_microstep: 1272.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3517 [2024-06-11 05:37:14,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.60 | bwd_microstep: 1490.42 | bwd_inner_microstep: 1490.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3543 [2024-06-11 05:37:16,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.13 | bwd_microstep: 1496.88 | bwd_inner_microstep: 1496.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3451 [2024-06-11 05:37:18,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.62 | bwd_microstep: 1477.63 | bwd_inner_microstep: 1477.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3819 [2024-06-11 05:37:20,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.88 | bwd_microstep: 1450.75 | bwd_inner_microstep: 1450.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3687 [2024-06-11 05:37:23,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.54 | bwd_microstep: 1618.47 | bwd_inner_microstep: 1618.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 40, images per sample: 10.0, dynamic token length: 3591 [2024-06-11 05:37:25,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.41 | bwd_microstep: 1597.44 | bwd_inner_microstep: 1597.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2277 [2024-06-11 05:37:26,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.59 | bwd_microstep: 1070.08 | bwd_inner_microstep: 1070.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-11 05:37:29,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.35 | bwd_microstep: 1637.10 | bwd_inner_microstep: 1637.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2030 [2024-06-11 05:37:32,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.77 | optimizer_gradients: 4.20 | optimizer_step: 6.62 [2024-06-11 05:37:32,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 284.55 | bwd_microstep: 3233.80 | bwd_inner_microstep: 853.97 | bwd_allreduce_microstep: 2379.78 | step_microstep: 37.96 [2024-06-11 05:37:32,570] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15954.22 | bwd: 45058.80 | bwd_inner: 42677.98 | bwd_allreduce: 2380.07 | step: 39.48 {'loss': 1.1893, 'learning_rate': 1.87393787739929e-07, 'epoch': 0.96} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3457 [2024-06-11 05:37:34,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.12 | bwd_microstep: 1362.92 | bwd_inner_microstep: 1362.78 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3885 [2024-06-11 05:37:36,772] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 617.14 | bwd_microstep: 1679.52 | bwd_inner_microstep: 1679.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832 [2024-06-11 05:37:38,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.57 | bwd_microstep: 1446.02 | bwd_inner_microstep: 1446.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3848 [2024-06-11 05:37:40,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.23 | bwd_microstep: 1557.88 | bwd_inner_microstep: 1557.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 05:37:42,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.93 | bwd_microstep: 1245.07 | bwd_inner_microstep: 1245.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1929 [2024-06-11 05:37:43,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.83 | bwd_microstep: 790.24 | bwd_inner_microstep: 790.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3420 [2024-06-11 05:37:45,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.47 | bwd_microstep: 1341.33 | bwd_inner_microstep: 1341.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1377 [2024-06-11 05:37:46,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 201.30 | bwd_microstep: 525.12 | bwd_inner_microstep: 525.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 05:37:48,237] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.79 | bwd_microstep: 1385.44 | bwd_inner_microstep: 1385.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3982 [2024-06-11 05:37:50,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.23 | bwd_microstep: 1523.09 | bwd_inner_microstep: 1523.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2654 [2024-06-11 05:37:51,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.47 | bwd_microstep: 952.29 | bwd_inner_microstep: 952.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377 [2024-06-11 05:37:53,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.13 | bwd_microstep: 1333.81 | bwd_inner_microstep: 1333.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3674 [2024-06-11 05:37:55,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.12 | bwd_microstep: 1690.43 | bwd_inner_microstep: 1690.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3657 [2024-06-11 05:37:57,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.85 | bwd_microstep: 1519.35 | bwd_inner_microstep: 1519.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 05:37:59,647] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.10 | bwd_microstep: 1246.38 | bwd_inner_microstep: 1246.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3434 [2024-06-11 05:38:01,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.66 | bwd_microstep: 1344.33 | bwd_inner_microstep: 1344.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3506 [2024-06-11 05:38:03,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.94 | bwd_microstep: 1484.65 | bwd_inner_microstep: 1484.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3505 [2024-06-11 05:38:05,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.11 | bwd_microstep: 1430.88 | bwd_inner_microstep: 1430.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3658 [2024-06-11 05:38:07,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.25 | bwd_microstep: 1667.59 | bwd_inner_microstep: 1667.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3437 [2024-06-11 05:38:09,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.36 | bwd_microstep: 1347.84 | bwd_inner_microstep: 1347.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676 [2024-06-11 05:38:11,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.44 | bwd_microstep: 1547.95 | bwd_inner_microstep: 1547.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288 [2024-06-11 05:38:13,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.59 | bwd_microstep: 878.26 | bwd_inner_microstep: 878.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3610 [2024-06-11 05:38:15,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.07 | bwd_microstep: 1509.40 | bwd_inner_microstep: 1509.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2029 [2024-06-11 05:38:16,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.71 | bwd_microstep: 714.77 | bwd_inner_microstep: 714.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3540 [2024-06-11 05:38:17,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.42 | bwd_microstep: 1295.11 | bwd_inner_microstep: 1295.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2242 [2024-06-11 05:38:19,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.38 | bwd_microstep: 807.24 | bwd_inner_microstep: 807.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2545 [2024-06-11 05:38:20,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 396.78 | bwd_microstep: 1063.21 | bwd_inner_microstep: 1063.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3815 [2024-06-11 05:38:22,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.13 | bwd_microstep: 1599.38 | bwd_inner_microstep: 1599.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3732 [2024-06-11 05:38:24,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.86 | bwd_microstep: 1274.66 | bwd_inner_microstep: 1274.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3780 [2024-06-11 05:38:26,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.94 | bwd_microstep: 1502.06 | bwd_inner_microstep: 1502.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 3586 [2024-06-11 05:38:28,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.56 | bwd_microstep: 1288.02 | bwd_inner_microstep: 1288.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-11 05:38:34,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.16 | optimizer_step: 6.60 [2024-06-11 05:38:34,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.38 | bwd_microstep: 5597.56 | bwd_inner_microstep: 1558.66 | bwd_allreduce_microstep: 4038.84 | step_microstep: 39.40 [2024-06-11 05:38:34,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15623.49 | bwd: 45951.85 | bwd_inner: 41911.97 | bwd_allreduce: 4039.14 | step: 40.97 96%|█████████▌| 1649/1726 [28:56:07<1:24:37, 65.94s/it] 96%|█████████▌| 1650/1726 [28:57:08<1:21:47, 64.58s/it] 96%|█████████▌| 1651/1726 [28:58:07<1:18:23, 62.71s/it] 96%|█████████▌| 1652/1726 [28:59:07<1:16:37, 62.12s/it] 96%|█████████▌| 1653/1726 [29:00:09<1:15:18, 61.89s/it] {'loss': 1.1477, 'learning_rate': 1.823026368352232e-07, 'epoch': 0.96} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3388 [2024-06-11 05:38:36,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.04 |
bwd_microstep: 1294.21 | bwd_inner_microstep: 1294.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3461 [2024-06-11 05:38:38,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1336.20 | bwd_inner_microstep: 1336.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-11 05:38:39,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.53 | bwd_microstep: 1349.42 | bwd_inner_microstep: 1349.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 05:38:42,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.51 | bwd_microstep: 1546.59 | bwd_inner_microstep: 1546.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3460 [2024-06-11 05:38:44,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1474.00 | bwd_inner_microstep: 1473.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726 [2024-06-11 05:38:46,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.28 | bwd_microstep: 1630.24 | bwd_inner_microstep: 1630.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 05:38:48,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.55 | bwd_microstep: 1250.24 | bwd_inner_microstep: 1250.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954 [2024-06-11 05:38:49,109] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 271.27 | bwd_microstep: 701.21 | bwd_inner_microstep: 701.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3687 [2024-06-11 05:38:51,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.05 | bwd_microstep: 1431.37 | bwd_inner_microstep: 1431.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3486 [2024-06-11 05:38:52,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.35 | bwd_microstep: 1342.92 | bwd_inner_microstep: 1342.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3683 [2024-06-11 05:38:55,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.16 | bwd_microstep: 1515.95 | bwd_inner_microstep: 1515.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-11 05:38:57,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.75 | bwd_microstep: 1476.22 | bwd_inner_microstep: 1476.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2047 [2024-06-11 05:38:58,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.29 | bwd_microstep: 717.97 | bwd_inner_microstep: 717.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3998 [2024-06-11 05:39:00,299] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.05 | bwd_microstep: 1608.41 | bwd_inner_microstep: 1608.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 05:39:02,440] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.86 | bwd_microstep: 1553.65 | bwd_inner_microstep: 1553.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 05:39:04,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.52 | bwd_microstep: 1402.98 | bwd_inner_microstep: 1402.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2019 [2024-06-11 05:39:05,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.62 | bwd_microstep: 805.81 | bwd_inner_microstep: 805.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3494 [2024-06-11 05:39:07,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.76 | bwd_microstep: 1189.84 | bwd_inner_microstep: 1189.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-11 05:39:09,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.73 | bwd_microstep: 1488.10 | bwd_inner_microstep: 1488.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2990 [2024-06-11 05:39:10,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.71 | bwd_microstep: 1294.76 | bwd_inner_microstep: 1294.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3421 [2024-06-11 05:39:13,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 1470.84 | bwd_inner_microstep: 1470.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token 
length: 3598 [2024-06-11 05:39:14,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1408.75 | bwd_inner_microstep: 1408.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2277 [2024-06-11 05:39:16,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.38 | bwd_microstep: 1069.22 | bwd_inner_microstep: 1069.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3601 [2024-06-11 05:39:18,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.39 | bwd_microstep: 1214.02 | bwd_inner_microstep: 1214.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3811 [2024-06-11 05:39:19,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.07 | bwd_microstep: 1355.96 | bwd_inner_microstep: 1355.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3386 [2024-06-11 05:39:21,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.24 | bwd_microstep: 1436.21 | bwd_inner_microstep: 1436.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2275 [2024-06-11 05:39:23,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.37 | bwd_microstep: 940.10 | bwd_inner_microstep: 940.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3591 [2024-06-11 05:39:25,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.68 | bwd_microstep: 1598.09 | bwd_inner_microstep: 1598.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3595 [2024-06-11 05:39:27,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.40 | bwd_microstep: 1411.51 | bwd_inner_microstep: 1411.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3625 [2024-06-11 05:39:29,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1602.28 | bwd_inner_microstep: 1602.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.97 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3732 [2024-06-11 05:39:31,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.86 | bwd_microstep: 1730.19 | bwd_inner_microstep: 1730.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3816 [2024-06-11 05:39:37,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-11 05:39:37,942] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.11 | bwd_microstep: 5319.07 | bwd_inner_microstep: 1806.79 | bwd_allreduce_microstep: 3512.20 | step_microstep: 39.87 [2024-06-11 05:39:37,945] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16147.97 | bwd: 46966.36 | bwd_inner: 43453.22 | bwd_allreduce: 3512.44 | step: 43.29 {'loss': 1.191, 'learning_rate': 1.7728128375760877e-07, 'epoch': 0.96} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3455 [2024-06-11 05:39:39,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.35 | bwd_microstep: 1150.76 | bwd_inner_microstep: 1150.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3983 [2024-06-11 05:39:41,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 574.04 | bwd_microstep: 1533.63 | bwd_inner_microstep: 1533.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3845 [2024-06-11 05:39:43,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.04 | bwd_microstep: 1660.05 | bwd_inner_microstep: 1660.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796 [2024-06-11 05:39:45,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.10 | bwd_microstep: 1454.95 | bwd_inner_microstep: 1454.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3788 [2024-06-11 05:39:48,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.60 | bwd_microstep: 1547.65 | bwd_inner_microstep: 1547.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3441 [2024-06-11 05:39:49,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.26 | bwd_microstep: 1345.20 | bwd_inner_microstep: 1345.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 05:39:51,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.54 | bwd_microstep: 1392.52 | bwd_inner_microstep: 1392.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 05:39:53,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.78 | bwd_microstep: 1279.94 | bwd_inner_microstep: 1279.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1874 [2024-06-11 05:39:54,603] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.76 | bwd_microstep: 680.50 | bwd_inner_microstep: 680.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3686 [2024-06-11 05:39:56,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.04 | bwd_microstep: 1523.09 | bwd_inner_microstep: 1523.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3682 [2024-06-11 05:39:58,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.90 | bwd_microstep: 1628.26 | bwd_inner_microstep: 1628.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3671 [2024-06-11 05:40:01,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.83 | bwd_microstep: 1521.62 | bwd_inner_microstep: 1521.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3673 [2024-06-11 05:40:03,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.73 | bwd_microstep: 1423.50 | bwd_inner_microstep: 1423.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3514 [2024-06-11 05:40:05,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.48 | bwd_microstep: 1476.98 | bwd_inner_microstep: 1476.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3460 [2024-06-11 05:40:06,982] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.24 | bwd_microstep: 1407.60 | bwd_inner_microstep: 1407.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 
3492 [2024-06-11 05:40:09,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.96 | bwd_microstep: 1580.99 | bwd_inner_microstep: 1580.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3578 [2024-06-11 05:40:11,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.43 | bwd_microstep: 1336.25 | bwd_inner_microstep: 1336.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1920 [2024-06-11 05:40:11,963] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.79 | bwd_microstep: 686.52 | bwd_inner_microstep: 686.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-11 05:40:14,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.09 | bwd_microstep: 1657.45 | bwd_inner_microstep: 1657.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3835 [2024-06-11 05:40:16,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.17 | bwd_microstep: 1359.42 | bwd_inner_microstep: 1359.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-11 05:40:18,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.46 | bwd_microstep: 1510.19 | bwd_inner_microstep: 1510.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3529 [2024-06-11 05:40:20,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.68 | bwd_microstep: 1493.41 | bwd_inner_microstep: 1493.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3525 [2024-06-11 05:40:22,199] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1397.92 | bwd_inner_microstep: 1397.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3699 [2024-06-11 05:40:24,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.30 | bwd_microstep: 1427.49 | bwd_inner_microstep: 1427.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3603 [2024-06-11 05:40:25,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.22 | bwd_microstep: 1312.24 | bwd_inner_microstep: 1312.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 05:40:27,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.78 | bwd_microstep: 1288.50 | bwd_inner_microstep: 1288.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3573 [2024-06-11 05:40:29,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.93 | bwd_microstep: 1362.57 | bwd_inner_microstep: 1362.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-11 05:40:31,728] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.72 | bwd_microstep: 1498.39 | bwd_inner_microstep: 1498.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3725 [2024-06-11 05:40:33,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.07 | bwd_microstep: 1627.68 | bwd_inner_microstep: 1627.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3581 [2024-06-11 05:40:36,124] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.90 | bwd_microstep: 1565.58 | bwd_inner_microstep: 1565.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2552 [2024-06-11 05:40:37,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.24 | bwd_microstep: 1062.05 | bwd_inner_microstep: 1062.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2047 [2024-06-11 05:40:44,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.30 | optimizer_step: 6.58 [2024-06-11 05:40:44,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.78 | bwd_microstep: 6040.42 | bwd_inner_microstep: 1037.63 | bwd_allreduce_microstep: 5002.72 | step_microstep: 40.05 [2024-06-11 05:40:44,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16488.19 | bwd: 49233.35 | bwd_inner: 44229.70 | bwd_allreduce: 5002.96 | step: 41.59 {'loss': 1.2292, 'learning_rate': 1.7232974619226572e-07, 'epoch': 0.96} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2041 [2024-06-11 05:40:45,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.22 | bwd_microstep: 800.97 | bwd_inner_microstep: 800.86 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 05:40:46,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.24 | bwd_microstep: 1279.22 | bwd_inner_microstep: 1279.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3836 [2024-06-11 05:40:48,910] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 543.05 | bwd_microstep: 1448.49 | bwd_inner_microstep: 1448.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3841 [2024-06-11 05:40:50,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.86 | bwd_microstep: 1484.72 | bwd_inner_microstep: 1484.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2228 [2024-06-11 05:40:52,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.00 | bwd_microstep: 955.71 | bwd_inner_microstep: 955.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3745 [2024-06-11 05:40:54,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.66 | bwd_microstep: 1532.83 | bwd_inner_microstep: 1532.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-11 05:40:56,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.07 | bwd_microstep: 1475.57 | bwd_inner_microstep: 1475.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3413 [2024-06-11 05:40:58,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.00 | bwd_microstep: 1149.84 | bwd_inner_microstep: 1149.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1675 [2024-06-11 05:40:58,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 252.72 | bwd_microstep: 665.86 | bwd_inner_microstep: 665.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 
05:41:00,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.22 | bwd_microstep: 1278.12 | bwd_inner_microstep: 1278.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 05:41:02,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.98 | bwd_microstep: 1385.02 | bwd_inner_microstep: 1384.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1972 [2024-06-11 05:41:03,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.45 | bwd_microstep: 828.77 | bwd_inner_microstep: 828.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2685 [2024-06-11 05:41:05,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.14 | bwd_microstep: 1119.81 | bwd_inner_microstep: 1119.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521 [2024-06-11 05:41:07,334] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.95 | bwd_microstep: 1449.83 | bwd_inner_microstep: 1449.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3439 [2024-06-11 05:41:09,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.01 | bwd_microstep: 1442.32 | bwd_inner_microstep: 1442.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 05:41:11,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.18 | bwd_microstep: 1356.62 | bwd_inner_microstep: 1356.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, 
dynamic token length: 3446 [2024-06-11 05:41:12,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.52 | bwd_microstep: 1218.53 | bwd_inner_microstep: 1218.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620 [2024-06-11 05:41:13,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.47 | bwd_microstep: 260.44 | bwd_inner_microstep: 260.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3470 [2024-06-11 05:41:14,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.54 | bwd_microstep: 1184.42 | bwd_inner_microstep: 1184.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3616 [2024-06-11 05:41:16,628] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.35 | bwd_microstep: 1246.57 | bwd_inner_microstep: 1246.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3453 [2024-06-11 05:41:18,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.68 | bwd_microstep: 1258.37 | bwd_inner_microstep: 1258.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 05:41:20,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.44 | bwd_microstep: 1290.81 | bwd_inner_microstep: 1290.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3383 [2024-06-11 05:41:22,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.12 | bwd_microstep: 1435.64 | bwd_inner_microstep: 1435.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 10, images per sample: 2.5, dynamic token length: 2021 [2024-06-11 05:41:23,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.81 | bwd_microstep: 714.92 | bwd_inner_microstep: 714.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3815 [2024-06-11 05:41:25,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.81 | bwd_microstep: 1486.44 | bwd_inner_microstep: 1486.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3519 [2024-06-11 05:41:26,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.79 | bwd_microstep: 1292.57 | bwd_inner_microstep: 1292.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 05:41:28,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.64 | bwd_microstep: 1450.76 | bwd_inner_microstep: 1450.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3440 [2024-06-11 05:41:30,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 436.51 | bwd_microstep: 1136.15 | bwd_inner_microstep: 1136.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 05:41:32,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.47 | bwd_microstep: 1555.46 | bwd_inner_microstep: 1555.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3470 [2024-06-11 05:41:34,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.43 | bwd_microstep: 1408.99 | bwd_inner_microstep: 1408.97 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3612 [2024-06-11 05:41:36,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.11 | bwd_microstep: 1573.86 | bwd_inner_microstep: 1573.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3027 [2024-06-11 05:41:45,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.79 | optimizer_gradients: 4.08 | optimizer_step: 6.61 [2024-06-11 05:41:45,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 435.75 | bwd_microstep: 8161.44 | bwd_inner_microstep: 1278.74 | bwd_allreduce_microstep: 6882.65 | step_microstep: 37.88 [2024-06-11 05:41:45,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14771.81 | bwd: 46329.09 | bwd_inner: 39445.44 | bwd_allreduce: 6882.92 | step: 39.36 {'loss': 1.1811, 'learning_rate': 1.6744804157848183e-07, 'epoch': 0.96} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-11 05:41:47,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.54 | bwd_microstep: 1467.76 | bwd_inner_microstep: 1467.57 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3509 [2024-06-11 05:41:49,179] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.25 | bwd_microstep: 1217.91 | bwd_inner_microstep: 1217.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3471 [2024-06-11 05:41:51,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.26 | bwd_microstep: 1478.36 | bwd_inner_microstep: 1478.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3807 [2024-06-11 05:41:53,171] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.54 | bwd_microstep: 1413.37 | bwd_inner_microstep: 1413.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3514 [2024-06-11 05:41:54,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.48 | bwd_microstep: 1316.77 | bwd_inner_microstep: 1316.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-11 05:41:56,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.37 | bwd_microstep: 787.61 | bwd_inner_microstep: 787.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726 [2024-06-11 05:41:58,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.03 | bwd_microstep: 1430.87 | bwd_inner_microstep: 1430.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3441 [2024-06-11 05:41:59,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.40 | bwd_microstep: 1185.71 | bwd_inner_microstep: 1185.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3438 [2024-06-11 05:42:01,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.63 | bwd_microstep: 1152.16 | bwd_inner_microstep: 1152.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2002 [2024-06-11 05:42:02,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.44 | bwd_microstep: 803.87 | bwd_inner_microstep: 803.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 
[2024-06-11 05:42:04,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.28 | bwd_microstep: 1254.65 | bwd_inner_microstep: 1254.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-11 05:42:06,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.61 | bwd_microstep: 1623.07 | bwd_inner_microstep: 1623.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2333 [2024-06-11 05:42:07,496] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.24 | bwd_microstep: 795.50 | bwd_inner_microstep: 795.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2298 [2024-06-11 05:42:08,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.63 | bwd_microstep: 1069.94 | bwd_inner_microstep: 1069.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3634 [2024-06-11 05:42:11,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.98 | bwd_microstep: 1615.23 | bwd_inner_microstep: 1615.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3511 [2024-06-11 05:42:13,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.69 | bwd_microstep: 1483.20 | bwd_inner_microstep: 1483.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488 [2024-06-11 05:42:15,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.95 | bwd_microstep: 1486.41 | bwd_inner_microstep: 1486.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3485 [2024-06-11 05:42:17,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.46 | bwd_microstep: 1317.86 | bwd_inner_microstep: 1317.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 05:42:19,019] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.05 | bwd_microstep: 1386.22 | bwd_inner_microstep: 1386.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3684 [2024-06-11 05:42:21,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.84 | bwd_microstep: 1487.76 | bwd_inner_microstep: 1487.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-11 05:42:23,206] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.43 | bwd_microstep: 1549.17 | bwd_inner_microstep: 1549.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3459 [2024-06-11 05:42:24,855] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.93 | bwd_microstep: 1183.79 | bwd_inner_microstep: 1183.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3446 [2024-06-11 05:42:26,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.14 | bwd_microstep: 1159.36 | bwd_inner_microstep: 1159.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3568 [2024-06-11 05:42:28,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.59 | bwd_microstep: 1498.04 | bwd_inner_microstep: 1498.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 05:42:30,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.61 | bwd_microstep: 1560.24 | bwd_inner_microstep: 1560.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1926 [2024-06-11 05:42:31,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.96 | bwd_microstep: 788.12 | bwd_inner_microstep: 788.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2613 [2024-06-11 05:42:33,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.21 | bwd_microstep: 1047.91 | bwd_inner_microstep: 1047.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3553 [2024-06-11 05:42:35,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1455.74 | bwd_inner_microstep: 1455.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3561 [2024-06-11 05:42:37,083] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.29 | bwd_microstep: 1332.05 | bwd_inner_microstep: 1332.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 05:42:39,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.17 | bwd_microstep: 1557.21 | bwd_inner_microstep: 1557.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3767 [2024-06-11 05:42:41,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.91 | bwd_microstep: 1572.53 | bwd_inner_microstep: 1572.50 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3567 [2024-06-11 05:42:47,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.44 | optimizer_step: 6.58 [2024-06-11 05:42:47,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.73 | bwd_microstep: 5525.35 | bwd_inner_microstep: 1377.45 | bwd_allreduce_microstep: 4147.82 | step_microstep: 40.05 [2024-06-11 05:42:47,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15647.42 | bwd: 46003.76 | bwd_inner: 41854.86 | bwd_allreduce: 4148.15 | step: 41.58 {'loss': 1.22, 'learning_rate': 1.6263618710959494e-07, 'epoch': 0.96} 96%|█████████▌| 1654/1726 [29:01:11<1:14:16, 61.90s/it] 96%|█████████▌| 1655/1726 [29:02:14<1:13:48, 62.37s/it] 96%|█████████▌| 1656/1726 [29:03:20<1:14:03, 63.48s/it] 96%|█████████▌| 1657/1726 [29:04:22<1:12:17, 62.86s/it] 96%|█████████▌| 1658/1726 [29:05:24<1:10:57, 62.60s/it] dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-11 05:42:49,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.54 | bwd_microstep: 1236.23 | bwd_inner_microstep: 1236.17 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4304 [2024-06-11 05:42:51,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.20 | bwd_microstep: 1583.58 | bwd_inner_microstep: 1583.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3415 [2024-06-11 05:42:53,158] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1306.81 | bwd_inner_microstep: 1306.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3465 [2024-06-11 05:42:55,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1405.06 | bwd_inner_microstep: 1405.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1867 [2024-06-11 05:42:56,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.09 | bwd_microstep: 709.09 | bwd_inner_microstep: 709.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3494 [2024-06-11 05:42:58,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.46 | bwd_microstep: 1384.66 | bwd_inner_microstep: 1384.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4015 [2024-06-11 05:43:00,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.16 | bwd_microstep: 1609.06 | bwd_inner_microstep: 1609.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 2.69 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-11 05:43:01,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.78 | bwd_microstep: 794.82 | bwd_inner_microstep: 794.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3479 [2024-06-11 05:43:03,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.40 | bwd_microstep: 1311.51 | bwd_inner_microstep: 1311.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1994 
[2024-06-11 05:43:04,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.54 | bwd_microstep: 898.04 | bwd_inner_microstep: 898.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-11 05:43:06,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.30 | bwd_microstep: 1484.81 | bwd_inner_microstep: 1484.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 05:43:08,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.89 | bwd_microstep: 1474.50 | bwd_inner_microstep: 1474.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-11 05:43:10,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.81 | bwd_microstep: 1480.27 | bwd_inner_microstep: 1480.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-11 05:43:12,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.35 | bwd_microstep: 1481.97 | bwd_inner_microstep: 1481.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-11 05:43:14,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.37 | bwd_microstep: 1281.74 | bwd_inner_microstep: 1281.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3633 [2024-06-11 05:43:16,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.32 | bwd_microstep: 1513.30 | bwd_inner_microstep: 1513.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 2157 [2024-06-11 05:43:17,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.64 | bwd_microstep: 949.23 | bwd_inner_microstep: 949.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 05:43:19,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.17 | bwd_microstep: 1353.00 | bwd_inner_microstep: 1352.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-11 05:43:21,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.22 | bwd_microstep: 1255.43 | bwd_inner_microstep: 1255.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3630 [2024-06-11 05:43:23,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.15 | bwd_microstep: 1315.42 | bwd_inner_microstep: 1315.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3818 [2024-06-11 05:43:25,024] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.00 | bwd_microstep: 1358.96 | bwd_inner_microstep: 1358.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3835 [2024-06-11 05:43:27,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.63 | bwd_microstep: 1556.00 | bwd_inner_microstep: 1555.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2008 [2024-06-11 05:43:28,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.48 | bwd_microstep: 803.28 | bwd_inner_microstep: 803.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3603 [2024-06-11 05:43:29,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.88 | bwd_microstep: 1211.79 | bwd_inner_microstep: 1211.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 05:43:31,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.47 | bwd_microstep: 1391.37 | bwd_inner_microstep: 1391.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3429 [2024-06-11 05:43:33,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.71 | bwd_microstep: 1281.44 | bwd_inner_microstep: 1281.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3806 [2024-06-11 05:43:35,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.45 | bwd_microstep: 1598.18 | bwd_inner_microstep: 1598.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3633 [2024-06-11 05:43:37,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.45 | bwd_microstep: 1540.65 | bwd_inner_microstep: 1540.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2064 [2024-06-11 05:43:39,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.67 | bwd_microstep: 914.10 | bwd_inner_microstep: 914.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3583 [2024-06-11 05:43:41,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.97 | bwd_microstep: 1599.97 | bwd_inner_microstep: 1599.94 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3803 [2024-06-11 05:43:43,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.80 | bwd_microstep: 1452.97 | bwd_inner_microstep: 1452.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3749 [2024-06-11 05:43:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.68 | optimizer_gradients: 4.07 | optimizer_step: 6.60 [2024-06-11 05:43:47,825] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.56 | bwd_microstep: 3697.91 | bwd_inner_microstep: 1927.84 | bwd_allreduce_microstep: 1770.02 | step_microstep: 38.24 [2024-06-11 05:43:47,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15802.28 | bwd: 44235.21 | bwd_inner: 42464.24 | bwd_allreduce: 1770.26 | step: 42.49 {'loss': 1.1647, 'learning_rate': 1.5789419973293306e-07, 'epoch': 0.96} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3406 [2024-06-11 05:43:49,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1441.07 | bwd_inner_microstep: 1441.00 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2413 [2024-06-11 05:43:51,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.95 | bwd_microstep: 1001.39 | bwd_inner_microstep: 1001.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1980 [2024-06-11 05:43:52,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.19 | bwd_microstep: 795.17 | bwd_inner_microstep: 795.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3797 
[2024-06-11 05:43:54,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.65 | bwd_microstep: 1415.68 | bwd_inner_microstep: 1415.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2232 [2024-06-11 05:43:55,600] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.65 | bwd_microstep: 965.42 | bwd_inner_microstep: 965.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3405 [2024-06-11 05:43:57,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1286.55 | bwd_inner_microstep: 1286.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 05:43:59,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.21 | bwd_microstep: 1282.08 | bwd_inner_microstep: 1282.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3721 [2024-06-11 05:44:01,403] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.67 | bwd_microstep: 1631.06 | bwd_inner_microstep: 1631.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 05:44:03,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.41 | bwd_microstep: 1388.19 | bwd_inner_microstep: 1388.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3406 [2024-06-11 05:44:05,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.03 | bwd_microstep: 1248.72 | bwd_inner_microstep: 1248.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per 
sample: 4.5, dynamic token length: 3569 [2024-06-11 05:44:06,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.64 | bwd_microstep: 1237.36 | bwd_inner_microstep: 1237.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3711 [2024-06-11 05:44:08,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.21 | bwd_microstep: 1519.23 | bwd_inner_microstep: 1519.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3504 [2024-06-11 05:44:10,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.80 | bwd_microstep: 1434.83 | bwd_inner_microstep: 1434.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3672 [2024-06-11 05:44:13,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.62 | bwd_microstep: 1616.02 | bwd_inner_microstep: 1615.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3519 [2024-06-11 05:44:15,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.76 | bwd_microstep: 1556.65 | bwd_inner_microstep: 1556.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2886 [2024-06-11 05:44:16,760] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.09 | bwd_microstep: 1123.92 | bwd_inner_microstep: 1123.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2362 [2024-06-11 05:44:18,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 399.03 | bwd_microstep: 1086.16 | bwd_inner_microstep: 1086.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3445 [2024-06-11 05:44:19,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.73 | bwd_microstep: 1255.29 | bwd_inner_microstep: 1255.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3401 [2024-06-11 05:44:21,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.16 | bwd_microstep: 1371.69 | bwd_inner_microstep: 1371.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-11 05:44:23,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.52 | bwd_microstep: 1181.14 | bwd_inner_microstep: 1181.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2558 [2024-06-11 05:44:24,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.93 | bwd_microstep: 1034.55 | bwd_inner_microstep: 1034.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3503 [2024-06-11 05:44:26,786] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.72 | bwd_microstep: 1319.79 | bwd_inner_microstep: 1319.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2085 [2024-06-11 05:44:27,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.19 | bwd_microstep: 851.67 | bwd_inner_microstep: 851.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3599 [2024-06-11 05:44:29,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1406.04 | bwd_inner_microstep: 1406.01 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3545 [2024-06-11 05:44:31,710] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.11 | bwd_microstep: 1297.22 | bwd_inner_microstep: 1297.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2452 [2024-06-11 05:44:32,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.41 | bwd_microstep: 923.15 | bwd_inner_microstep: 923.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-11 05:44:35,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.47 | bwd_microstep: 1527.32 | bwd_inner_microstep: 1527.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1920 [2024-06-11 05:44:36,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.62 | bwd_microstep: 811.81 | bwd_inner_microstep: 811.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3587 [2024-06-11 05:44:38,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.28 | bwd_microstep: 1528.39 | bwd_inner_microstep: 1528.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3776 [2024-06-11 05:44:40,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.66 | bwd_microstep: 1647.26 | bwd_inner_microstep: 1647.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3823 [2024-06-11 05:44:42,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.28 | bwd_microstep: 
1582.87 | bwd_inner_microstep: 1582.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2014 [2024-06-11 05:44:50,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.18 | optimizer_step: 6.60 [2024-06-11 05:44:50,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.85 | bwd_microstep: 7771.36 | bwd_inner_microstep: 877.26 | bwd_allreduce_microstep: 6894.04 | step_microstep: 39.16 [2024-06-11 05:44:50,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15181.03 | bwd: 47539.07 | bwd_inner: 40644.06 | bwd_allreduce: 6894.31 | step: 40.67 {'loss': 1.1898, 'learning_rate': 1.5322209614975214e-07, 'epoch': 0.96} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-11 05:44:52,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.86 | bwd_microstep: 1332.60 | bwd_inner_microstep: 1332.47 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3881 [2024-06-11 05:44:54,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.10 | bwd_microstep: 1477.73 | bwd_inner_microstep: 1477.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2317 [2024-06-11 05:44:55,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.00 | bwd_microstep: 880.38 | bwd_inner_microstep: 880.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3563 [2024-06-11 05:44:58,012] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.88 | bwd_microstep: 1455.90 | bwd_inner_microstep: 1455.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 
4.0, dynamic token length: 3471 [2024-06-11 05:44:59,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.33 | bwd_microstep: 1184.01 | bwd_inner_microstep: 1183.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-11 05:45:01,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1389.75 | bwd_inner_microstep: 1389.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-11 05:45:03,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.27 | bwd_microstep: 1527.78 | bwd_inner_microstep: 1527.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1952 [2024-06-11 05:45:04,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.54 | bwd_microstep: 789.27 | bwd_inner_microstep: 789.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955 [2024-06-11 05:45:05,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.61 | bwd_microstep: 701.55 | bwd_inner_microstep: 701.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3710 [2024-06-11 05:45:07,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.31 | bwd_microstep: 1530.31 | bwd_inner_microstep: 1530.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3542 [2024-06-11 05:45:09,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.14 | bwd_microstep: 1326.07 | bwd_inner_microstep: 1326.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 26, images per sample: 6.5, dynamic token length: 3487 [2024-06-11 05:45:11,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.17 | bwd_microstep: 1341.70 | bwd_inner_microstep: 1341.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-11 05:45:13,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.57 | bwd_microstep: 1500.94 | bwd_inner_microstep: 1500.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3383 [2024-06-11 05:45:15,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.23 | bwd_microstep: 1336.02 | bwd_inner_microstep: 1335.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3518 [2024-06-11 05:45:17,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.40 | bwd_microstep: 1575.63 | bwd_inner_microstep: 1575.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 05:45:19,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.73 | bwd_microstep: 1386.12 | bwd_inner_microstep: 1386.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2532 [2024-06-11 05:45:20,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.76 | bwd_microstep: 962.57 | bwd_inner_microstep: 962.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-11 05:45:22,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.62 | bwd_microstep: 1499.97 | bwd_inner_microstep: 1499.95 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-11 05:45:24,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.69 | bwd_microstep: 797.46 | bwd_inner_microstep: 797.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-11 05:45:26,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.16 | bwd_microstep: 1655.94 | bwd_inner_microstep: 1655.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3474 [2024-06-11 05:45:28,030] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.92 | bwd_microstep: 1216.12 | bwd_inner_microstep: 1216.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3821 [2024-06-11 05:45:30,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.99 | bwd_microstep: 1580.52 | bwd_inner_microstep: 1580.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3676 [2024-06-11 05:45:32,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.43 | bwd_microstep: 1756.17 | bwd_inner_microstep: 1756.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-11 05:45:34,514] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.37 | bwd_microstep: 1379.73 | bwd_inner_microstep: 1379.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 05:45:36,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.89 | bwd_microstep: 1345.10 | 
bwd_inner_microstep: 1345.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2086 [2024-06-11 05:45:37,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.34 | bwd_microstep: 1012.70 | bwd_inner_microstep: 1012.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-11 05:45:39,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.05 | bwd_microstep: 1549.20 | bwd_inner_microstep: 1549.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3686 [2024-06-11 05:45:42,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 661.58 | bwd_microstep: 1829.78 | bwd_inner_microstep: 1829.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3567 [2024-06-11 05:45:44,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.70 | bwd_microstep: 1493.07 | bwd_inner_microstep: 1493.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3733 [2024-06-11 05:45:46,582] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.44 | bwd_microstep: 1536.24 | bwd_inner_microstep: 1536.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.25 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3747 [2024-06-11 05:45:48,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.98 | bwd_microstep: 1542.55 | bwd_inner_microstep: 1542.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796 [2024-06-11 05:45:55,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.68 | optimizer_gradients: 4.16 | optimizer_step: 6.61 [2024-06-11 05:45:55,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.28 | bwd_microstep: 6010.83 | bwd_inner_microstep: 1755.02 | bwd_allreduce_microstep: 4255.76 | step_microstep: 38.89 [2024-06-11 05:45:55,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16207.45 | bwd: 47903.76 | bwd_inner: 43646.95 | bwd_allreduce: 4256.05 | step: 40.65 {'loss': 1.1844, 'learning_rate': 1.4861989281517386e-07, 'epoch': 0.96} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3558 [2024-06-11 05:45:57,149] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.73 | bwd_microstep: 1287.94 | bwd_inner_microstep: 1287.75 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3858 [2024-06-11 05:45:59,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.18 | bwd_microstep: 1560.11 | bwd_inner_microstep: 1560.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 05:46:01,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.68 | bwd_microstep: 1402.23 | bwd_inner_microstep: 1402.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3810 [2024-06-11 05:46:03,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.38 | bwd_microstep: 1648.96 | bwd_inner_microstep: 1648.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 05:46:05,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1388.17 | bwd_inner_microstep: 1388.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-11 05:46:07,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.93 | bwd_microstep: 1158.82 | bwd_inner_microstep: 1158.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976 [2024-06-11 05:46:08,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.75 | bwd_microstep: 797.54 | bwd_inner_microstep: 797.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2062 [2024-06-11 05:46:09,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.62 | bwd_microstep: 815.82 | bwd_inner_microstep: 815.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3616 [2024-06-11 05:46:10,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.17 | bwd_microstep: 1217.56 | bwd_inner_microstep: 1217.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3614 [2024-06-11 05:46:13,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.93 | bwd_microstep: 1508.32 | bwd_inner_microstep: 1508.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3685 [2024-06-11 05:46:15,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.84 | bwd_microstep: 1586.75 | bwd_inner_microstep: 1586.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2024 [2024-06-11 05:46:16,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.86 | bwd_microstep: 744.09 | bwd_inner_microstep: 744.06 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-11 05:46:18,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.68 | bwd_microstep: 1281.44 | bwd_inner_microstep: 1281.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3520 [2024-06-11 05:46:20,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.81 | bwd_microstep: 1575.93 | bwd_inner_microstep: 1575.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3512 [2024-06-11 05:46:22,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.43 | bwd_microstep: 1577.91 | bwd_inner_microstep: 1577.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3688 [2024-06-11 05:46:24,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.73 | bwd_microstep: 1690.39 | bwd_inner_microstep: 1690.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3389 [2024-06-11 05:46:26,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.19 | bwd_microstep: 1303.94 | bwd_inner_microstep: 1303.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3631 [2024-06-11 05:46:28,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.14 | bwd_microstep: 1573.18 | bwd_inner_microstep: 1573.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3690 [2024-06-11 05:46:30,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.45 | bwd_microstep: 
1577.36 | bwd_inner_microstep: 1577.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3522 [2024-06-11 05:46:32,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.18 | bwd_microstep: 1392.86 | bwd_inner_microstep: 1392.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-11 05:46:34,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.21 | bwd_microstep: 1295.42 | bwd_inner_microstep: 1295.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 05:46:36,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.17 | bwd_microstep: 1657.16 | bwd_inner_microstep: 1657.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3471 [2024-06-11 05:46:38,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.46 | bwd_microstep: 1184.18 | bwd_inner_microstep: 1184.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3556 [2024-06-11 05:46:40,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.32 | bwd_microstep: 1401.36 | bwd_inner_microstep: 1401.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3584 [2024-06-11 05:46:42,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.61 | bwd_microstep: 1239.66 | bwd_inner_microstep: 1239.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3632 [2024-06-11 05:46:43,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 495.15 | bwd_microstep: 1314.32 | bwd_inner_microstep: 1314.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3549 [2024-06-11 05:46:45,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.94 | bwd_microstep: 1329.90 | bwd_inner_microstep: 1329.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2286 [2024-06-11 05:46:47,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.63 | bwd_microstep: 880.49 | bwd_inner_microstep: 880.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3817 [2024-06-11 05:46:48,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1387.28 | bwd_inner_microstep: 1387.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3576 [2024-06-11 05:46:50,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.04 | bwd_microstep: 1494.51 | bwd_inner_microstep: 1494.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822 [2024-06-11 05:46:52,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.03 | bwd_microstep: 1393.74 | bwd_inner_microstep: 1393.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1977 [2024-06-11 05:46:57,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.14 | optimizer_step: 6.57 [2024-06-11 05:46:57,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.62 | bwd_microstep: 4464.00 | bwd_inner_microstep: 834.23 | 
bwd_allreduce_microstep: 3629.71 | step_microstep: 38.92 [2024-06-11 05:46:57,709] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15885.44 | bwd: 46131.37 | bwd_inner: 42500.60 | bwd_allreduce: 3630.02 | step: 40.49 {'loss': 1.1454, 'learning_rate': 1.4408760593813463e-07, 'epoch': 0.96} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3407 [2024-06-11 05:46:59,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.22 | bwd_microstep: 1236.51 | bwd_inner_microstep: 1236.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3482 [2024-06-11 05:47:01,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.58 | bwd_microstep: 1186.49 | bwd_inner_microstep: 1186.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3868 [2024-06-11 05:47:03,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.43 | bwd_microstep: 1469.09 | bwd_inner_microstep: 1469.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2304 [2024-06-11 05:47:04,367] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.18 | bwd_microstep: 908.42 | bwd_inner_microstep: 908.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3758 [2024-06-11 05:47:06,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.24 | bwd_microstep: 1469.92 | bwd_inner_microstep: 1469.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411 [2024-06-11 05:47:08,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.89 | bwd_microstep: 1182.52 | bwd_inner_microstep: 1182.49 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-11 05:47:09,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.08 | bwd_microstep: 1293.18 | bwd_inner_microstep: 1293.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3407 [2024-06-11 05:47:11,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.43 | bwd_microstep: 1344.76 | bwd_inner_microstep: 1344.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3493 [2024-06-11 05:47:13,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 1483.45 | bwd_inner_microstep: 1483.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1988 [2024-06-11 05:47:14,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.50 | bwd_microstep: 898.86 | bwd_inner_microstep: 898.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417 [2024-06-11 05:47:16,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.21 | bwd_microstep: 1369.72 | bwd_inner_microstep: 1369.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1917 [2024-06-11 05:47:17,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.51 | bwd_microstep: 785.43 | bwd_inner_microstep: 785.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3533 [2024-06-11 05:47:20,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.43 | bwd_microstep: 
1585.48 | bwd_inner_microstep: 1585.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 21, images per sample: 5.25, dynamic token length: 2723 [2024-06-11 05:47:21,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 406.09 | bwd_microstep: 1076.36 | bwd_inner_microstep: 1076.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3519 [2024-06-11 05:47:23,339] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.12 | bwd_microstep: 1241.62 | bwd_inner_microstep: 1241.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469 [2024-06-11 05:47:25,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.83 | bwd_microstep: 1341.30 | bwd_inner_microstep: 1341.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2185 [2024-06-11 05:47:26,322] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.81 | bwd_microstep: 805.89 | bwd_inner_microstep: 805.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3547 [2024-06-11 05:47:28,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.60 | bwd_microstep: 1296.20 | bwd_inner_microstep: 1296.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 05:47:30,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.86 | bwd_microstep: 1661.61 | bwd_inner_microstep: 1661.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3790 [2024-06-11 05:47:32,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 577.13 | bwd_microstep: 1554.39 | bwd_inner_microstep: 1554.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 05:47:34,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.64 | bwd_microstep: 1399.28 | bwd_inner_microstep: 1399.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 05:47:36,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.22 | bwd_microstep: 1277.18 | bwd_inner_microstep: 1277.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-11 05:47:37,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.04 | bwd_microstep: 975.86 | bwd_inner_microstep: 975.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3613 [2024-06-11 05:47:39,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.93 | bwd_microstep: 1609.46 | bwd_inner_microstep: 1609.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568 [2024-06-11 05:47:41,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.67 | bwd_microstep: 1302.24 | bwd_inner_microstep: 1302.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3807 [2024-06-11 05:47:43,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.78 | bwd_microstep: 1357.52 | bwd_inner_microstep: 1357.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3823 [2024-06-11 05:47:45,786] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.17 | bwd_microstep: 1665.04 | bwd_inner_microstep: 1665.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3781 [2024-06-11 05:47:47,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.04 | bwd_microstep: 1495.02 | bwd_inner_microstep: 1494.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436 [2024-06-11 05:47:49,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.11 | bwd_microstep: 1312.57 | bwd_inner_microstep: 1312.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3442 [2024-06-11 05:47:51,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.59 | bwd_microstep: 1348.88 | bwd_inner_microstep: 1348.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3596 [2024-06-11 05:47:53,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.47 | bwd_microstep: 1705.35 | bwd_inner_microstep: 1705.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3807 [2024-06-11 05:47:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.23 | optimizer_step: 6.60 [2024-06-11 05:47:59,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.62 | bwd_microstep: 4706.76 | bwd_inner_microstep: 1753.76 | bwd_allreduce_microstep: 2952.94 | step_microstep: 39.50 [2024-06-11 05:47:59,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15810.79 | bwd: 45346.38 | bwd_inner: 42392.52 | bwd_allreduce: 2953.18 | step: 40.93 {'loss': 1.146, 'learning_rate': 
1.396252514813279e-07, 'epoch': 0.96} 96%|█████████▌| 1659/1726 [29:06:24<1:09:09, 61.94s/it] 96%|█████████▌| 1660/1726 [29:07:27<1:08:30, 62.27s/it] 96%|█████████▌| 1661/1726 [29:08:32<1:08:10, 62.93s/it] 96%|█████████▋| 1662/1726 [29:09:34<1:06:56, 62.76s/it] 96%|█████████▋| 1663/1726 [29:10:35<1:05:29, 62.38s/it] dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 05:48:01,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.13 | bwd_microstep: 1342.14 | bwd_inner_microstep: 1341.99 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390 [2024-06-11 05:48:02,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.11 | bwd_microstep: 1240.52 | bwd_inner_microstep: 1240.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-11 05:48:04,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.23 | bwd_microstep: 1343.46 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2302 [2024-06-11 05:48:05,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.54 | bwd_microstep: 974.67 | bwd_inner_microstep: 974.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3402 [2024-06-11 05:48:07,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.74 | bwd_microstep: 1277.07 | bwd_inner_microstep: 1277.04 | bwd_allreduce_microstep: 0.01 |
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3472 [2024-06-11 05:48:09,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.17 | bwd_microstep: 1281.29 | bwd_inner_microstep: 1281.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3559 [2024-06-11 05:48:11,325] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.11 | bwd_microstep: 1298.12 | bwd_inner_microstep: 1298.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2718 [2024-06-11 05:48:12,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.11 | bwd_microstep: 969.39 | bwd_inner_microstep: 969.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 05:48:14,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.21 | bwd_microstep: 1248.07 | bwd_inner_microstep: 1248.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3404 [2024-06-11 05:48:16,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.85 | bwd_microstep: 1181.59 | bwd_inner_microstep: 1181.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3706 [2024-06-11 05:48:18,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.68 | bwd_microstep: 1620.46 | bwd_inner_microstep: 1620.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-11 05:48:19,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.24 | bwd_microstep: 797.03 | bwd_inner_microstep: 
797.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3494 [2024-06-11 05:48:21,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.39 | bwd_microstep: 1483.66 | bwd_inner_microstep: 1483.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3385 [2024-06-11 05:48:23,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.25 | bwd_microstep: 1334.59 | bwd_inner_microstep: 1334.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3486 [2024-06-11 05:48:25,341] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.70 | bwd_microstep: 1510.37 | bwd_inner_microstep: 1510.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3589 [2024-06-11 05:48:27,161] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.79 | bwd_microstep: 1315.02 | bwd_inner_microstep: 1314.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3652 [2024-06-11 05:48:28,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.41 | bwd_microstep: 1326.13 | bwd_inner_microstep: 1326.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 05:48:30,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.64 | bwd_microstep: 1385.54 | bwd_inner_microstep: 1385.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3512 [2024-06-11 05:48:32,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.38 | 
bwd_microstep: 1289.03 | bwd_inner_microstep: 1289.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3697 [2024-06-11 05:48:34,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.85 | bwd_microstep: 1631.29 | bwd_inner_microstep: 1631.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672 [2024-06-11 05:48:37,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.11 | bwd_microstep: 1528.35 | bwd_inner_microstep: 1528.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-11 05:48:38,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.69 | bwd_microstep: 1254.38 | bwd_inner_microstep: 1254.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3429 [2024-06-11 05:48:40,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.50 | bwd_microstep: 1252.72 | bwd_inner_microstep: 1252.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2170 [2024-06-11 05:48:41,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 323.89 | bwd_microstep: 856.57 | bwd_inner_microstep: 856.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2180 [2024-06-11 05:48:42,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.79 | bwd_microstep: 764.08 | bwd_inner_microstep: 764.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-11 05:48:44,712] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 525.04 | bwd_microstep: 1407.03 | bwd_inner_microstep: 1407.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 05:48:46,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.13 | bwd_microstep: 1381.03 | bwd_inner_microstep: 1381.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3720 [2024-06-11 05:48:48,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.66 | bwd_microstep: 1467.99 | bwd_inner_microstep: 1467.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-11 05:48:50,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1492.36 | bwd_inner_microstep: 1492.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-11 05:48:52,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.61 | bwd_microstep: 1351.10 | bwd_inner_microstep: 1351.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3749 [2024-06-11 05:48:54,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.55 | bwd_microstep: 1572.58 | bwd_inner_microstep: 1572.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3435 [2024-06-11 05:49:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.59 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-11 05:49:00,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 4808.40 | bwd_inner_microstep: 1765.40 | 
bwd_allreduce_microstep: 3042.94 | step_microstep: 39.25 [2024-06-11 05:49:00,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15589.94 | bwd: 44986.04 | bwd_inner: 41942.08 | bwd_allreduce: 3043.24 | step: 40.82 {'loss': 1.1778, 'learning_rate': 1.3523284516113955e-07, 'epoch': 0.96} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 05:49:01,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1242.66 | bwd_inner_microstep: 1242.48 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-11 05:49:02,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.30 | bwd_microstep: 785.90 | bwd_inner_microstep: 785.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 05:49:05,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.17 | bwd_microstep: 1550.59 | bwd_inner_microstep: 1550.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-11 05:49:07,201] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.58 | bwd_microstep: 1548.79 | bwd_inner_microstep: 1548.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396 [2024-06-11 05:49:08,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.32 | bwd_microstep: 1246.83 | bwd_inner_microstep: 1246.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 05:49:10,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.52 | bwd_microstep: 1280.58 | bwd_inner_microstep: 1280.55 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 05:49:12,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.78 | bwd_microstep: 1383.56 | bwd_inner_microstep: 1383.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1904 [2024-06-11 05:49:13,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.63 | bwd_microstep: 684.67 | bwd_inner_microstep: 684.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3438 [2024-06-11 05:49:15,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.34 | bwd_microstep: 1253.57 | bwd_inner_microstep: 1253.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1958 [2024-06-11 05:49:16,412] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.47 | bwd_microstep: 793.61 | bwd_inner_microstep: 793.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2924 [2024-06-11 05:49:17,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.80 | bwd_microstep: 1126.76 | bwd_inner_microstep: 1126.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432 [2024-06-11 05:49:19,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.66 | bwd_microstep: 1437.93 | bwd_inner_microstep: 1437.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3411 [2024-06-11 05:49:21,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.49 | bwd_microstep: 1256.49 
| bwd_inner_microstep: 1256.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3835 [2024-06-11 05:49:24,105] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.36 | bwd_microstep: 1755.02 | bwd_inner_microstep: 1754.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 05:49:26,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.29 | bwd_microstep: 1376.91 | bwd_inner_microstep: 1376.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 05:49:27,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.32 | bwd_microstep: 1374.36 | bwd_inner_microstep: 1374.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.22 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-11 05:49:30,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.48 | bwd_microstep: 1517.51 | bwd_inner_microstep: 1517.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2072 [2024-06-11 05:49:31,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.92 | bwd_microstep: 848.28 | bwd_inner_microstep: 848.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-11 05:49:32,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.32 | bwd_microstep: 789.55 | bwd_inner_microstep: 789.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3625 [2024-06-11 05:49:34,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 559.85 | bwd_microstep: 1508.67 | bwd_inner_microstep: 1508.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1918 [2024-06-11 05:49:35,324] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 266.22 | bwd_microstep: 688.56 | bwd_inner_microstep: 688.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-11 05:49:37,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.27 | bwd_microstep: 1456.95 | bwd_inner_microstep: 1456.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-11 05:49:39,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.34 | bwd_microstep: 1415.02 | bwd_inner_microstep: 1414.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3817 [2024-06-11 05:49:41,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.29 | bwd_microstep: 1657.44 | bwd_inner_microstep: 1657.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3468 [2024-06-11 05:49:43,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.90 | bwd_microstep: 1214.70 | bwd_inner_microstep: 1214.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-11 05:49:45,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.95 | bwd_microstep: 1452.03 | bwd_inner_microstep: 1452.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2092 [2024-06-11 05:49:46,532] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.10 | bwd_microstep: 919.11 | bwd_inner_microstep: 919.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3598 [2024-06-11 05:49:48,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.20 | bwd_microstep: 1609.19 | bwd_inner_microstep: 1609.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3807 [2024-06-11 05:49:50,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.61 | bwd_microstep: 1631.57 | bwd_inner_microstep: 1631.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3469 [2024-06-11 05:49:52,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.76 | bwd_microstep: 1405.63 | bwd_inner_microstep: 1405.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3601 [2024-06-11 05:49:54,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.05 | bwd_microstep: 1414.51 | bwd_inner_microstep: 1414.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 05:50:02,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.30 | optimizer_step: 6.56 [2024-06-11 05:50:02,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.16 | bwd_microstep: 7005.38 | bwd_inner_microstep: 1547.40 | bwd_allreduce_microstep: 5457.90 | step_microstep: 39.94 [2024-06-11 05:50:02,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15379.95 | bwd: 46632.37 | bwd_inner: 41173.40 | bwd_allreduce: 5458.21 | step: 41.73 {'loss': 1.2043, 'learning_rate': 
1.30910402447606e-07, 'epoch': 0.96} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3468 [2024-06-11 05:50:04,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.12 | bwd_microstep: 1567.09 | bwd_inner_microstep: 1567.01 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3401 [2024-06-11 05:50:06,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.51 | bwd_microstep: 1341.15 | bwd_inner_microstep: 1341.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3555 [2024-06-11 05:50:08,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.55 | bwd_microstep: 1393.12 | bwd_inner_microstep: 1393.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838 [2024-06-11 05:50:10,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.52 | bwd_microstep: 1451.58 | bwd_inner_microstep: 1451.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2924 [2024-06-11 05:50:11,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.44 | bwd_microstep: 1093.81 | bwd_inner_microstep: 1093.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 05:50:14,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.60 | bwd_microstep: 1552.51 | bwd_inner_microstep: 1552.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 05:50:15,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1379.70 | 
bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 05:50:17,895] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.32 | bwd_microstep: 1381.48 | bwd_inner_microstep: 1381.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 855 [2024-06-11 05:50:18,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 144.59 | bwd_microstep: 379.79 | bwd_inner_microstep: 379.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1974 [2024-06-11 05:50:19,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.05 | bwd_microstep: 895.02 | bwd_inner_microstep: 894.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3507 [2024-06-11 05:50:21,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.03 | bwd_microstep: 1483.96 | bwd_inner_microstep: 1483.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2092 [2024-06-11 05:50:22,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.60 | bwd_microstep: 821.10 | bwd_inner_microstep: 821.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-11 05:50:23,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 790.00 | bwd_inner_microstep: 789.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3509 [2024-06-11 05:50:26,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
579.72 | bwd_microstep: 1584.61 | bwd_inner_microstep: 1584.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3479 [2024-06-11 05:50:28,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.22 | bwd_microstep: 1446.29 | bwd_inner_microstep: 1446.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3609 [2024-06-11 05:50:30,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.30 | bwd_microstep: 1503.88 | bwd_inner_microstep: 1503.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3825 [2024-06-11 05:50:32,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.62 | bwd_microstep: 1719.48 | bwd_inner_microstep: 1719.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675 [2024-06-11 05:50:34,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.75 | bwd_microstep: 1420.03 | bwd_inner_microstep: 1420.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3507 [2024-06-11 05:50:36,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.81 | bwd_microstep: 1333.89 | bwd_inner_microstep: 1333.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3836 [2024-06-11 05:50:38,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.99 | bwd_microstep: 1662.61 | bwd_inner_microstep: 1662.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-11 05:50:40,368] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.84 | bwd_microstep: 1251.24 | bwd_inner_microstep: 1251.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299 [2024-06-11 05:50:41,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.82 | bwd_microstep: 976.50 | bwd_inner_microstep: 976.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3613 [2024-06-11 05:50:43,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 1413.41 | bwd_inner_microstep: 1413.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3616 [2024-06-11 05:50:45,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.63 | bwd_microstep: 1311.03 | bwd_inner_microstep: 1311.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2269 [2024-06-11 05:50:46,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.43 | bwd_microstep: 970.96 | bwd_inner_microstep: 970.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2548 [2024-06-11 05:50:48,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.25 | bwd_microstep: 1094.22 | bwd_inner_microstep: 1094.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3518 [2024-06-11 05:50:49,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.18 | bwd_microstep: 1192.53 | bwd_inner_microstep: 1192.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 
[2024-06-11 05:50:52,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.65 | bwd_microstep: 1497.80 | bwd_inner_microstep: 1497.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3599 [2024-06-11 05:50:54,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.06 | bwd_microstep: 1703.97 | bwd_inner_microstep: 1703.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3436 [2024-06-11 05:50:56,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.01 | bwd_microstep: 1334.04 | bwd_inner_microstep: 1334.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3805 [2024-06-11 05:50:58,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.04 | bwd_microstep: 1449.26 | bwd_inner_microstep: 1449.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3773 [2024-06-11 05:51:06,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.13 | optimizer_step: 6.60 [2024-06-11 05:51:06,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.83 | bwd_microstep: 7250.96 | bwd_inner_microstep: 1550.60 | bwd_allreduce_microstep: 5700.30 | step_microstep: 38.30 [2024-06-11 05:51:06,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15602.36 | bwd: 47647.02 | bwd_inner: 41945.74 | bwd_allreduce: 5700.57 | step: 39.84 {'loss': 1.1698, 'learning_rate': 1.2665793856434516e-07, 'epoch': 0.97} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-11 05:51:07,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.12 | bwd_microstep: 1286.92 | 
bwd_inner_microstep: 1286.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3700 [2024-06-11 05:51:09,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.91 | bwd_microstep: 1522.38 | bwd_inner_microstep: 1522.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3838 [2024-06-11 05:51:12,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.12 | bwd_microstep: 1513.20 | bwd_inner_microstep: 1513.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3880 [2024-06-11 05:51:13,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.55 | bwd_microstep: 1415.95 | bwd_inner_microstep: 1415.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3544 [2024-06-11 05:51:15,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.53 | bwd_microstep: 1425.20 | bwd_inner_microstep: 1425.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1887 [2024-06-11 05:51:16,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.57 | bwd_microstep: 681.13 | bwd_inner_microstep: 681.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 05:51:18,822] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.13 | bwd_microstep: 1387.49 | bwd_inner_microstep: 1387.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3496 [2024-06-11 05:51:20,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 472.14 | bwd_microstep: 1251.83 | bwd_inner_microstep: 1251.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2940 [2024-06-11 05:51:22,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.49 | bwd_microstep: 1099.53 | bwd_inner_microstep: 1099.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3474 [2024-06-11 05:51:23,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.87 | bwd_microstep: 1313.39 | bwd_inner_microstep: 1313.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1963 [2024-06-11 05:51:25,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.32 | bwd_microstep: 893.87 | bwd_inner_microstep: 893.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3561 [2024-06-11 05:51:27,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.34 | bwd_microstep: 1694.93 | bwd_inner_microstep: 1694.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3664 [2024-06-11 05:51:29,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.45 | bwd_microstep: 1483.17 | bwd_inner_microstep: 1483.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3517 [2024-06-11 05:51:31,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.67 | bwd_microstep: 1584.30 | bwd_inner_microstep: 1584.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3440 [2024-06-11 05:51:33,461] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.76 | bwd_microstep: 1283.21 | bwd_inner_microstep: 1283.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3529 [2024-06-11 05:51:35,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.06 | bwd_microstep: 1449.07 | bwd_inner_microstep: 1449.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3560 [2024-06-11 05:51:37,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.91 | bwd_microstep: 1334.70 | bwd_inner_microstep: 1334.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3508 [2024-06-11 05:51:39,363] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.35 | bwd_microstep: 1487.18 | bwd_inner_microstep: 1487.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3829 [2024-06-11 05:51:41,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.49 | bwd_microstep: 1461.94 | bwd_inner_microstep: 1461.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-11 05:51:42,730] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.95 | bwd_microstep: 977.13 | bwd_inner_microstep: 977.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3672 [2024-06-11 05:51:44,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.82 | bwd_microstep: 1527.39 | bwd_inner_microstep: 1527.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3606 [2024-06-11 05:51:46,917] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.98 | bwd_microstep: 1513.40 | bwd_inner_microstep: 1513.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1936 [2024-06-11 05:51:47,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.69 | bwd_microstep: 729.73 | bwd_inner_microstep: 729.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3550 [2024-06-11 05:51:50,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.85 | bwd_microstep: 1567.01 | bwd_inner_microstep: 1566.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1999 [2024-06-11 05:51:51,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.56 | bwd_microstep: 836.12 | bwd_inner_microstep: 836.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 615 [2024-06-11 05:51:51,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.74 | bwd_microstep: 259.99 | bwd_inner_microstep: 259.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3431 [2024-06-11 05:51:53,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.20 | bwd_microstep: 1281.63 | bwd_inner_microstep: 1281.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-11 05:51:55,447] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.19 | bwd_microstep: 1487.12 | bwd_inner_microstep: 1487.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3816 [2024-06-11 05:51:57,452] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.39 | bwd_microstep: 1450.68 | bwd_inner_microstep: 1450.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3630 [2024-06-11 05:51:59,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.35 | bwd_microstep: 1405.26 | bwd_inner_microstep: 1405.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3766 [2024-06-11 05:52:01,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.08 | bwd_microstep: 1566.22 | bwd_inner_microstep: 1566.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3615 [2024-06-11 05:52:09,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.31 | optimizer_step: 6.60 [2024-06-11 05:52:09,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.69 | bwd_microstep: 7289.32 | bwd_inner_microstep: 1660.24 | bwd_allreduce_microstep: 5629.00 | step_microstep: 39.99 [2024-06-11 05:52:09,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15589.89 | bwd: 47460.45 | bwd_inner: 41830.51 | bwd_allreduce: 5629.25 | step: 41.55 {'loss': 1.156, 'learning_rate': 1.224754684885099e-07, 'epoch': 0.97} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2052 [2024-06-11 05:52:10,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 338.81 | bwd_microstep: 907.22 | bwd_inner_microstep: 907.08 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3562 [2024-06-11 05:52:12,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.70 | 
bwd_microstep: 1495.35 | bwd_inner_microstep: 1495.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3861 [2024-06-11 05:52:14,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.35 | bwd_microstep: 1557.98 | bwd_inner_microstep: 1557.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3496 [2024-06-11 05:52:16,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.67 | bwd_microstep: 1483.30 | bwd_inner_microstep: 1483.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3545 [2024-06-11 05:52:18,893] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.26 | bwd_microstep: 1395.77 | bwd_inner_microstep: 1395.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2233 [2024-06-11 05:52:20,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.46 | bwd_microstep: 864.02 | bwd_inner_microstep: 863.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-11 05:52:21,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1281.33 | bwd_inner_microstep: 1281.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1968 [2024-06-11 05:52:23,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.09 | bwd_microstep: 892.11 | bwd_inner_microstep: 892.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 05:52:25,058] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 532.73 | bwd_microstep: 1411.30 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3729 [2024-06-11 05:52:27,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.69 | bwd_microstep: 1732.59 | bwd_inner_microstep: 1732.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4017 [2024-06-11 05:52:29,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.76 | bwd_microstep: 1704.33 | bwd_inner_microstep: 1704.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1977 [2024-06-11 05:52:30,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.40 | bwd_microstep: 798.00 | bwd_inner_microstep: 797.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 05:52:32,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.18 | bwd_microstep: 1383.51 | bwd_inner_microstep: 1383.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1865 [2024-06-11 05:52:33,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.83 | bwd_microstep: 709.03 | bwd_inner_microstep: 709.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1869 [2024-06-11 05:52:34,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 260.92 | bwd_microstep: 677.93 | bwd_inner_microstep: 677.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3402 [2024-06-11 05:52:36,708] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.08 | bwd_microstep: 1436.07 | bwd_inner_microstep: 1436.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3640 [2024-06-11 05:52:38,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.16 | bwd_microstep: 1314.96 | bwd_inner_microstep: 1314.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 05:52:40,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.24 | bwd_microstep: 1489.46 | bwd_inner_microstep: 1489.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3530 [2024-06-11 05:52:42,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.47 | bwd_microstep: 1293.27 | bwd_inner_microstep: 1293.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3828 [2024-06-11 05:52:44,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.76 | bwd_microstep: 1511.21 | bwd_inner_microstep: 1511.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3539 [2024-06-11 05:52:46,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.97 | bwd_microstep: 1594.31 | bwd_inner_microstep: 1594.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3482 [2024-06-11 05:52:48,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.02 | bwd_microstep: 1441.32 | bwd_inner_microstep: 1441.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 
3553 [2024-06-11 05:53:07,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.44 | bwd_microstep: 1410.96 | bwd_inner_microstep: 1410.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-11 05:53:09,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.29 | bwd_microstep: 1651.74 | bwd_inner_microstep: 1651.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3476 [2024-06-11 05:53:11,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1374.88 | bwd_inner_microstep: 1374.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3719 [2024-06-11 05:53:13,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.37 | bwd_microstep: 1433.35 | bwd_inner_microstep: 1433.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3777 [2024-06-11 05:53:15,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.89 | bwd_microstep: 1347.91 | bwd_inner_microstep: 1347.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3561 [2024-06-11 05:53:17,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.88 | bwd_microstep: 1295.41 | bwd_inner_microstep: 1295.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3822 [2024-06-11 05:53:19,548] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.02 | bwd_microstep: 1555.41 | bwd_inner_microstep: 1555.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3616 [2024-06-11 05:53:21,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.62 | bwd_microstep: 1307.04 | bwd_inner_microstep: 1307.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3389 [2024-06-11 05:53:23,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1396.42 | bwd_inner_microstep: 1396.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3388 [2024-06-11 05:53:25,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.06 | optimizer_step: 6.62 [2024-06-11 05:53:25,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.45 | bwd_microstep: 1468.34 | bwd_inner_microstep: 1460.28 | bwd_allreduce_microstep: 8.01 | step_microstep: 38.76 [2024-06-11 05:53:25,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15962.63 | bwd: 42615.84 | bwd_inner: 42606.82 | bwd_allreduce: 8.29 | step: 40.35 96%|█████████▋| 1663/1726 [29:10:35<1:05:29, 62.38s/it] 96%|█████████▋| 1664/1726 [29:11:36<1:04:00, 61.94s/it] 96%|█████████▋| 1665/1726 [29:12:39<1:03:05, 62.06s/it] 97%|█████████▋| 1666/1726 [29:13:42<1:02:31, 62.52s/it] 97%|█████████▋| 1667/1726 [29:14:46<1:01:44, 62.78s/it] 97%|█████████▋| 1668/1726 [29:16:02<1:04:29, 66.71s/it] {'loss': 1.1779, 'learning_rate': 1.1836300695074354e-07, 'epoch': 0.97} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4476 [2024-06-11 05:53:27,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 672.25 | bwd_microstep: 1812.88 | 
bwd_inner_microstep: 1812.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3923 [2024-06-11 05:53:29,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.30 | bwd_microstep: 1423.19 | bwd_inner_microstep: 1423.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 05:53:31,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.21 | bwd_microstep: 1343.19 | bwd_inner_microstep: 1343.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400 [2024-06-11 05:53:33,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.18 | bwd_microstep: 1181.85 | bwd_inner_microstep: 1181.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3600 [2024-06-11 05:53:35,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.07 | bwd_microstep: 1438.42 | bwd_inner_microstep: 1438.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3742 [2024-06-11 05:53:37,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.68 | bwd_microstep: 1431.57 | bwd_inner_microstep: 1431.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-11 05:53:38,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.13 | bwd_microstep: 1150.08 | bwd_inner_microstep: 1150.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3598 [2024-06-11 05:53:40,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 494.30 | bwd_microstep: 1310.50 | bwd_inner_microstep: 1310.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707 [2024-06-11 05:53:42,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.96 | bwd_microstep: 1531.34 | bwd_inner_microstep: 1531.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3570 [2024-06-11 05:53:44,715] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1395.51 | bwd_inner_microstep: 1395.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3502 [2024-06-11 05:53:46,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.24 | bwd_microstep: 1585.82 | bwd_inner_microstep: 1585.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3492 [2024-06-11 05:53:49,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1534.58 | bwd_inner_microstep: 1534.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3506 [2024-06-11 05:53:50,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.76 | bwd_microstep: 1381.36 | bwd_inner_microstep: 1381.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 41, images per sample: 10.25, dynamic token length: 3638 [2024-06-11 05:53:53,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.81 | bwd_microstep: 1621.46 | bwd_inner_microstep: 1621.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3637 [2024-06-11 05:53:55,226] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.28 | bwd_microstep: 1514.45 | bwd_inner_microstep: 1514.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3690 [2024-06-11 05:53:57,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.26 | bwd_microstep: 1330.57 | bwd_inner_microstep: 1330.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3638 [2024-06-11 05:53:58,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.79 | bwd_microstep: 1280.86 | bwd_inner_microstep: 1280.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3631 [2024-06-11 05:54:00,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1416.81 | bwd_inner_microstep: 1416.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3592 [2024-06-11 05:54:03,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.64 | bwd_microstep: 1606.85 | bwd_inner_microstep: 1606.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3820 [2024-06-11 05:54:05,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.79 | bwd_microstep: 1554.69 | bwd_inner_microstep: 1554.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1974 [2024-06-11 05:54:06,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.51 | bwd_microstep: 703.42 | bwd_inner_microstep: 703.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3717 [2024-06-11 05:54:08,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.51 | bwd_microstep: 1537.51 | bwd_inner_microstep: 1537.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611 [2024-06-11 05:54:10,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.44 | bwd_microstep: 1310.23 | bwd_inner_microstep: 1310.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3541 [2024-06-11 05:54:12,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.57 | bwd_microstep: 1497.42 | bwd_inner_microstep: 1497.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-11 05:54:13,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.12 | bwd_microstep: 1353.27 | bwd_inner_microstep: 1353.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-11 05:54:16,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.29 | bwd_microstep: 1496.97 | bwd_inner_microstep: 1496.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-11 05:54:17,989] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.54 | bwd_microstep: 1395.33 | bwd_inner_microstep: 1395.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 05:54:19,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.13 | bwd_microstep: 1394.39 | bwd_inner_microstep: 1394.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3467 [2024-06-11 05:54:21,957] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.01 | bwd_microstep: 1474.04 | bwd_inner_microstep: 1474.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-11 05:54:23,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.76 | bwd_microstep: 1251.08 | bwd_inner_microstep: 1251.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3594 [2024-06-11 05:54:25,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.60 | bwd_microstep: 1363.63 | bwd_inner_microstep: 1363.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3795 [2024-06-11 05:56:00,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.30 | optimizer_step: 6.58 [2024-06-11 05:56:00,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 637.37 | bwd_microstep: 94215.50 | bwd_inner_microstep: 1981.51 | bwd_allreduce_microstep: 92233.92 | step_microstep: 40.01 [2024-06-11 05:56:00,479] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16960.24 | bwd: 137838.80 | bwd_inner: 45603.96 | bwd_allreduce: 92234.16 | step: 41.46 {'loss': 1.1635, 'learning_rate': 1.1432056843511342e-07, 'epoch': 0.97} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3472 [2024-06-11 05:56:02,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.75 | bwd_microstep: 1397.28 | bwd_inner_microstep: 1397.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 05:56:04,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 516.79 | bwd_microstep: 1369.63 | bwd_inner_microstep: 1369.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3883 [2024-06-11 05:56:06,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 615.24 | bwd_microstep: 1671.98 | bwd_inner_microstep: 1671.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3832 [2024-06-11 05:56:08,611] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.06 | bwd_microstep: 1446.25 | bwd_inner_microstep: 1446.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3821 [2024-06-11 05:56:10,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.16 | bwd_microstep: 1444.37 | bwd_inner_microstep: 1444.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3500 [2024-06-11 05:56:12,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.12 | bwd_microstep: 1280.78 | bwd_inner_microstep: 1280.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3428 [2024-06-11 05:56:14,230] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.42 | bwd_microstep: 1341.28 | bwd_inner_microstep: 1341.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3474 [2024-06-11 05:56:15,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.82 | bwd_microstep: 1187.32 | bwd_inner_microstep: 1187.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2025 [2024-06-11 05:56:17,001] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.32 | bwd_microstep: 808.88 | bwd_inner_microstep: 808.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3406 [2024-06-11 05:56:18,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.29 | bwd_microstep: 1147.71 | bwd_inner_microstep: 1147.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2167 [2024-06-11 05:56:19,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.80 | bwd_microstep: 788.55 | bwd_inner_microstep: 788.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 39, images per sample: 9.75, dynamic token length: 3812 [2024-06-11 05:56:21,926] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.96 | bwd_microstep: 1622.19 | bwd_inner_microstep: 1622.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3504 [2024-06-11 05:56:23,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.78 | bwd_microstep: 1482.31 | bwd_inner_microstep: 1482.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3938 [2024-06-11 05:56:26,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 645.29 | bwd_microstep: 1758.53 | bwd_inner_microstep: 1758.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 05:56:28,290] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.77 | bwd_microstep: 1383.19 | bwd_inner_microstep: 1383.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 
2103 [2024-06-11 05:56:29,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.01 | bwd_microstep: 952.84 | bwd_inner_microstep: 952.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667 [2024-06-11 05:56:31,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1425.34 | bwd_inner_microstep: 1425.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922 [2024-06-11 05:56:32,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.31 | bwd_microstep: 696.58 | bwd_inner_microstep: 696.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-11 05:56:34,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.15 | bwd_microstep: 1279.67 | bwd_inner_microstep: 1279.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3573 [2024-06-11 05:56:36,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.76 | bwd_microstep: 1235.42 | bwd_inner_microstep: 1235.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-11 05:56:37,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.64 | bwd_microstep: 1394.41 | bwd_inner_microstep: 1394.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3465 [2024-06-11 05:56:39,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.08 | bwd_microstep: 1212.76 | bwd_inner_microstep: 1212.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3824 [2024-06-11 05:56:41,579] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.41 | bwd_microstep: 1386.89 | bwd_inner_microstep: 1386.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-11 05:56:43,355] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.38 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1284.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3549 [2024-06-11 05:56:45,323] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.74 | bwd_microstep: 1420.76 | bwd_inner_microstep: 1420.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3684 [2024-06-11 05:56:47,510] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.38 | bwd_microstep: 1588.80 | bwd_inner_microstep: 1588.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3798 [2024-06-11 05:56:49,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.74 | bwd_microstep: 1413.93 | bwd_inner_microstep: 1413.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3431 [2024-06-11 05:56:51,200] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.17 | bwd_microstep: 1254.10 | bwd_inner_microstep: 1254.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2481 [2024-06-11 05:56:52,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.54 | bwd_microstep: 1062.89 | bwd_inner_microstep: 1062.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3576 [2024-06-11 05:56:54,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.69 | bwd_microstep: 1528.73 | bwd_inner_microstep: 1528.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3465 [2024-06-11 05:56:56,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.62 | bwd_microstep: 1570.09 | bwd_inner_microstep: 1570.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-11 05:57:01,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.15 | optimizer_step: 6.62 [2024-06-11 05:57:01,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.39 | bwd_microstep: 4313.79 | bwd_inner_microstep: 1813.57 | bwd_allreduce_microstep: 2500.16 | step_microstep: 38.94 [2024-06-11 05:57:01,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15918.39 | bwd: 45152.17 | bwd_inner: 42651.11 | bwd_allreduce: 2500.39 | step: 40.45 {'loss': 1.2243, 'learning_rate': 1.1034816717906405e-07, 'epoch': 0.97} dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3476 [2024-06-11 05:57:04,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.50 | bwd_microstep: 1534.67 | bwd_inner_microstep: 1534.58 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.09 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3871 [2024-06-11 05:57:06,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.82 | bwd_microstep: 1628.08 | bwd_inner_microstep: 1628.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-11 05:57:08,317] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 558.95 | bwd_microstep: 1492.28 | bwd_inner_microstep: 1492.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3915 [2024-06-11 05:57:10,639] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.89 | bwd_microstep: 1688.56 | bwd_inner_microstep: 1688.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3516 [2024-06-11 05:57:12,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.25 | bwd_microstep: 1387.07 | bwd_inner_microstep: 1387.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 05:57:14,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.87 | bwd_microstep: 1246.77 | bwd_inner_microstep: 1246.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-11 05:57:15,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.12 | bwd_microstep: 792.64 | bwd_inner_microstep: 792.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3501 [2024-06-11 05:57:17,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.98 | bwd_microstep: 1388.28 | bwd_inner_microstep: 1388.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612 [2024-06-11 05:57:19,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.29 | bwd_microstep: 1310.11 | bwd_inner_microstep: 1310.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3466 [2024-06-11 
05:57:20,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.97 | bwd_microstep: 1277.80 | bwd_inner_microstep: 1277.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447 [2024-06-11 05:57:22,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.52 | bwd_microstep: 1281.14 | bwd_inner_microstep: 1281.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 05:57:24,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.31 | bwd_microstep: 1279.29 | bwd_inner_microstep: 1279.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4812 [2024-06-11 05:57:26,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 692.65 | bwd_microstep: 1830.16 | bwd_inner_microstep: 1830.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1998 [2024-06-11 05:57:28,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.53 | bwd_microstep: 892.92 | bwd_inner_microstep: 892.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-11 05:57:30,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.20 | bwd_microstep: 1347.49 | bwd_inner_microstep: 1347.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2143 [2024-06-11 05:57:31,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.17 | bwd_microstep: 957.07 | bwd_inner_microstep: 957.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, 
dynamic token length: 2169 [2024-06-11 05:57:32,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 366.95 | bwd_microstep: 994.92 | bwd_inner_microstep: 994.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 05:57:34,605] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.38 | bwd_microstep: 1337.76 | bwd_inner_microstep: 1337.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3672 [2024-06-11 05:57:36,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.31 | bwd_microstep: 1356.25 | bwd_inner_microstep: 1356.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3618 [2024-06-11 05:57:38,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.22 | bwd_microstep: 1509.75 | bwd_inner_microstep: 1509.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-11 05:57:40,503] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.02 | bwd_microstep: 1407.88 | bwd_inner_microstep: 1407.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3820 [2024-06-11 05:57:42,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.88 | bwd_microstep: 1257.06 | bwd_inner_microstep: 1257.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3761 [2024-06-11 05:57:44,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.40 | bwd_microstep: 1439.60 | bwd_inner_microstep: 1439.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 16, images per sample: 4.0, dynamic token length: 3486 [2024-06-11 05:57:45,889] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.61 | bwd_microstep: 1186.31 | bwd_inner_microstep: 1186.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3591 [2024-06-11 05:57:47,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.79 | bwd_microstep: 1307.28 | bwd_inner_microstep: 1307.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3721 [2024-06-11 05:57:49,812] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.77 | bwd_microstep: 1534.01 | bwd_inner_microstep: 1533.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3556 [2024-06-11 05:57:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.88 | bwd_microstep: 1261.91 | bwd_inner_microstep: 1261.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915 [2024-06-11 05:57:52,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.26 | bwd_microstep: 686.15 | bwd_inner_microstep: 686.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3440 [2024-06-11 05:57:54,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.07 | bwd_microstep: 1471.28 | bwd_inner_microstep: 1471.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-11 05:57:56,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.52 | bwd_microstep: 1644.59 | bwd_inner_microstep: 1644.56 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3815 [2024-06-11 05:57:58,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.55 | bwd_microstep: 1583.49 | bwd_inner_microstep: 1583.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 05:58:04,294] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.24 | optimizer_step: 6.59 [2024-06-11 05:58:04,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.11 | bwd_microstep: 4731.30 | bwd_inner_microstep: 1576.09 | bwd_allreduce_microstep: 3155.12 | step_microstep: 39.63 [2024-06-11 05:58:04,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16025.41 | bwd: 46043.89 | bwd_inner: 42887.75 | bwd_allreduce: 3155.41 | step: 41.14 {'loss': 1.151, 'learning_rate': 1.0644581717337288e-07, 'epoch': 0.97} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 5015 [2024-06-11 05:58:06,777] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 675.36 | bwd_microstep: 1786.81 | bwd_inner_microstep: 1786.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3481 [2024-06-11 05:58:08,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.25 | bwd_microstep: 1214.84 | bwd_inner_microstep: 1214.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3870 [2024-06-11 05:58:10,490] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.25 | bwd_microstep: 1463.10 | bwd_inner_microstep: 1463.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-11 05:58:12,530] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.07 | bwd_microstep: 1481.34 | bwd_inner_microstep: 1481.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806 [2024-06-11 05:58:14,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.42 | bwd_microstep: 1512.27 | bwd_inner_microstep: 1512.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 05:58:16,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1281.87 | bwd_inner_microstep: 1281.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3733 [2024-06-11 05:58:18,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.90 | bwd_microstep: 1496.05 | bwd_inner_microstep: 1496.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1959 [2024-06-11 05:58:19,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.35 | bwd_microstep: 795.15 | bwd_inner_microstep: 795.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3711 [2024-06-11 05:58:21,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.36 | bwd_microstep: 1630.86 | bwd_inner_microstep: 1630.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1915 [2024-06-11 05:58:22,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.23 | bwd_microstep: 687.37 | bwd_inner_microstep: 687.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3509 [2024-06-11 05:58:24,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.69 | bwd_microstep: 1391.57 | bwd_inner_microstep: 1391.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3627 [2024-06-11 05:58:26,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.97 | bwd_microstep: 1475.26 | bwd_inner_microstep: 1475.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2943 [2024-06-11 05:58:28,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.14 | bwd_microstep: 1194.38 | bwd_inner_microstep: 1194.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2504 [2024-06-11 05:58:29,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.06 | bwd_microstep: 1056.84 | bwd_inner_microstep: 1056.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2743 [2024-06-11 05:58:31,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 393.34 | bwd_microstep: 1046.06 | bwd_inner_microstep: 1046.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389 [2024-06-11 05:58:32,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.24 | bwd_microstep: 1245.79 | bwd_inner_microstep: 1245.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3530 [2024-06-11 05:58:35,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.51 | bwd_microstep: 1490.74 | bwd_inner_microstep: 1490.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3514 [2024-06-11 05:58:37,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.08 | bwd_microstep: 1486.58 | bwd_inner_microstep: 1486.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3587 [2024-06-11 05:58:39,051] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.22 | bwd_microstep: 1411.39 | bwd_inner_microstep: 1411.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2290 [2024-06-11 05:58:40,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.48 | bwd_microstep: 940.36 | bwd_inner_microstep: 940.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3554 [2024-06-11 05:58:42,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.26 | bwd_microstep: 1502.89 | bwd_inner_microstep: 1502.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3509 [2024-06-11 05:58:44,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.27 | bwd_microstep: 1355.24 | bwd_inner_microstep: 1355.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2097 [2024-06-11 05:58:45,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.42 | bwd_microstep: 827.74 | bwd_inner_microstep: 827.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3473 [2024-06-11 05:58:47,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.89 | bwd_microstep: 1376.67 | bwd_inner_microstep: 1376.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3832 [2024-06-11 05:58:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.13 | bwd_microstep: 1493.33 | bwd_inner_microstep: 1493.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3396 [2024-06-11 05:58:51,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.40 | bwd_microstep: 1247.07 | bwd_inner_microstep: 1247.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3873 [2024-06-11 05:58:53,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.53 | bwd_microstep: 1580.02 | bwd_inner_microstep: 1580.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 05:58:55,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.22 | bwd_microstep: 1546.46 | bwd_inner_microstep: 1546.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3571 [2024-06-11 05:58:57,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1592.77 | bwd_inner_microstep: 1592.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3782 [2024-06-11 05:58:59,894] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.83 | bwd_microstep: 1639.06 | bwd_inner_microstep: 1639.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-11 05:59:01,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.85 | bwd_microstep: 1487.94 | bwd_inner_microstep: 1487.91 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390 [2024-06-11 05:59:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.76 | optimizer_gradients: 4.11 | optimizer_step: 6.61 [2024-06-11 05:59:04,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.85 | bwd_microstep: 2357.32 | bwd_inner_microstep: 1490.01 | bwd_allreduce_microstep: 867.25 | step_microstep: 38.71 [2024-06-11 05:59:04,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16123.65 | bwd: 44095.15 | bwd_inner: 43226.99 | bwd_allreduce: 867.48 | step: 40.18 {'loss': 1.1802, 'learning_rate': 1.0261353216209691e-07, 'epoch': 0.97} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3408 [2024-06-11 05:59:06,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.46 | bwd_microstep: 1241.75 | bwd_inner_microstep: 1241.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 05:59:08,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.86 | bwd_microstep: 1381.27 | bwd_inner_microstep: 1381.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1934 [2024-06-11 05:59:09,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.01 | bwd_microstep: 694.90 | bwd_inner_microstep: 694.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3857 [2024-06-11 05:59:11,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.91 | bwd_microstep: 1362.87 | bwd_inner_microstep: 1362.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1871 [2024-06-11 
05:59:12,343] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.04 | bwd_microstep: 708.54 | bwd_inner_microstep: 708.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3467 [2024-06-11 05:59:14,289] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.82 | bwd_microstep: 1407.64 | bwd_inner_microstep: 1407.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-11 05:59:16,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.25 | bwd_microstep: 1252.44 | bwd_inner_microstep: 1252.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-11 05:59:17,881] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.77 | bwd_microstep: 1346.57 | bwd_inner_microstep: 1346.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3430 [2024-06-11 05:59:19,488] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.52 | bwd_microstep: 1157.85 | bwd_inner_microstep: 1157.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 4012 [2024-06-11 05:59:21,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.62 | bwd_microstep: 1443.65 | bwd_inner_microstep: 1443.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 05:59:23,541] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.86 | bwd_microstep: 1488.05 | bwd_inner_microstep: 1488.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, 
dynamic token length: 3538 [2024-06-11 05:59:25,696] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.40 | bwd_microstep: 1564.20 | bwd_inner_microstep: 1564.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661 [2024-06-11 05:59:27,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.24 | bwd_microstep: 1571.95 | bwd_inner_microstep: 1571.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515 [2024-06-11 05:59:30,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.84 | bwd_microstep: 1586.75 | bwd_inner_microstep: 1586.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3681 [2024-06-11 05:59:32,260] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.53 | bwd_microstep: 1614.75 | bwd_inner_microstep: 1614.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 05:59:34,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.01 | bwd_microstep: 1353.54 | bwd_inner_microstep: 1353.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3527 [2024-06-11 05:59:36,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.52 | bwd_microstep: 1422.55 | bwd_inner_microstep: 1422.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969 [2024-06-11 05:59:37,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.37 | bwd_microstep: 831.43 | bwd_inner_microstep: 831.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 32, images per sample: 8.0, dynamic token length: 3532 [2024-06-11 05:59:39,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.47 | bwd_microstep: 1450.14 | bwd_inner_microstep: 1450.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2102 [2024-06-11 05:59:40,526] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.78 | bwd_microstep: 922.26 | bwd_inner_microstep: 922.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 05:59:42,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.20 | bwd_microstep: 1557.13 | bwd_inner_microstep: 1557.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3556 [2024-06-11 05:59:44,829] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.46 | bwd_microstep: 1568.55 | bwd_inner_microstep: 1568.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3716 [2024-06-11 05:59:46,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.61 | bwd_microstep: 1535.30 | bwd_inner_microstep: 1535.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3718 [2024-06-11 05:59:48,938] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.65 | bwd_microstep: 1443.95 | bwd_inner_microstep: 1443.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 05:59:50,869] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.13 | bwd_microstep: 1394.60 | bwd_inner_microstep: 1394.57 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2018 [2024-06-11 05:59:51,874] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 276.81 | bwd_microstep: 716.91 | bwd_inner_microstep: 716.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3717 [2024-06-11 05:59:53,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.69 | bwd_microstep: 1337.31 | bwd_inner_microstep: 1337.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3559 [2024-06-11 05:59:55,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.54 | bwd_microstep: 1332.41 | bwd_inner_microstep: 1332.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2271 [2024-06-11 05:59:56,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.66 | bwd_microstep: 970.95 | bwd_inner_microstep: 970.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3595 [2024-06-11 05:59:58,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.89 | bwd_microstep: 1308.56 | bwd_inner_microstep: 1308.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3587 [2024-06-11 06:00:00,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.69 | bwd_microstep: 1630.82 | bwd_inner_microstep: 1630.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2270 [2024-06-11 06:00:06,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.25 | optimizer_step: 
6.60 [2024-06-11 06:00:06,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.26 | bwd_microstep: 4721.70 | bwd_inner_microstep: 990.28 | bwd_allreduce_microstep: 3731.36 | step_microstep: 38.06 [2024-06-11 06:00:06,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15550.52 | bwd: 45321.30 | bwd_inner: 41589.02 | bwd_allreduce: 3731.60 | step: 39.59 97%|█████████▋| 1668/1726 [29:16:02<1:04:29, 66.71s/it] 97%|█████████▋| 1669/1726 [29:18:37<1:28:34, 93.24s/it] 97%|█████████▋| 1670/1726 [29:19:38<1:18:06, 83.69s/it] 97%|█████████▋| 1671/1726 [29:20:41<1:10:51, 77.31s/it] 97%|█████████▋| 1672/1726 [29:21:41<1:05:03, 72.28s/it] 97%|█████████▋| 1673/1726 [29:22:42<1:00:54, 68.96s/it] {'loss': 1.1712, 'learning_rate': 9.885132564252386e-08, 'epoch': 0.97} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 06:00:07,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.79 | bwd_microstep: 1342.80 | bwd_inner_microstep: 1342.62 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3404 [2024-06-11 06:00:09,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.98 | bwd_microstep: 1146.02 | bwd_inner_microstep: 1146.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3850 [2024-06-11 06:00:11,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.56 | bwd_microstep: 1468.16 | bwd_inner_microstep: 1468.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3848 [2024-06-11 06:00:13,578] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.58 | bwd_microstep: 1464.80 | bwd_inner_microstep: 1464.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 06:00:15,353] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.02 | bwd_microstep: 1279.23 | bwd_inner_microstep: 1279.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-11 06:00:17,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.26 | bwd_microstep: 1403.53 | bwd_inner_microstep: 1403.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1997 [2024-06-11 06:00:18,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 275.56 | bwd_microstep: 710.45 | bwd_inner_microstep: 710.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3414 [2024-06-11 06:00:19,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.18 | bwd_microstep: 1153.33 | bwd_inner_microstep: 1153.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3447 [2024-06-11 06:00:21,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.46 | bwd_microstep: 1285.02 | bwd_inner_microstep: 1284.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1915 [2024-06-11 06:00:22,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.65 | bwd_microstep: 786.63 | bwd_inner_microstep: 786.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token 
length: 3510 [2024-06-11 06:00:24,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1485.19 | bwd_inner_microstep: 1485.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683 [2024-06-11 06:00:27,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.84 | bwd_microstep: 1722.54 | bwd_inner_microstep: 1722.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3929 [2024-06-11 06:00:29,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.45 | bwd_microstep: 1789.29 | bwd_inner_microstep: 1789.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3647 [2024-06-11 06:00:31,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.53 | bwd_microstep: 1510.69 | bwd_inner_microstep: 1510.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 656 [2024-06-11 06:00:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 112.89 | bwd_microstep: 277.53 | bwd_inner_microstep: 277.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3633 [2024-06-11 06:00:33,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.00 | bwd_microstep: 1316.96 | bwd_inner_microstep: 1316.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3628 [2024-06-11 06:00:35,866] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.33 | bwd_microstep: 1409.62 | bwd_inner_microstep: 1409.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, 
images per sample: 4.0, dynamic token length: 3509 [2024-06-11 06:00:37,520] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.44 | bwd_microstep: 1192.15 | bwd_inner_microstep: 1192.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3753 [2024-06-11 06:00:39,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.33 | bwd_microstep: 1643.97 | bwd_inner_microstep: 1643.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3629 [2024-06-11 06:00:41,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.18 | bwd_microstep: 1512.77 | bwd_inner_microstep: 1512.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3546 [2024-06-11 06:00:43,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.60 | bwd_microstep: 1494.51 | bwd_inner_microstep: 1494.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3726 [2024-06-11 06:00:46,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.25 | bwd_microstep: 1638.90 | bwd_inner_microstep: 1638.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2112 [2024-06-11 06:00:47,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.30 | bwd_microstep: 925.59 | bwd_inner_microstep: 925.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2001 [2024-06-11 06:00:48,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.82 | bwd_microstep: 800.91 | bwd_inner_microstep: 800.89 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3757 [2024-06-11 06:00:50,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.94 | bwd_microstep: 1547.62 | bwd_inner_microstep: 1547.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3606 [2024-06-11 06:00:52,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.53 | bwd_microstep: 1613.05 | bwd_inner_microstep: 1609.93 | bwd_allreduce_microstep: 3.05 | step_microstep: 0.15 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 1949 [2024-06-11 06:00:53,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.38 | bwd_microstep: 776.95 | bwd_inner_microstep: 776.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3098 [2024-06-11 06:00:55,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 411.02 | bwd_microstep: 1057.55 | bwd_inner_microstep: 1057.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3265 [2024-06-11 06:00:57,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.13 | bwd_microstep: 1481.76 | bwd_inner_microstep: 1481.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3816 [2024-06-11 06:00:59,516] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.56 | bwd_microstep: 1455.62 | bwd_inner_microstep: 1455.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 35, images per sample: 8.75, dynamic token length: 3592 [2024-06-11 06:01:01,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.28 | bwd_microstep: 1516.04 | bwd_inner_microstep: 
1516.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1924 [2024-06-11 06:01:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.35 | optimizer_step: 6.58 [2024-06-11 06:01:05,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 296.83 | bwd_microstep: 3669.94 | bwd_inner_microstep: 1033.70 | bwd_allreduce_microstep: 2636.18 | step_microstep: 40.08 [2024-06-11 06:01:05,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15324.45 | bwd: 43879.14 | bwd_inner: 41238.82 | bwd_allreduce: 2639.55 | step: 41.99 {'loss': 1.0958, 'learning_rate': 9.51592108651278e-08, 'epoch': 0.97} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3481 [2024-06-11 06:01:07,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.93 | bwd_microstep: 1468.87 | bwd_inner_microstep: 1468.79 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.12 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3406 [2024-06-11 06:01:09,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.18 | bwd_microstep: 1277.24 | bwd_inner_microstep: 1277.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3400 [2024-06-11 06:01:11,318] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.76 | bwd_microstep: 1369.64 | bwd_inner_microstep: 1369.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3830 [2024-06-11 06:01:13,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1558.87 | bwd_inner_microstep: 1558.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796 
[2024-06-11 06:01:15,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.45 | bwd_microstep: 1651.62 | bwd_inner_microstep: 1651.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-11 06:01:17,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.02 | bwd_microstep: 1346.72 | bwd_inner_microstep: 1346.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-11 06:01:19,835] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.18 | bwd_microstep: 1633.13 | bwd_inner_microstep: 1633.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3518 [2024-06-11 06:01:21,620] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.60 | bwd_microstep: 1289.13 | bwd_inner_microstep: 1289.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3403 [2024-06-11 06:01:23,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.74 | bwd_microstep: 1278.86 | bwd_inner_microstep: 1278.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3569 [2024-06-11 06:01:25,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.11 | bwd_microstep: 1404.68 | bwd_inner_microstep: 1404.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422 [2024-06-11 06:01:26,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.57 | bwd_microstep: 1156.65 | bwd_inner_microstep: 1156.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 1983 [2024-06-11 06:01:28,049] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.65 | bwd_microstep: 798.07 | bwd_inner_microstep: 798.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-11 06:01:30,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.45 | bwd_microstep: 1529.31 | bwd_inner_microstep: 1529.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3666 [2024-06-11 06:01:32,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.89 | bwd_microstep: 1612.58 | bwd_inner_microstep: 1612.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3659 [2024-06-11 06:01:34,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.57 | bwd_microstep: 1549.67 | bwd_inner_microstep: 1549.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3688 [2024-06-11 06:01:36,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 596.02 | bwd_microstep: 1623.86 | bwd_inner_microstep: 1623.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3423 [2024-06-11 06:01:38,545] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.02 | bwd_microstep: 1307.73 | bwd_inner_microstep: 1307.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3635 [2024-06-11 06:01:40,528] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.67 | bwd_microstep: 1438.54 | bwd_inner_microstep: 1438.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3447 [2024-06-11 06:01:42,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.38 | bwd_microstep: 1158.00 | bwd_inner_microstep: 1157.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3626 [2024-06-11 06:01:44,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.49 | bwd_microstep: 1654.81 | bwd_inner_microstep: 1654.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2030 [2024-06-11 06:01:45,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.53 | bwd_microstep: 904.80 | bwd_inner_microstep: 904.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-11 06:01:47,921] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.21 | bwd_microstep: 1647.18 | bwd_inner_microstep: 1647.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-11 06:01:49,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.26 | bwd_microstep: 1427.01 | bwd_inner_microstep: 1426.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2024 [2024-06-11 06:01:51,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.20 | bwd_microstep: 808.12 | bwd_inner_microstep: 808.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3609 [2024-06-11 06:01:52,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.55 | bwd_microstep: 1375.60 | bwd_inner_microstep: 1375.58 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-11 06:01:54,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1313.88 | bwd_inner_microstep: 1313.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3473 [2024-06-11 06:01:56,555] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.90 | bwd_microstep: 1313.16 | bwd_inner_microstep: 1313.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2090 [2024-06-11 06:01:57,612] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.93 | bwd_microstep: 757.65 | bwd_inner_microstep: 757.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3816 [2024-06-11 06:01:59,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.45 | bwd_microstep: 1385.58 | bwd_inner_microstep: 1385.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774 [2024-06-11 06:02:01,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.68 | bwd_microstep: 1545.14 | bwd_inner_microstep: 1545.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3389 [2024-06-11 06:02:03,504] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.89 | bwd_microstep: 1339.40 | bwd_inner_microstep: 1339.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3564 [2024-06-11 06:02:06,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | 
optimizer_gradients: 4.16 | optimizer_step: 6.59 [2024-06-11 06:02:06,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.16 | bwd_microstep: 2832.80 | bwd_inner_microstep: 1605.58 | bwd_allreduce_microstep: 1227.14 | step_microstep: 39.43 [2024-06-11 06:02:06,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16192.02 | bwd: 44758.35 | bwd_inner: 43530.22 | bwd_allreduce: 1227.43 | step: 41.03 {'loss': 1.2308, 'learning_rate': 9.153720083351358e-08, 'epoch': 0.97} dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1873 [2024-06-11 06:02:07,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.41 | bwd_microstep: 736.16 | bwd_inner_microstep: 736.10 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.08 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-11 06:02:09,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.91 | bwd_microstep: 790.52 | bwd_inner_microstep: 790.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1972 [2024-06-11 06:02:10,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.62 | bwd_microstep: 738.97 | bwd_inner_microstep: 738.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2299 [2024-06-11 06:02:11,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.57 | bwd_microstep: 973.39 | bwd_inner_microstep: 973.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 06:02:13,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.10 | bwd_microstep: 1379.14 | bwd_inner_microstep: 1379.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3493 [2024-06-11 06:02:15,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.89 | bwd_microstep: 1286.06 | bwd_inner_microstep: 1286.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3440 [2024-06-11 06:02:16,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.82 | bwd_microstep: 1256.41 | bwd_inner_microstep: 1256.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3428 [2024-06-11 06:02:18,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.22 | bwd_microstep: 1252.86 | bwd_inner_microstep: 1252.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3500 [2024-06-11 06:02:20,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.41 | bwd_microstep: 1194.36 | bwd_inner_microstep: 1194.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3432 [2024-06-11 06:02:21,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.91 | bwd_microstep: 1254.73 | bwd_inner_microstep: 1254.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1886 [2024-06-11 06:02:22,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.50 | bwd_microstep: 711.22 | bwd_inner_microstep: 711.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3643 [2024-06-11 06:02:25,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.05 | bwd_microstep: 1512.42 | bwd_inner_microstep: 1512.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3514 [2024-06-11 06:02:26,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.73 | bwd_microstep: 1384.73 | bwd_inner_microstep: 1384.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3985 [2024-06-11 06:02:29,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 694.63 | bwd_microstep: 1910.20 | bwd_inner_microstep: 1910.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3690 [2024-06-11 06:02:31,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.06 | bwd_microstep: 1375.24 | bwd_inner_microstep: 1375.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 06:02:33,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.48 | bwd_microstep: 1384.35 | bwd_inner_microstep: 1384.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3821 [2024-06-11 06:02:35,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.06 | bwd_microstep: 1652.54 | bwd_inner_microstep: 1652.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3664 [2024-06-11 06:02:37,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.62 | bwd_microstep: 1418.71 | bwd_inner_microstep: 1418.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3588 [2024-06-11 06:02:39,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.75 | bwd_microstep: 1507.34 | bwd_inner_microstep: 1507.31 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-11 06:02:41,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.10 | bwd_microstep: 1603.46 | bwd_inner_microstep: 1603.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3714 [2024-06-11 06:02:44,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 585.15 | bwd_microstep: 1581.42 | bwd_inner_microstep: 1581.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-11 06:02:46,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.98 | bwd_microstep: 1604.24 | bwd_inner_microstep: 1604.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446 [2024-06-11 06:02:48,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.78 | bwd_microstep: 1382.69 | bwd_inner_microstep: 1382.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3453 [2024-06-11 06:02:50,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.13 | bwd_microstep: 1454.98 | bwd_inner_microstep: 1454.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3450 [2024-06-11 06:02:51,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1284.77 | bwd_inner_microstep: 1284.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2171 [2024-06-11 06:02:53,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.25 | bwd_microstep: 
857.45 | bwd_inner_microstep: 857.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3630 [2024-06-11 06:02:55,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.08 | bwd_microstep: 1363.09 | bwd_inner_microstep: 1363.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3703 [2024-06-11 06:02:57,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.28 | bwd_microstep: 1632.26 | bwd_inner_microstep: 1632.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2051 [2024-06-11 06:02:58,540] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.19 | bwd_microstep: 911.78 | bwd_inner_microstep: 911.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 06:03:00,456] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.86 | bwd_microstep: 1384.47 | bwd_inner_microstep: 1384.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 06:03:02,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.95 | bwd_microstep: 1380.00 | bwd_inner_microstep: 1379.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3583 [2024-06-11 06:03:07,046] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-11 06:03:07,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.20 | bwd_microstep: 4103.58 | bwd_inner_microstep: 1577.22 | bwd_allreduce_microstep: 2526.30 | step_microstep: 39.31 
[2024-06-11 06:03:07,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15518.62 | bwd: 44263.55 | bwd_inner: 41736.30 | bwd_allreduce: 2526.56 | step: 40.93 {'loss': 1.1386, 'learning_rate': 8.798530830438579e-08, 'epoch': 0.97} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1922 [2024-06-11 06:03:08,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 328.12 | bwd_microstep: 877.12 | bwd_inner_microstep: 877.00 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3996 [2024-06-11 06:03:10,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.23 | bwd_microstep: 1606.03 | bwd_inner_microstep: 1606.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3788 [2024-06-11 06:03:12,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.08 | bwd_microstep: 1642.29 | bwd_inner_microstep: 1642.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-11 06:03:14,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.52 | bwd_microstep: 1398.28 | bwd_inner_microstep: 1398.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 06:03:16,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.19 | bwd_microstep: 1355.17 | bwd_inner_microstep: 1355.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734 [2024-06-11 06:03:18,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.25 | bwd_microstep: 1632.76 | bwd_inner_microstep: 1632.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3809 [2024-06-11 06:03:20,806] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.71 | bwd_microstep: 1454.51 | bwd_inner_microstep: 1454.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-11 06:03:21,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.94 | bwd_microstep: 796.90 | bwd_inner_microstep: 796.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3720 [2024-06-11 06:03:23,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.60 | bwd_microstep: 1442.95 | bwd_inner_microstep: 1442.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3534 [2024-06-11 06:03:25,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.71 | bwd_microstep: 1397.17 | bwd_inner_microstep: 1397.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3406 [2024-06-11 06:03:27,771] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.94 | bwd_microstep: 1403.48 | bwd_inner_microstep: 1403.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 06:03:29,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1482.78 | bwd_inner_microstep: 1482.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3428 [2024-06-11 06:03:31,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.65 | bwd_microstep: 1511.36 | bwd_inner_microstep: 1511.33 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3625 [2024-06-11 06:03:34,009] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.24 | bwd_microstep: 1538.88 | bwd_inner_microstep: 1538.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3428 [2024-06-11 06:03:35,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.65 | bwd_microstep: 1156.25 | bwd_inner_microstep: 1156.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3626 [2024-06-11 06:03:37,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.44 | bwd_microstep: 1416.79 | bwd_inner_microstep: 1416.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-11 06:03:39,848] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.88 | bwd_microstep: 1656.59 | bwd_inner_microstep: 1656.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3485 [2024-06-11 06:03:41,502] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.01 | bwd_microstep: 1186.92 | bwd_inner_microstep: 1186.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3670 [2024-06-11 06:03:43,476] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.80 | bwd_microstep: 1429.77 | bwd_inner_microstep: 1429.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-11 06:03:45,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.84 | bwd_microstep: 1510.03 | bwd_inner_microstep: 
1510.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2143 [2024-06-11 06:03:46,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 316.53 | bwd_microstep: 833.69 | bwd_inner_microstep: 833.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3612 [2024-06-11 06:03:48,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.74 | bwd_microstep: 1314.00 | bwd_inner_microstep: 1313.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1910 [2024-06-11 06:03:49,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 264.15 | bwd_microstep: 687.24 | bwd_inner_microstep: 687.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-11 06:03:51,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.04 | bwd_microstep: 1288.49 | bwd_inner_microstep: 1288.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3512 [2024-06-11 06:03:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.33 | bwd_microstep: 1224.28 | bwd_inner_microstep: 1224.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3586 [2024-06-11 06:03:54,823] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.17 | bwd_microstep: 1338.89 | bwd_inner_microstep: 1338.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3816 [2024-06-11 06:03:57,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.04 | 
bwd_microstep: 1687.58 | bwd_inner_microstep: 1687.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3806 [2024-06-11 06:03:59,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.81 | bwd_microstep: 1516.42 | bwd_inner_microstep: 1516.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3577 [2024-06-11 06:04:01,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.49 | bwd_microstep: 1459.34 | bwd_inner_microstep: 1459.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2065 [2024-06-11 06:04:02,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.95 | bwd_microstep: 950.48 | bwd_inner_microstep: 950.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1985 [2024-06-11 06:04:03,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.76 | bwd_microstep: 799.55 | bwd_inner_microstep: 799.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-11 06:04:07,659] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.40 | optimizer_step: 6.59 [2024-06-11 06:04:07,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.91 | bwd_microstep: 3640.49 | bwd_inner_microstep: 914.04 | bwd_allreduce_microstep: 2726.38 | step_microstep: 39.90 [2024-06-11 06:04:07,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15629.19 | bwd: 44636.52 | bwd_inner: 41909.12 | bwd_allreduce: 2726.67 | step: 41.44 {'loss': 1.1698, 'learning_rate': 8.450354578748876e-08, 'epoch': 0.97} dynamic ViT batch size: 30, images 
per sample: 7.5, dynamic token length: 3413 [2024-06-11 06:04:09,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.41 | bwd_microstep: 1369.49 | bwd_inner_microstep: 1369.34 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3698 [2024-06-11 06:04:11,797] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.34 | bwd_microstep: 1625.98 | bwd_inner_microstep: 1625.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 06:04:13,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.47 | bwd_microstep: 1386.60 | bwd_inner_microstep: 1386.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3840 [2024-06-11 06:04:15,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.00 | bwd_microstep: 1654.95 | bwd_inner_microstep: 1654.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 06:04:17,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.41 | bwd_microstep: 1247.18 | bwd_inner_microstep: 1247.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 06:04:19,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.45 | bwd_microstep: 1246.20 | bwd_inner_microstep: 1246.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 06:04:21,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.29 | bwd_microstep: 1385.63 | bwd_inner_microstep: 1385.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3733 [2024-06-11 06:04:23,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.20 | bwd_microstep: 1337.33 | bwd_inner_microstep: 1337.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3736 [2024-06-11 06:04:25,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.38 | bwd_microstep: 1634.33 | bwd_inner_microstep: 1634.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 06:04:27,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.44 | bwd_microstep: 1290.61 | bwd_inner_microstep: 1290.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3488 [2024-06-11 06:04:29,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.05 | bwd_microstep: 1390.20 | bwd_inner_microstep: 1390.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1963 [2024-06-11 06:04:30,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.11 | bwd_microstep: 798.92 | bwd_inner_microstep: 798.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3502 [2024-06-11 06:04:32,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.44 | bwd_microstep: 1437.82 | bwd_inner_microstep: 1437.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3727 [2024-06-11 06:04:34,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.61 | bwd_microstep: 1729.86 | bwd_inner_microstep: 1729.84 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3697 [2024-06-11 06:04:36,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.59 | bwd_microstep: 1689.69 | bwd_inner_microstep: 1689.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-11 06:04:38,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.74 | bwd_microstep: 1246.77 | bwd_inner_microstep: 1246.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683 [2024-06-11 06:04:41,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.34 | bwd_microstep: 1725.08 | bwd_inner_microstep: 1725.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2673 [2024-06-11 06:04:42,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.93 | bwd_microstep: 1120.66 | bwd_inner_microstep: 1120.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-11 06:04:44,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1387.72 | bwd_inner_microstep: 1387.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-11 06:04:46,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.78 | bwd_microstep: 1410.48 | bwd_inner_microstep: 1410.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919 [2024-06-11 06:04:47,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.04 | bwd_microstep: 
687.41 | bwd_inner_microstep: 687.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3460 [2024-06-11 06:04:49,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.22 | bwd_microstep: 1437.25 | bwd_inner_microstep: 1437.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 604 [2024-06-11 06:04:49,773] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 101.99 | bwd_microstep: 260.37 | bwd_inner_microstep: 260.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3526 [2024-06-11 06:04:51,705] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.83 | bwd_microstep: 1395.94 | bwd_inner_microstep: 1395.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3553 [2024-06-11 06:04:53,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.14 | bwd_microstep: 1396.78 | bwd_inner_microstep: 1396.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-11 06:04:55,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.19 | bwd_microstep: 1652.24 | bwd_inner_microstep: 1652.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2299 [2024-06-11 06:04:57,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.67 | bwd_microstep: 1009.99 | bwd_inner_microstep: 1009.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3531 [2024-06-11 06:04:59,235] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 527.28 | bwd_microstep: 1393.64 | bwd_inner_microstep: 1393.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3821 [2024-06-11 06:05:01,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.21 | bwd_microstep: 1414.05 | bwd_inner_microstep: 1414.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3799 [2024-06-11 06:05:03,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.48 | bwd_microstep: 1447.99 | bwd_inner_microstep: 1447.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3439 [2024-06-11 06:05:05,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.18 | bwd_microstep: 1347.16 | bwd_inner_microstep: 1347.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-11 06:05:09,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 19.56 | optimizer_gradients: 4.14 | optimizer_step: 6.64 [2024-06-11 06:05:09,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.51 | bwd_microstep: 4311.36 | bwd_inner_microstep: 1620.78 | bwd_allreduce_microstep: 2690.52 | step_microstep: 41.80 [2024-06-11 06:05:09,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16067.72 | bwd: 45869.74 | bwd_inner: 43178.19 | bwd_allreduce: 2690.81 | step: 43.59 97%|█████████▋| 1673/1726 [29:22:42<1:00:54, 68.96s/it] 97%|█████████▋| 1674/1726 [29:23:42<57:19, 66.14s/it] 97%|█████████▋| 1674/1726 [29:23:42<57:19, 66.14s/it] 97%|█████████▋| 1675/1726 [29:24:43<54:59, 64.69s/it] 97%|█████████▋| 1675/1726 [29:24:43<54:59, 64.69s/it] 97%|█████████▋| 1676/1726 [29:25:43<52:45, 63.32s/it] 97%|█████████▋| 1676/1726 [29:25:43<52:45, 
63.32s/it] 97%|█████████▋| 1677/1726 [29:26:44<51:02, 62.51s/it] 97%|█████████▋| 1677/1726 [29:26:44<51:02, 62.51s/it] 97%|█████████▋| 1678/1726 [29:27:46<49:57, 62.44s/it] {'loss': 1.2019, 'learning_rate': 8.109192554557333e-08, 'epoch': 0.97} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 06:05:11,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.79 | bwd_microstep: 1369.44 | bwd_inner_microstep: 1369.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2350 [2024-06-11 06:05:13,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 346.55 | bwd_microstep: 920.46 | bwd_inner_microstep: 920.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1928 [2024-06-11 06:05:14,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.54 | bwd_microstep: 723.56 | bwd_inner_microstep: 723.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3839 [2024-06-11 06:05:16,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.70 | bwd_microstep: 1649.81 | bwd_inner_microstep: 1649.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 06:05:18,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.23 | bwd_microstep: 1378.06 | bwd_inner_microstep: 1378.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3707 [2024-06-11 06:05:20,320] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.36 | bwd_microstep: 1457.42 | bwd_inner_microstep: 1457.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 06:05:22,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.03 | bwd_microstep: 1286.75 | bwd_inner_microstep: 1286.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-11 06:05:23,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.52 | bwd_microstep: 1153.69 | bwd_inner_microstep: 1153.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3704 [2024-06-11 06:05:25,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.33 | bwd_microstep: 1527.69 | bwd_inner_microstep: 1527.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 06:05:27,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.72 | bwd_microstep: 1287.90 | bwd_inner_microstep: 1287.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2900 [2024-06-11 06:05:29,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.90 | bwd_microstep: 1031.61 | bwd_inner_microstep: 1031.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3447 [2024-06-11 06:05:30,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.41 | bwd_microstep: 1350.67 | bwd_inner_microstep: 1350.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3511 [2024-06-11 06:05:33,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.39 | bwd_microstep: 1535.69 | bwd_inner_microstep: 1535.66 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3643 [2024-06-11 06:05:35,225] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.65 | bwd_microstep: 1617.52 | bwd_inner_microstep: 1617.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2296 [2024-06-11 06:05:36,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.15 | bwd_microstep: 880.42 | bwd_inner_microstep: 880.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3540 [2024-06-11 06:05:38,160] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.32 | bwd_microstep: 1229.87 | bwd_inner_microstep: 1229.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3667 [2024-06-11 06:05:40,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.09 | bwd_microstep: 1427.37 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1986 [2024-06-11 06:05:41,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.47 | bwd_microstep: 798.65 | bwd_inner_microstep: 798.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1386 [2024-06-11 06:05:42,015] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 212.61 | bwd_microstep: 555.60 | bwd_inner_microstep: 555.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2330 [2024-06-11 06:05:43,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.30 | bwd_microstep: 889.72 | bwd_inner_microstep: 
889.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3823 [2024-06-11 06:05:45,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.91 | bwd_microstep: 1559.95 | bwd_inner_microstep: 1559.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-11 06:05:47,335] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.51 | bwd_microstep: 1400.46 | bwd_inner_microstep: 1400.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729 [2024-06-11 06:05:49,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.50 | bwd_microstep: 1469.50 | bwd_inner_microstep: 1469.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3575 [2024-06-11 06:05:51,093] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.43 | bwd_microstep: 1242.29 | bwd_inner_microstep: 1242.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3554 [2024-06-11 06:05:52,898] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.37 | bwd_microstep: 1300.12 | bwd_inner_microstep: 1300.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3711 [2024-06-11 06:05:54,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.10 | bwd_microstep: 1427.53 | bwd_inner_microstep: 1427.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1927 [2024-06-11 06:05:55,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.21 | 
bwd_microstep: 792.76 | bwd_inner_microstep: 792.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-11 06:05:58,168] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.52 | bwd_microstep: 1602.05 | bwd_inner_microstep: 1602.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422 [2024-06-11 06:06:00,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.42 | bwd_microstep: 1378.47 | bwd_inner_microstep: 1378.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3791 [2024-06-11 06:06:02,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.33 | bwd_microstep: 1858.05 | bwd_inner_microstep: 1858.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2900 [2024-06-11 06:06:04,255] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.12 | bwd_microstep: 1183.62 | bwd_inner_microstep: 1183.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1868 [2024-06-11 06:06:15,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.30 | optimizer_step: 6.60 [2024-06-11 06:06:15,059] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 290.06 | bwd_microstep: 10466.07 | bwd_inner_microstep: 882.09 | bwd_allreduce_microstep: 9583.90 | step_microstep: 39.99 [2024-06-11 06:06:15,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15018.18 | bwd: 49752.77 | bwd_inner: 40167.94 | bwd_allreduce: 9584.14 | step: 41.52 {'loss': 1.1877, 'learning_rate': 7.775045959434568e-08, 'epoch': 0.97} dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3486 [2024-06-11 06:06:17,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.01 | bwd_microstep: 1471.39 | bwd_inner_microstep: 1471.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2369 [2024-06-11 06:06:18,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 407.92 | bwd_microstep: 1089.54 | bwd_inner_microstep: 1089.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3469 [2024-06-11 06:06:20,470] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.47 | bwd_microstep: 1338.75 | bwd_inner_microstep: 1338.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3400 [2024-06-11 06:06:22,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.56 | bwd_microstep: 1175.41 | bwd_inner_microstep: 1175.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3467 [2024-06-11 06:06:24,156] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.55 | bwd_microstep: 1481.41 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 06:06:25,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.81 | bwd_microstep: 1275.10 | bwd_inner_microstep: 1275.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-11 06:06:27,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.56 | bwd_microstep: 1341.97 | bwd_inner_microstep: 1341.94 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3481 [2024-06-11 06:06:29,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.28 | bwd_microstep: 1340.21 | bwd_inner_microstep: 1340.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3736 [2024-06-11 06:06:31,658] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.54 | bwd_microstep: 1461.84 | bwd_inner_microstep: 1461.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-11 06:06:33,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1414.29 | bwd_inner_microstep: 1414.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 06:06:35,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.29 | bwd_microstep: 1384.31 | bwd_inner_microstep: 1384.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1973 [2024-06-11 06:06:36,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.55 | bwd_microstep: 799.34 | bwd_inner_microstep: 799.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 1983 [2024-06-11 06:06:37,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.92 | bwd_microstep: 846.51 | bwd_inner_microstep: 846.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3557 [2024-06-11 06:06:39,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.10 | bwd_microstep: 1392.26 | bwd_inner_microstep: 1392.23 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3444 [2024-06-11 06:06:41,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.92 | bwd_microstep: 1299.06 | bwd_inner_microstep: 1299.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3667 [2024-06-11 06:06:43,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.63 | bwd_microstep: 1492.44 | bwd_inner_microstep: 1492.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3505 [2024-06-11 06:06:45,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.85 | bwd_microstep: 1574.30 | bwd_inner_microstep: 1574.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3640 [2024-06-11 06:06:47,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.67 | bwd_microstep: 1409.56 | bwd_inner_microstep: 1409.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-11 06:06:49,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.57 | bwd_microstep: 1246.26 | bwd_inner_microstep: 1246.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-11 06:06:51,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.64 | bwd_microstep: 1555.27 | bwd_inner_microstep: 1555.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3624 [2024-06-11 06:06:53,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.79 | bwd_microstep: 
1409.23 | bwd_inner_microstep: 1409.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-11 06:06:55,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.82 | bwd_microstep: 1283.18 | bwd_inner_microstep: 1283.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3200 [2024-06-11 06:06:56,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.08 | bwd_microstep: 1249.21 | bwd_inner_microstep: 1249.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3705 [2024-06-11 06:06:59,033] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.87 | bwd_microstep: 1474.45 | bwd_inner_microstep: 1474.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-11 06:07:00,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.63 | bwd_microstep: 976.11 | bwd_inner_microstep: 976.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3517 [2024-06-11 06:07:02,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.12 | bwd_microstep: 1444.19 | bwd_inner_microstep: 1444.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2072 [2024-06-11 06:07:03,509] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.30 | bwd_microstep: 820.84 | bwd_inner_microstep: 820.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3622 [2024-06-11 06:07:05,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 611.11 | bwd_microstep: 1675.29 | bwd_inner_microstep: 1675.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3566 [2024-06-11 06:07:07,784] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.48 | bwd_microstep: 1430.13 | bwd_inner_microstep: 1430.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3774 [2024-06-11 06:07:10,031] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.78 | bwd_microstep: 1635.42 | bwd_inner_microstep: 1635.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2231 [2024-06-11 06:07:11,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.54 | bwd_microstep: 929.64 | bwd_inner_microstep: 929.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 48, images per sample: 12.0, dynamic token length: 3811 [2024-06-11 06:07:17,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.30 | optimizer_step: 6.58 [2024-06-11 06:07:17,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 652.47 | bwd_microstep: 5370.94 | bwd_inner_microstep: 2023.86 | bwd_allreduce_microstep: 3347.03 | step_microstep: 38.91 [2024-06-11 06:07:17,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15885.45 | bwd: 46087.90 | bwd_inner: 42739.94 | bwd_allreduce: 3347.26 | step: 40.66 {'loss': 1.19, 'learning_rate': 7.447915970243414e-08, 'epoch': 0.97} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-11 06:07:19,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.16 | bwd_microstep: 1239.33 | bwd_inner_microstep: 1239.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-11 06:07:21,054] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.94 | bwd_microstep: 1399.98 | bwd_inner_microstep: 1399.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 06:07:22,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1379.37 | bwd_inner_microstep: 1379.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3424 [2024-06-11 06:07:24,958] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.29 | bwd_microstep: 1446.10 | bwd_inner_microstep: 1446.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3783 [2024-06-11 06:07:27,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.81 | bwd_microstep: 1647.52 | bwd_inner_microstep: 1647.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2927 [2024-06-11 06:07:28,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 418.36 | bwd_microstep: 1092.49 | bwd_inner_microstep: 1092.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906 [2024-06-11 06:07:29,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.62 | bwd_microstep: 684.87 | bwd_inner_microstep: 684.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1944 [2024-06-11 06:07:30,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.86 | bwd_microstep: 793.69 | bwd_inner_microstep: 793.66 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3425 [2024-06-11 06:07:32,654] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.45 | bwd_microstep: 1346.60 | bwd_inner_microstep: 1346.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 06:07:34,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.29 | bwd_microstep: 1389.54 | bwd_inner_microstep: 1389.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3521 [2024-06-11 06:07:36,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.63 | bwd_microstep: 1491.37 | bwd_inner_microstep: 1491.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3618 [2024-06-11 06:07:38,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.21 | bwd_microstep: 1317.31 | bwd_inner_microstep: 1317.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3647 [2024-06-11 06:07:40,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1410.94 | bwd_inner_microstep: 1410.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1925 [2024-06-11 06:07:41,497] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.53 | bwd_microstep: 791.36 | bwd_inner_microstep: 791.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3498 [2024-06-11 06:07:43,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.97 | bwd_microstep: 1483.46 | bwd_inner_microstep: 
1483.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508 [2024-06-11 06:07:45,706] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.02 | bwd_microstep: 1578.56 | bwd_inner_microstep: 1578.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2103 [2024-06-11 06:07:47,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.04 | bwd_microstep: 971.79 | bwd_inner_microstep: 971.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-11 06:07:48,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.37 | bwd_microstep: 1295.15 | bwd_inner_microstep: 1295.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1990 [2024-06-11 06:07:49,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.16 | bwd_microstep: 807.49 | bwd_inner_microstep: 807.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3524 [2024-06-11 06:07:51,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.03 | bwd_microstep: 1292.30 | bwd_inner_microstep: 1292.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3615 [2024-06-11 06:07:53,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1313.25 | bwd_inner_microstep: 1313.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-11 06:07:55,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.13 | 
bwd_microstep: 1661.90 | bwd_inner_microstep: 1661.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983 [2024-06-11 06:07:56,840] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.02 | bwd_microstep: 707.36 | bwd_inner_microstep: 707.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 06:07:58,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.41 | bwd_microstep: 1282.70 | bwd_inner_microstep: 1282.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3419 [2024-06-11 06:08:00,395] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.27 | bwd_microstep: 1281.71 | bwd_inner_microstep: 1281.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 06:08:02,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.89 | bwd_microstep: 1284.28 | bwd_inner_microstep: 1284.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2283 [2024-06-11 06:08:03,608] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 382.94 | bwd_microstep: 1038.51 | bwd_inner_microstep: 1038.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3595 [2024-06-11 06:08:05,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.87 | bwd_microstep: 1656.00 | bwd_inner_microstep: 1655.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3426 [2024-06-11 06:08:08,009] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 568.30 | bwd_microstep: 1550.98 | bwd_inner_microstep: 1550.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3622 [2024-06-11 06:08:09,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.21 | bwd_microstep: 1418.79 | bwd_inner_microstep: 1418.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3592 [2024-06-11 06:08:12,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.40 | bwd_microstep: 1674.84 | bwd_inner_microstep: 1674.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3752 [2024-06-11 06:08:17,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.11 | optimizer_step: 6.61 [2024-06-11 06:08:17,399] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.50 | bwd_microstep: 4562.67 | bwd_inner_microstep: 1769.18 | bwd_allreduce_microstep: 2793.44 | step_microstep: 38.37 [2024-06-11 06:08:17,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15374.94 | bwd: 44292.20 | bwd_inner: 41497.86 | bwd_allreduce: 2793.67 | step: 39.97 {'loss': 1.1472, 'learning_rate': 7.12780373913402e-08, 'epoch': 0.97} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3490 [2024-06-11 06:08:19,258] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.18 | bwd_microstep: 1340.22 | bwd_inner_microstep: 1340.13 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3950 [2024-06-11 06:08:21,319] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.50 | bwd_microstep: 1491.61 | bwd_inner_microstep: 1491.58 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-11 06:08:23,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.95 | bwd_microstep: 1341.17 | bwd_inner_microstep: 1341.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3834 [2024-06-11 06:08:25,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.68 | bwd_microstep: 1651.11 | bwd_inner_microstep: 1651.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3737 [2024-06-11 06:08:27,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.53 | bwd_microstep: 1436.05 | bwd_inner_microstep: 1436.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3424 [2024-06-11 06:08:29,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 449.85 | bwd_microstep: 1180.71 | bwd_inner_microstep: 1180.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-11 06:08:30,796] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.51 | bwd_microstep: 1250.81 | bwd_inner_microstep: 1250.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 06:08:32,574] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.66 | bwd_microstep: 1284.80 | bwd_inner_microstep: 1284.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3715 [2024-06-11 06:08:34,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.67 | bwd_microstep: 1436.02 | bwd_inner_microstep: 
1436.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3514 [2024-06-11 06:08:36,529] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.09 | bwd_microstep: 1431.10 | bwd_inner_microstep: 1431.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2476 [2024-06-11 06:08:37,729] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.63 | bwd_microstep: 858.60 | bwd_inner_microstep: 858.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-11 06:08:39,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.59 | bwd_microstep: 1490.36 | bwd_inner_microstep: 1490.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2120 [2024-06-11 06:08:41,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.90 | bwd_microstep: 923.51 | bwd_inner_microstep: 923.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3684 [2024-06-11 06:08:43,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.73 | bwd_microstep: 1694.93 | bwd_inner_microstep: 1694.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2000 [2024-06-11 06:08:44,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.87 | bwd_microstep: 896.62 | bwd_inner_microstep: 896.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3449 [2024-06-11 06:08:46,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.22 | 
bwd_microstep: 1399.29 | bwd_inner_microstep: 1399.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3539 [2024-06-11 06:08:48,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.35 | bwd_microstep: 1294.97 | bwd_inner_microstep: 1294.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3530 [2024-06-11 06:08:50,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.77 | bwd_microstep: 1585.53 | bwd_inner_microstep: 1585.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-11 06:08:52,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.39 | bwd_microstep: 1496.38 | bwd_inner_microstep: 1496.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3630 [2024-06-11 06:08:54,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.37 | bwd_microstep: 1604.53 | bwd_inner_microstep: 1604.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 06:08:56,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.74 | bwd_microstep: 1554.26 | bwd_inner_microstep: 1554.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-11 06:08:59,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.64 | bwd_microstep: 1506.32 | bwd_inner_microstep: 1506.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3724 [2024-06-11 06:09:00,917] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 516.68 | bwd_microstep: 1369.56 | bwd_inner_microstep: 1369.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3608 [2024-06-11 06:09:03,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.50 | bwd_microstep: 1609.53 | bwd_inner_microstep: 1609.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3608 [2024-06-11 06:09:05,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.97 | bwd_microstep: 1508.82 | bwd_inner_microstep: 1508.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3558 [2024-06-11 06:09:07,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.83 | bwd_microstep: 1589.72 | bwd_inner_microstep: 1589.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3808 [2024-06-11 06:09:09,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.50 | bwd_microstep: 1659.81 | bwd_inner_microstep: 1659.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2078 [2024-06-11 06:09:10,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.81 | bwd_microstep: 817.83 | bwd_inner_microstep: 817.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289 [2024-06-11 06:09:12,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.69 | bwd_microstep: 976.76 | bwd_inner_microstep: 976.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3766 [2024-06-11 06:09:13,892] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 477.64 | bwd_microstep: 1250.79 | bwd_inner_microstep: 1250.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3573 [2024-06-11 06:09:15,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.37 | bwd_microstep: 1210.65 | bwd_inner_microstep: 1210.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2266 [2024-06-11 06:09:21,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.16 | optimizer_step: 6.56 [2024-06-11 06:09:21,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.80 | bwd_microstep: 5882.36 | bwd_inner_microstep: 994.77 | bwd_allreduce_microstep: 4887.54 | step_microstep: 38.99 [2024-06-11 06:09:21,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16065.26 | bwd: 48024.73 | bwd_inner: 43136.19 | bwd_allreduce: 4887.82 | step: 40.58 {'loss': 1.1586, 'learning_rate': 6.814710393539869e-08, 'epoch': 0.97} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3432 [2024-06-11 06:09:23,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.37 | bwd_microstep: 1437.79 | bwd_inner_microstep: 1437.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3939 [2024-06-11 06:09:26,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.02 | bwd_microstep: 1687.35 | bwd_inner_microstep: 1687.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3843 [2024-06-11 06:09:28,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.85 | bwd_microstep: 1460.02 | bwd_inner_microstep: 1459.99 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 06:09:30,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.17 | bwd_microstep: 1377.98 | bwd_inner_microstep: 1377.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 06:09:31,988] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.75 | bwd_microstep: 1382.43 | bwd_inner_microstep: 1382.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3757 [2024-06-11 06:09:33,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.82 | bwd_microstep: 1434.21 | bwd_inner_microstep: 1434.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3492 [2024-06-11 06:09:35,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.49 | bwd_microstep: 1384.86 | bwd_inner_microstep: 1384.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3594 [2024-06-11 06:09:37,857] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.91 | bwd_microstep: 1435.10 | bwd_inner_microstep: 1435.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 06:09:39,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.44 | bwd_microstep: 1248.73 | bwd_inner_microstep: 1248.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2207 [2024-06-11 06:09:40,907] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 355.64 | bwd_microstep: 
957.77 | bwd_inner_microstep: 957.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3497 [2024-06-11 06:09:42,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1287.86 | bwd_inner_microstep: 1287.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2979 [2024-06-11 06:09:44,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 427.65 | bwd_microstep: 1137.10 | bwd_inner_microstep: 1137.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1991 [2024-06-11 06:09:45,251] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.34 | bwd_microstep: 707.59 | bwd_inner_microstep: 707.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3648 [2024-06-11 06:09:47,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.44 | bwd_microstep: 1574.22 | bwd_inner_microstep: 1574.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3506 [2024-06-11 06:09:49,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.51 | bwd_microstep: 1552.95 | bwd_inner_microstep: 1552.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 06:09:51,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.64 | bwd_microstep: 1349.18 | bwd_inner_microstep: 1349.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828 [2024-06-11 06:09:53,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 587.20 | bwd_microstep: 1584.77 | bwd_inner_microstep: 1584.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3479 [2024-06-11 06:09:55,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1281.58 | bwd_inner_microstep: 1281.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289 [2024-06-11 06:09:56,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.21 | bwd_microstep: 977.20 | bwd_inner_microstep: 977.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3543 [2024-06-11 06:09:58,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.52 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3434 [2024-06-11 06:10:00,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1255.37 | bwd_inner_microstep: 1255.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3440 [2024-06-11 06:10:01,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.14 | bwd_microstep: 1155.61 | bwd_inner_microstep: 1155.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3804 [2024-06-11 06:10:04,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.31 | bwd_microstep: 1654.31 | bwd_inner_microstep: 1654.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-11 06:10:06,208] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.05 | bwd_microstep: 1398.93 | bwd_inner_microstep: 1398.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3827 [2024-06-11 06:10:08,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.13 | bwd_microstep: 1465.08 | bwd_inner_microstep: 1465.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-11 06:10:10,163] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.95 | bwd_microstep: 1395.09 | bwd_inner_microstep: 1395.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3547 [2024-06-11 06:10:12,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.37 | bwd_microstep: 1541.36 | bwd_inner_microstep: 1541.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3641 [2024-06-11 06:10:14,279] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1444.99 | bwd_inner_microstep: 1444.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599 [2024-06-11 06:10:16,347] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1498.82 | bwd_inner_microstep: 1498.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3559 [2024-06-11 06:10:18,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.33 | bwd_microstep: 1527.00 | bwd_inner_microstep: 1526.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3566 [2024-06-11 06:10:20,350] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.73 | bwd_microstep: 1368.58 | bwd_inner_microstep: 1368.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3729 [2024-06-11 06:10:22,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.89 | optimizer_gradients: 4.05 | optimizer_step: 6.62 [2024-06-11 06:10:22,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.17 | bwd_microstep: 1952.46 | bwd_inner_microstep: 1541.69 | bwd_allreduce_microstep: 410.71 | step_microstep: 86.82 [2024-06-11 06:10:22,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16394.17 | bwd: 44315.04 | bwd_inner: 43903.42 | bwd_allreduce: 410.94 | step: 88.46 97%|█████████▋| 1678/1726 [29:27:46<49:57, 62.44s/it] 97%|█████████▋| 1679/1726 [29:28:51<49:32, 63.24s/it] 97%|█████████▋| 1680/1726 [29:29:54<48:16, 62.97s/it] 97%|█████████▋| 1681/1726 [29:30:54<46:33, 62.08s/it] 97%|█████████▋| 1682/1726 [29:31:58<46:02, 62.79s/it] 98%|█████████▊| 1683/1726 [29:32:59<44:38, 62.28s/ {'loss': 1.1642, 'learning_rate': 6.508637036174215e-08, 'epoch': 0.98} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3401 [2024-06-11 06:10:24,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.40 | bwd_microstep: 1433.04 | bwd_inner_microstep: 1433.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 06:10:26,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.64 | bwd_microstep: 1374.84 | bwd_inner_microstep: 1374.81 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3474 [2024-06-11 06:10:28,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.94 | bwd_microstep: 1383.31 | bwd_inner_microstep: 1383.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-11 06:10:30,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 1485.50 | bwd_inner_microstep: 1485.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 06:10:32,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.72 | bwd_microstep: 1383.77 | bwd_inner_microstep: 1383.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4190 [2024-06-11 06:10:34,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.55 | bwd_microstep: 1654.95 | bwd_inner_microstep: 1654.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-11 06:10:36,853] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1344.85 | bwd_inner_microstep: 1344.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 06:10:38,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.45 | bwd_microstep: 1388.73 | bwd_inner_microstep: 1388.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 06:10:40,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1284.27 | 
bwd_inner_microstep: 1284.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 06:10:42,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.24 | bwd_microstep: 1285.78 | bwd_inner_microstep: 1285.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3525 [2024-06-11 06:10:44,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.31 | bwd_microstep: 1294.36 | bwd_inner_microstep: 1294.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2130 [2024-06-11 06:10:45,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.89 | bwd_microstep: 974.62 | bwd_inner_microstep: 974.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3448 [2024-06-11 06:10:47,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.11 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3584 [2024-06-11 06:10:49,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.95 | bwd_microstep: 1304.46 | bwd_inner_microstep: 1304.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3490 [2024-06-11 06:10:51,096] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.87 | bwd_microstep: 1478.42 | bwd_inner_microstep: 1478.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1924 [2024-06-11 06:10:52,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 317.03 | bwd_microstep: 847.44 | bwd_inner_microstep: 847.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2897 [2024-06-11 06:10:53,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 416.50 | bwd_microstep: 1091.15 | bwd_inner_microstep: 1091.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3515 [2024-06-11 06:10:55,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.26 | bwd_microstep: 1579.28 | bwd_inner_microstep: 1579.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3561 [2024-06-11 06:10:57,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.63 | bwd_microstep: 1446.49 | bwd_inner_microstep: 1446.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3603 [2024-06-11 06:11:00,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.34 | bwd_microstep: 1603.83 | bwd_inner_microstep: 1603.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1982 [2024-06-11 06:11:01,394] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.52 | bwd_microstep: 896.52 | bwd_inner_microstep: 896.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3825 [2024-06-11 06:11:03,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 618.86 | bwd_microstep: 1681.29 | bwd_inner_microstep: 1681.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 620 [2024-06-11 06:11:04,077] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.12 | bwd_microstep: 263.62 | bwd_inner_microstep: 263.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-11 06:11:06,288] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.39 | bwd_microstep: 1612.10 | bwd_inner_microstep: 1612.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3589 [2024-06-11 06:11:08,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.29 | bwd_microstep: 1406.59 | bwd_inner_microstep: 1406.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3813 [2024-06-11 06:11:10,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.13 | bwd_microstep: 1478.42 | bwd_inner_microstep: 1478.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-11 06:11:12,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.29 | bwd_microstep: 1300.12 | bwd_inner_microstep: 1300.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3504 [2024-06-11 06:11:14,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.97 | bwd_microstep: 1433.30 | bwd_inner_microstep: 1433.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3797 [2024-06-11 06:11:16,329] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.42 | bwd_microstep: 1655.01 | bwd_inner_microstep: 1654.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 
3567 [2024-06-11 06:11:18,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.14 | bwd_microstep: 1403.57 | bwd_inner_microstep: 1403.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3587 [2024-06-11 06:11:19,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.14 | bwd_microstep: 1210.49 | bwd_inner_microstep: 1210.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3776 [2024-06-11 06:11:25,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.42 | optimizer_step: 6.61 [2024-06-11 06:11:25,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 635.52 | bwd_microstep: 4840.62 | bwd_inner_microstep: 1976.15 | bwd_allreduce_microstep: 2864.39 | step_microstep: 39.97 [2024-06-11 06:11:25,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16080.75 | bwd: 46105.03 | bwd_inner: 43239.71 | bwd_allreduce: 2864.63 | step: 41.57 {'loss': 1.1887, 'learning_rate': 6.209584745025643e-08, 'epoch': 0.98} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3463 [2024-06-11 06:11:27,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.46 | bwd_microstep: 1430.41 | bwd_inner_microstep: 1430.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3994 [2024-06-11 06:11:29,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.55 | bwd_microstep: 1540.87 | bwd_inner_microstep: 1540.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 06:11:31,332] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.81 | bwd_microstep: 1252.47 | 
bwd_inner_microstep: 1252.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-11 06:11:33,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.91 | bwd_microstep: 1342.96 | bwd_inner_microstep: 1342.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 06:11:35,099] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.54 | bwd_microstep: 1380.98 | bwd_inner_microstep: 1380.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2032 [2024-06-11 06:11:36,218] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.29 | bwd_microstep: 807.61 | bwd_inner_microstep: 807.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3400 [2024-06-11 06:11:37,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.27 | bwd_microstep: 1152.29 | bwd_inner_microstep: 1152.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3492 [2024-06-11 06:11:39,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.34 | bwd_microstep: 1284.98 | bwd_inner_microstep: 1284.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3408 [2024-06-11 06:11:41,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 440.55 | bwd_microstep: 1151.12 | bwd_inner_microstep: 1151.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3694 [2024-06-11 06:11:43,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 541.26 | bwd_microstep: 1455.61 | bwd_inner_microstep: 1455.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 06:11:44,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.39 | bwd_microstep: 1288.77 | bwd_inner_microstep: 1288.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3675 [2024-06-11 06:11:47,142] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.19 | bwd_microstep: 1565.83 | bwd_inner_microstep: 1565.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3019 [2024-06-11 06:11:48,849] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 465.05 | bwd_microstep: 1232.09 | bwd_inner_microstep: 1232.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 06:11:50,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.06 | bwd_microstep: 1282.98 | bwd_inner_microstep: 1282.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3490 [2024-06-11 06:11:52,787] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.48 | bwd_microstep: 1574.18 | bwd_inner_microstep: 1574.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3516 [2024-06-11 06:11:54,916] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.26 | bwd_microstep: 1548.89 | bwd_inner_microstep: 1548.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3458 [2024-06-11 06:11:56,605] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.81 | bwd_microstep: 1212.36 | bwd_inner_microstep: 1212.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3538 [2024-06-11 06:11:58,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.38 | bwd_microstep: 1492.31 | bwd_inner_microstep: 1492.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3624 [2024-06-11 06:12:00,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.92 | bwd_microstep: 1613.00 | bwd_inner_microstep: 1612.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2028 [2024-06-11 06:12:02,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.92 | bwd_microstep: 808.68 | bwd_inner_microstep: 808.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3579 [2024-06-11 06:12:04,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.45 | bwd_microstep: 1598.05 | bwd_inner_microstep: 1598.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1494 [2024-06-11 06:12:05,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 231.06 | bwd_microstep: 612.40 | bwd_inner_microstep: 612.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2151 [2024-06-11 06:12:06,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.54 | bwd_microstep: 803.70 | bwd_inner_microstep: 803.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3560 [2024-06-11 06:12:08,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.07 | bwd_microstep: 1500.59 | bwd_inner_microstep: 1500.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2153 [2024-06-11 06:12:09,432] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.47 | bwd_microstep: 854.21 | bwd_inner_microstep: 854.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3735 [2024-06-11 06:12:11,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.43 | bwd_microstep: 1442.50 | bwd_inner_microstep: 1442.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-11 06:12:13,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.53 | bwd_microstep: 1298.67 | bwd_inner_microstep: 1298.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3818 [2024-06-11 06:12:15,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.16 | bwd_microstep: 1388.01 | bwd_inner_microstep: 1387.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 3812 [2024-06-11 06:12:17,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 656.12 | bwd_microstep: 1801.29 | bwd_inner_microstep: 1801.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3586 [2024-06-11 06:12:19,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.35 | bwd_microstep: 1506.78 | bwd_inner_microstep: 1506.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images 
per sample: 5.5, dynamic token length: 3877 [2024-06-11 06:12:21,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.98 | bwd_microstep: 1388.77 | bwd_inner_microstep: 1388.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3762 [2024-06-11 06:12:26,283] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.15 | optimizer_step: 6.61 [2024-06-11 06:12:26,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.70 | bwd_microstep: 4072.93 | bwd_inner_microstep: 1919.02 | bwd_allreduce_microstep: 2153.86 | step_microstep: 38.11 [2024-06-11 06:12:26,286] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15772.93 | bwd: 44686.31 | bwd_inner: 42531.54 | bwd_allreduce: 2154.09 | step: 39.66 {'loss': 1.1466, 'learning_rate': 5.917554573354967e-08, 'epoch': 0.98} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 06:12:28,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.66 | bwd_microstep: 1387.66 | bwd_inner_microstep: 1387.50 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3501 [2024-06-11 06:12:29,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.75 | bwd_microstep: 1252.14 | bwd_inner_microstep: 1252.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3853 [2024-06-11 06:12:32,100] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.78 | bwd_microstep: 1559.51 | bwd_inner_microstep: 1559.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751 [2024-06-11 06:12:34,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
601.17 | bwd_microstep: 1635.60 | bwd_inner_microstep: 1635.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-11 06:12:36,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.88 | bwd_microstep: 1247.77 | bwd_inner_microstep: 1247.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1946 [2024-06-11 06:12:37,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.06 | bwd_microstep: 798.69 | bwd_inner_microstep: 798.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-11 06:12:38,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.40 | bwd_microstep: 1292.56 | bwd_inner_microstep: 1292.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1961 [2024-06-11 06:12:40,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.18 | bwd_microstep: 826.27 | bwd_inner_microstep: 826.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-11 06:12:42,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.37 | bwd_microstep: 1641.04 | bwd_inner_microstep: 1641.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1952 [2024-06-11 06:12:43,388] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.87 | bwd_microstep: 726.16 | bwd_inner_microstep: 726.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 06:12:45,293] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 520.12 | bwd_microstep: 1376.05 | bwd_inner_microstep: 1376.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3429 [2024-06-11 06:12:47,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.27 | bwd_microstep: 1344.34 | bwd_inner_microstep: 1344.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3518 [2024-06-11 06:12:49,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.20 | bwd_microstep: 1646.40 | bwd_inner_microstep: 1646.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3482 [2024-06-11 06:12:51,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.75 | bwd_microstep: 1476.11 | bwd_inner_microstep: 1476.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3681 [2024-06-11 06:12:53,813] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 628.49 | bwd_microstep: 1720.01 | bwd_inner_microstep: 1719.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-11 06:12:55,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.97 | bwd_microstep: 1341.19 | bwd_inner_microstep: 1341.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3502 [2024-06-11 06:12:57,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.79 | bwd_microstep: 1528.77 | bwd_inner_microstep: 1528.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-11 
06:12:59,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.25 | bwd_microstep: 1187.05 | bwd_inner_microstep: 1187.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3017 [2024-06-11 06:13:00,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.11 | bwd_microstep: 1133.99 | bwd_inner_microstep: 1133.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3825 [2024-06-11 06:13:03,275] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.18 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-11 06:13:05,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.73 | bwd_microstep: 1283.87 | bwd_inner_microstep: 1283.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3547 [2024-06-11 06:13:06,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.31 | bwd_microstep: 1396.77 | bwd_inner_microstep: 1396.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3820 [2024-06-11 06:13:09,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.22 | bwd_microstep: 1483.50 | bwd_inner_microstep: 1483.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3592 [2024-06-11 06:13:11,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.10 | bwd_microstep: 1509.32 | bwd_inner_microstep: 1509.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3825 [2024-06-11 06:13:13,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.55 | bwd_microstep: 1463.97 | bwd_inner_microstep: 1463.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3563 [2024-06-11 06:13:15,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.01 | bwd_microstep: 1365.43 | bwd_inner_microstep: 1365.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3724 [2024-06-11 06:13:17,371] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.85 | bwd_microstep: 1704.17 | bwd_inner_microstep: 1704.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3824 [2024-06-11 06:13:19,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.12 | bwd_microstep: 1519.69 | bwd_inner_microstep: 1519.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3650 [2024-06-11 06:13:21,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.67 | bwd_microstep: 1589.79 | bwd_inner_microstep: 1589.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3391 [2024-06-11 06:13:23,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.88 | bwd_microstep: 1437.55 | bwd_inner_microstep: 1437.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3612 [2024-06-11 06:13:25,532] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.75 | bwd_microstep: 1372.40 | bwd_inner_microstep: 1372.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 31, images per sample: 7.75, dynamic token length: 3584 [2024-06-11 06:13:27,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.13 | optimizer_step: 6.65 [2024-06-11 06:13:27,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.01 | bwd_microstep: 1486.37 | bwd_inner_microstep: 1478.31 | bwd_allreduce_microstep: 8.01 | step_microstep: 38.12 [2024-06-11 06:13:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16583.13 | bwd: 44391.28 | bwd_inner: 44382.25 | bwd_allreduce: 8.29 | step: 39.74 {'loss': 1.1506, 'learning_rate': 5.632547549690559e-08, 'epoch': 0.98} dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3416 [2024-06-11 06:13:29,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.81 | bwd_microstep: 1398.10 | bwd_inner_microstep: 1398.01 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3458 [2024-06-11 06:13:31,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.39 | bwd_microstep: 1275.02 | bwd_inner_microstep: 1274.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1945 [2024-06-11 06:13:32,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 271.19 | bwd_microstep: 698.12 | bwd_inner_microstep: 698.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3833 [2024-06-11 06:13:34,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.95 | bwd_microstep: 1661.50 | bwd_inner_microstep: 1661.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2244 [2024-06-11 06:13:35,794] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 331.46 | bwd_microstep: 869.69 | bwd_inner_microstep: 869.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 06:13:37,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1284.66 | bwd_inner_microstep: 1284.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-11 06:13:39,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.82 | bwd_microstep: 1495.47 | bwd_inner_microstep: 1495.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3545 [2024-06-11 06:13:41,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.37 | bwd_microstep: 1355.96 | bwd_inner_microstep: 1355.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-11 06:13:43,308] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1286.72 | bwd_inner_microstep: 1286.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3499 [2024-06-11 06:13:45,092] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.05 | bwd_microstep: 1288.83 | bwd_inner_microstep: 1288.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2504 [2024-06-11 06:13:46,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.45 | bwd_microstep: 990.28 | bwd_inner_microstep: 990.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2013 [2024-06-11 06:13:47,497] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 283.65 | bwd_microstep: 743.05 | bwd_inner_microstep: 743.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3667 [2024-06-11 06:13:49,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.61 | bwd_microstep: 1615.12 | bwd_inner_microstep: 1615.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412 [2024-06-11 06:13:51,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.07 | bwd_microstep: 1151.53 | bwd_inner_microstep: 1151.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3403 [2024-06-11 06:13:53,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.93 | bwd_microstep: 1344.33 | bwd_inner_microstep: 1344.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-11 06:13:55,233] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.65 | bwd_microstep: 1489.51 | bwd_inner_microstep: 1489.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-11 06:13:57,295] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.34 | bwd_microstep: 1492.09 | bwd_inner_microstep: 1492.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3533 [2024-06-11 06:13:59,232] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.35 | bwd_microstep: 1398.92 | bwd_inner_microstep: 1398.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 
3536 [2024-06-11 06:14:01,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.45 | bwd_microstep: 1356.53 | bwd_inner_microstep: 1356.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3835 [2024-06-11 06:14:03,391] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.84 | bwd_microstep: 1655.97 | bwd_inner_microstep: 1655.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 06:14:05,169] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.66 | bwd_microstep: 1279.92 | bwd_inner_microstep: 1279.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1992 [2024-06-11 06:14:06,409] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.70 | bwd_microstep: 896.08 | bwd_inner_microstep: 896.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2405 [2024-06-11 06:14:07,631] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.47 | bwd_microstep: 878.71 | bwd_inner_microstep: 878.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3572 [2024-06-11 06:14:09,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.93 | bwd_microstep: 1396.34 | bwd_inner_microstep: 1396.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1993 [2024-06-11 06:14:10,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 282.77 | bwd_microstep: 738.00 | bwd_inner_microstep: 737.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 3467 [2024-06-11 06:14:12,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.42 | bwd_microstep: 1183.06 | bwd_inner_microstep: 1183.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-11 06:14:14,193] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.71 | bwd_microstep: 1415.34 | bwd_inner_microstep: 1415.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2061 [2024-06-11 06:14:15,455] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.67 | bwd_microstep: 914.64 | bwd_inner_microstep: 914.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 06:14:17,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.24 | bwd_microstep: 1380.77 | bwd_inner_microstep: 1380.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3823 [2024-06-11 06:14:19,460] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.09 | bwd_microstep: 1516.97 | bwd_inner_microstep: 1516.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 06:14:21,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.96 | bwd_microstep: 1402.11 | bwd_inner_microstep: 1402.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3579 [2024-06-11 06:14:28,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.60 | optimizer_gradients: 4.20 | optimizer_step: 6.60 [2024-06-11 06:14:28,754] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 532.05 | bwd_microstep: 6778.34 | bwd_inner_microstep: 1578.25 | bwd_allreduce_microstep: 5200.01 | step_microstep: 38.46 [2024-06-11 06:14:28,757] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15173.14 | bwd: 45631.71 | bwd_inner: 40430.70 | bwd_allreduce: 5200.31 | step: 40.00 {'loss': 1.1441, 'learning_rate': 5.3545646778263575e-08, 'epoch': 0.98} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2460 [2024-06-11 06:14:30,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.48 | bwd_microstep: 938.89 | bwd_inner_microstep: 938.74 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.12 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3470 [2024-06-11 06:14:31,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.63 | bwd_microstep: 1210.67 | bwd_inner_microstep: 1210.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3840 [2024-06-11 06:14:33,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.88 | bwd_microstep: 1401.68 | bwd_inner_microstep: 1401.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3779 [2024-06-11 06:14:35,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.11 | bwd_microstep: 1439.51 | bwd_inner_microstep: 1439.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3796 [2024-06-11 06:14:37,947] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.22 | bwd_microstep: 1644.79 | bwd_inner_microstep: 1644.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3599 [2024-06-11 06:14:40,020] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 557.02 | bwd_microstep: 1505.32 | bwd_inner_microstep: 1505.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-11 06:14:42,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.76 | bwd_microstep: 1636.48 | bwd_inner_microstep: 1636.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434 [2024-06-11 06:14:43,875] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.27 | bwd_microstep: 1157.04 | bwd_inner_microstep: 1157.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3451 [2024-06-11 06:14:45,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.79 | bwd_microstep: 1286.43 | bwd_inner_microstep: 1286.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3455 [2024-06-11 06:14:47,469] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.18 | bwd_microstep: 1314.43 | bwd_inner_microstep: 1314.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3524 [2024-06-11 06:14:49,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.41 | bwd_microstep: 1384.44 | bwd_inner_microstep: 1384.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3518 [2024-06-11 06:14:51,515] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.77 | bwd_microstep: 1548.34 | bwd_inner_microstep: 1548.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3714 [2024-06-11 
06:14:53,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 631.35 | bwd_microstep: 1728.66 | bwd_inner_microstep: 1728.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3492 [2024-06-11 06:14:55,990] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.77 | bwd_microstep: 1529.26 | bwd_inner_microstep: 1529.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3654 [2024-06-11 06:14:58,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.08 | bwd_microstep: 1482.45 | bwd_inner_microstep: 1482.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2125 [2024-06-11 06:14:59,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.44 | bwd_microstep: 764.71 | bwd_inner_microstep: 764.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3901 [2024-06-11 06:15:01,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.79 | bwd_microstep: 1689.52 | bwd_inner_microstep: 1689.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3549 [2024-06-11 06:15:03,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.80 | bwd_microstep: 1397.87 | bwd_inner_microstep: 1397.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3720 [2024-06-11 06:15:05,209] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.50 | bwd_microstep: 1336.77 | bwd_inner_microstep: 1336.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, 
dynamic token length: 3472 [2024-06-11 06:15:06,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.16 | bwd_microstep: 1280.84 | bwd_inner_microstep: 1280.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3727 [2024-06-11 06:15:08,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.44 | bwd_microstep: 1269.85 | bwd_inner_microstep: 1269.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-11 06:15:10,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.50 | bwd_microstep: 1416.08 | bwd_inner_microstep: 1416.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 [2024-06-11 06:15:12,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.87 | bwd_microstep: 1283.13 | bwd_inner_microstep: 1283.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3470 [2024-06-11 06:15:14,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1377.64 | bwd_inner_microstep: 1377.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3546 [2024-06-11 06:15:16,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.17 | bwd_microstep: 1329.54 | bwd_inner_microstep: 1329.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3450 [2024-06-11 06:15:17,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.36 | bwd_microstep: 1255.75 | bwd_inner_microstep: 1255.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 24, images per sample: 6.0, dynamic token length: 3576 [2024-06-11 06:15:19,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.51 | bwd_microstep: 1330.61 | bwd_inner_microstep: 1330.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2231 [2024-06-11 06:15:21,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 345.19 | bwd_microstep: 929.58 | bwd_inner_microstep: 929.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 06:15:23,239] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.37 | bwd_microstep: 1545.68 | bwd_inner_microstep: 1545.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3589 [2024-06-11 06:15:25,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 612.00 | bwd_microstep: 1672.41 | bwd_inner_microstep: 1672.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2945 [2024-06-11 06:15:26,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.53 | bwd_microstep: 1038.52 | bwd_inner_microstep: 1038.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 06:15:31,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.18 | optimizer_step: 6.58 [2024-06-11 06:15:31,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.48 | bwd_microstep: 3886.24 | bwd_inner_microstep: 1534.26 | bwd_allreduce_microstep: 2351.92 | step_microstep: 39.06 [2024-06-11 06:15:31,420] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16305.34 | bwd: 
46013.13 | bwd_inner: 43660.19 | bwd_allreduce: 2352.21 | step: 40.64 98%|█████████▊| 1683/1726 [29:32:59<44:38, 62.28s/it] 98%|█████████▊| 1684/1726 [29:34:02<43:39, 62.36s/it] 98%|█████████▊| 1685/1726 [29:35:03<42:17, 61.89s/it] 98%|█████████▊| 1686/1726 [29:36:04<41:08, 61.72s/it] 98%|█████████▊| 1687/1726 [29:37:05<40:00, 61.55s/it] 98%|█████████▊| 1688/1726 [29:38:08<39:11, 6{'loss': 1.1412, 'learning_rate': 5.083606936815866e-08, 'epoch': 0.98} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3503 [2024-06-11 06:15:33,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.57 | bwd_microstep: 1338.54 | bwd_inner_microstep: 1338.35 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.15 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3927 [2024-06-11 06:15:35,603] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.30 | bwd_microstep: 1692.60 | bwd_inner_microstep: 1692.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878 [2024-06-11 06:15:37,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.23 | bwd_microstep: 1580.60 | bwd_inner_microstep: 1580.30 | bwd_allreduce_microstep: 0.16 | step_microstep: 0.27 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 06:15:39,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.47 | bwd_microstep: 1277.82 | bwd_inner_microstep: 1277.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3728 [2024-06-11 06:15:41,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 507.74 | bwd_microstep: 1333.36 | bwd_inner_microstep: 1333.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3400 [2024-06-11 06:15:43,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.29 | bwd_microstep: 1245.32 | bwd_inner_microstep: 1245.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 06:15:45,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.08 | bwd_microstep: 1386.10 | bwd_inner_microstep: 1386.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4189 [2024-06-11 06:15:47,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.76 | bwd_microstep: 1566.34 | bwd_inner_microstep: 1566.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3494 [2024-06-11 06:15:49,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.48 | bwd_microstep: 1416.41 | bwd_inner_microstep: 1416.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1933 [2024-06-11 06:15:50,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.26 | bwd_microstep: 824.94 | bwd_inner_microstep: 824.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3438 [2024-06-11 06:15:52,262] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.33 | bwd_microstep: 1402.82 | bwd_inner_microstep: 1402.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2273 [2024-06-11 06:15:53,524] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.69 | bwd_microstep: 909.23 | bwd_inner_microstep: 909.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3422 [2024-06-11 06:15:55,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.74 | bwd_microstep: 1444.27 | bwd_inner_microstep: 1444.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3660 [2024-06-11 06:15:57,610] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.44 | bwd_microstep: 1522.50 | bwd_inner_microstep: 1522.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1951 [2024-06-11 06:15:58,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.84 | bwd_microstep: 796.98 | bwd_inner_microstep: 796.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3475 [2024-06-11 06:16:00,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.60 | bwd_microstep: 1219.13 | bwd_inner_microstep: 1219.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 06:16:02,321] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.94 | bwd_microstep: 1383.76 | bwd_inner_microstep: 1383.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3454 [2024-06-11 06:16:03,936] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.56 | bwd_microstep: 1158.34 | bwd_inner_microstep: 1158.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 
2710 [2024-06-11 06:16:05,501] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.68 | bwd_microstep: 1132.96 | bwd_inner_microstep: 1132.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-11 06:16:07,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.85 | bwd_microstep: 1401.75 | bwd_inner_microstep: 1401.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2919 [2024-06-11 06:16:09,007] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 428.82 | bwd_microstep: 1128.24 | bwd_inner_microstep: 1128.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3826 [2024-06-11 06:16:11,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.80 | bwd_microstep: 1492.68 | bwd_inner_microstep: 1492.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 06:16:13,208] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.78 | bwd_microstep: 1555.71 | bwd_inner_microstep: 1555.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1983 [2024-06-11 06:16:14,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.06 | bwd_microstep: 705.42 | bwd_inner_microstep: 705.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1987 [2024-06-11 06:16:15,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.18 | bwd_microstep: 797.01 | bwd_inner_microstep: 796.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per 
sample: 6.5, dynamic token length: 3471 [2024-06-11 06:16:17,167] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 510.72 | bwd_microstep: 1347.22 | bwd_inner_microstep: 1347.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2284 [2024-06-11 06:16:18,646] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 395.75 | bwd_microstep: 1073.24 | bwd_inner_microstep: 1073.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3452 [2024-06-11 06:16:20,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.45 | bwd_microstep: 1517.46 | bwd_inner_microstep: 1517.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3421 [2024-06-11 06:16:22,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.23 | bwd_microstep: 1282.63 | bwd_inner_microstep: 1282.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3522 [2024-06-11 06:16:24,699] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.68 | bwd_microstep: 1591.82 | bwd_inner_microstep: 1591.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3597 [2024-06-11 06:16:26,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.79 | bwd_microstep: 1573.94 | bwd_inner_microstep: 1573.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2402 [2024-06-11 06:16:31,023] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.13 | optimizer_step: 6.61 [2024-06-11 06:16:31,024] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 394.14 | bwd_microstep: 3721.13 | bwd_inner_microstep: 1208.19 | bwd_allreduce_microstep: 2512.88 | step_microstep: 38.21 [2024-06-11 06:16:31,027] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15421.85 | bwd: 43820.35 | bwd_inner: 41306.07 | bwd_allreduce: 2513.36 | step: 40.33 {'loss': 1.1881, 'learning_rate': 4.819675280971492e-08, 'epoch': 0.98} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3467 [2024-06-11 06:16:32,847] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.85 | bwd_microstep: 1307.35 | bwd_inner_microstep: 1307.23 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 06:16:34,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.76 | bwd_microstep: 1376.92 | bwd_inner_microstep: 1376.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488 [2024-06-11 06:16:36,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.02 | bwd_microstep: 1483.57 | bwd_inner_microstep: 1483.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3818 [2024-06-11 06:16:39,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.58 | bwd_microstep: 1653.71 | bwd_inner_microstep: 1653.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4199 [2024-06-11 06:16:41,366] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.76 | bwd_microstep: 1657.11 | bwd_inner_microstep: 1657.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3409 [2024-06-11 06:16:43,094] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.47 | bwd_microstep: 1246.96 | bwd_inner_microstep: 1246.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3744 [2024-06-11 06:16:45,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.02 | bwd_microstep: 1466.74 | bwd_inner_microstep: 1466.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1932 [2024-06-11 06:16:46,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 789.79 | bwd_inner_microstep: 789.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 06:16:47,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.56 | bwd_microstep: 1280.74 | bwd_inner_microstep: 1280.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3490 [2024-06-11 06:16:49,909] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.24 | bwd_microstep: 1389.15 | bwd_inner_microstep: 1389.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-11 06:16:50,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.77 | bwd_microstep: 680.26 | bwd_inner_microstep: 680.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1964 [2024-06-11 06:16:52,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.50 | bwd_microstep: 854.79 | bwd_inner_microstep: 854.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1973 
[2024-06-11 06:16:53,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.01 | bwd_microstep: 859.57 | bwd_inner_microstep: 859.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3501 [2024-06-11 06:16:55,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.21 | bwd_microstep: 1447.11 | bwd_inner_microstep: 1447.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3511 [2024-06-11 06:16:57,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.57 | bwd_microstep: 1387.69 | bwd_inner_microstep: 1387.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-11 06:16:59,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.59 | bwd_microstep: 1384.42 | bwd_inner_microstep: 1384.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3634 [2024-06-11 06:17:01,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.77 | bwd_microstep: 1510.26 | bwd_inner_microstep: 1510.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3826 [2024-06-11 06:17:03,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.92 | bwd_microstep: 1459.60 | bwd_inner_microstep: 1459.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 06:17:05,077] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.98 | bwd_microstep: 1396.75 | bwd_inner_microstep: 1396.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per 
sample: 6.0, dynamic token length: 3448 [2024-06-11 06:17:06,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.33 | bwd_microstep: 1283.47 | bwd_inner_microstep: 1283.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3530 [2024-06-11 06:17:08,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1391.80 | bwd_inner_microstep: 1391.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3512 [2024-06-11 06:17:10,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.53 | bwd_microstep: 1488.99 | bwd_inner_microstep: 1488.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3818 [2024-06-11 06:17:12,970] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1553.07 | bwd_inner_microstep: 1553.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 612 [2024-06-11 06:17:13,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 103.05 | bwd_microstep: 260.46 | bwd_inner_microstep: 260.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3544 [2024-06-11 06:17:15,053] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.60 | bwd_microstep: 1230.00 | bwd_inner_microstep: 1229.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994 [2024-06-11 06:17:16,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.59 | bwd_microstep: 830.79 | bwd_inner_microstep: 830.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 28, images per sample: 7.0, dynamic token length: 3546 [2024-06-11 06:17:18,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1397.25 | bwd_inner_microstep: 1397.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-11 06:17:20,071] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.35 | bwd_microstep: 1396.17 | bwd_inner_microstep: 1396.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3434 [2024-06-11 06:17:22,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.24 | bwd_microstep: 1445.73 | bwd_inner_microstep: 1445.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3464 [2024-06-11 06:17:23,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 509.13 | bwd_microstep: 1345.11 | bwd_inner_microstep: 1345.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3725 [2024-06-11 06:17:26,037] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.91 | bwd_microstep: 1532.50 | bwd_inner_microstep: 1532.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2024 [2024-06-11 06:17:32,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.20 | optimizer_step: 6.59 [2024-06-11 06:17:32,622] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.00 | bwd_microstep: 6202.03 | bwd_inner_microstep: 1033.44 | bwd_allreduce_microstep: 5168.52 | step_microstep: 39.76 [2024-06-11 06:17:32,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15264.99 | bwd: 
45989.89 | bwd_inner: 40820.33 | bwd_allreduce: 5168.82 | step: 41.51 {'loss': 1.1216, 'learning_rate': 4.562770639858549e-08, 'epoch': 0.98} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-11 06:17:34,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.30 | bwd_microstep: 1369.71 | bwd_inner_microstep: 1369.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3528 [2024-06-11 06:17:36,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1392.33 | bwd_inner_microstep: 1392.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838 [2024-06-11 06:17:38,464] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.86 | bwd_microstep: 1453.18 | bwd_inner_microstep: 1452.90 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3785 [2024-06-11 06:17:40,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.04 | bwd_microstep: 1549.85 | bwd_inner_microstep: 1549.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-11 06:17:42,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.74 | bwd_microstep: 1414.37 | bwd_inner_microstep: 1414.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1882 [2024-06-11 06:17:43,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.65 | bwd_microstep: 716.41 | bwd_inner_microstep: 716.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1880 [2024-06-11 06:17:44,500] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.21 | bwd_microstep: 680.72 | bwd_inner_microstep: 680.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3708 [2024-06-11 06:17:46,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.17 | bwd_microstep: 1598.05 | bwd_inner_microstep: 1598.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1964 [2024-06-11 06:17:47,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.88 | bwd_microstep: 703.91 | bwd_inner_microstep: 703.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3513 [2024-06-11 06:17:49,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.36 | bwd_microstep: 1196.01 | bwd_inner_microstep: 1195.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3686 [2024-06-11 06:17:51,565] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.00 | bwd_microstep: 1619.04 | bwd_inner_microstep: 1619.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2893 [2024-06-11 06:17:53,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 447.98 | bwd_microstep: 1189.64 | bwd_inner_microstep: 1189.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-11 06:17:55,073] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.49 | bwd_microstep: 1346.10 | bwd_inner_microstep: 1346.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 
3700 [2024-06-11 06:17:57,259] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.30 | bwd_microstep: 1589.47 | bwd_inner_microstep: 1589.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3831 [2024-06-11 06:17:59,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.98 | bwd_microstep: 1515.70 | bwd_inner_microstep: 1515.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2093 [2024-06-11 06:18:00,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.59 | bwd_microstep: 948.26 | bwd_inner_microstep: 948.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3521 [2024-06-11 06:18:02,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1451.54 | bwd_inner_microstep: 1451.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-11 06:18:04,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.81 | bwd_microstep: 1256.63 | bwd_inner_microstep: 1256.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 06:18:06,554] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.24 | bwd_microstep: 1555.55 | bwd_inner_microstep: 1555.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3452 [2024-06-11 06:18:08,421] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.98 | bwd_microstep: 1351.70 | bwd_inner_microstep: 1351.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3677 [2024-06-11 06:18:10,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1431.82 | bwd_inner_microstep: 1431.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3611 [2024-06-11 06:18:12,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.52 | bwd_microstep: 1313.05 | bwd_inner_microstep: 1313.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 06:18:13,999] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.65 | bwd_microstep: 1285.55 | bwd_inner_microstep: 1285.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1624 [2024-06-11 06:18:14,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 245.24 | bwd_microstep: 646.81 | bwd_inner_microstep: 646.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3816 [2024-06-11 06:18:16,776] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.42 | bwd_microstep: 1355.02 | bwd_inner_microstep: 1354.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3603 [2024-06-11 06:18:18,852] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.54 | bwd_microstep: 1508.09 | bwd_inner_microstep: 1508.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3573 [2024-06-11 06:18:21,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.57 | bwd_microstep: 1591.55 | bwd_inner_microstep: 1591.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3560 [2024-06-11 06:18:23,204] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.78 | bwd_microstep: 1566.02 | bwd_inner_microstep: 1566.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3814 [2024-06-11 06:18:25,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.25 | bwd_microstep: 1423.77 | bwd_inner_microstep: 1423.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3572 [2024-06-11 06:18:27,144] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.84 | bwd_microstep: 1424.17 | bwd_inner_microstep: 1424.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3430 [2024-06-11 06:18:29,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.16 | bwd_microstep: 1378.50 | bwd_inner_microstep: 1378.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2032 [2024-06-11 06:18:33,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.74 | optimizer_gradients: 4.10 | optimizer_step: 6.58 [2024-06-11 06:18:33,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.45 | bwd_microstep: 4041.84 | bwd_inner_microstep: 997.70 | bwd_allreduce_microstep: 3044.08 | step_microstep: 37.90 [2024-06-11 06:18:33,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15617.16 | bwd: 44864.40 | bwd_inner: 41819.22 | bwd_allreduce: 3044.40 | step: 39.73 {'loss': 1.1603, 'learning_rate': 4.3128939182941474e-08, 'epoch': 0.98} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3014 [2024-06-11 06:18:35,157] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 461.51 | bwd_microstep: 1217.83 | bwd_inner_microstep: 1217.66 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3965 [2024-06-11 06:18:37,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.46 | bwd_microstep: 1695.32 | bwd_inner_microstep: 1695.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3552 [2024-06-11 06:18:39,425] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.48 | bwd_microstep: 1398.72 | bwd_inner_microstep: 1398.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-11 06:18:40,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.49 | bwd_microstep: 972.05 | bwd_inner_microstep: 972.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3807 [2024-06-11 06:18:43,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.75 | bwd_microstep: 1653.11 | bwd_inner_microstep: 1653.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2277 [2024-06-11 06:18:44,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.01 | bwd_microstep: 973.95 | bwd_inner_microstep: 973.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 06:18:46,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.11 | bwd_microstep: 1386.14 | bwd_inner_microstep: 1386.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3408 [2024-06-11 
06:18:48,076] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.18 | bwd_microstep: 1280.32 | bwd_inner_microstep: 1280.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 06:18:49,804] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.46 | bwd_microstep: 1246.87 | bwd_inner_microstep: 1246.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-11 06:18:51,536] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.54 | bwd_microstep: 1250.22 | bwd_inner_microstep: 1250.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3696 [2024-06-11 06:18:53,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.17 | bwd_microstep: 1488.88 | bwd_inner_microstep: 1488.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3422 [2024-06-11 06:18:55,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.81 | bwd_microstep: 1346.68 | bwd_inner_microstep: 1346.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2652 [2024-06-11 06:18:56,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.35 | bwd_microstep: 934.53 | bwd_inner_microstep: 934.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-11 06:18:58,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.66 | bwd_microstep: 1484.53 | bwd_inner_microstep: 1484.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, 
dynamic token length: 3666 [2024-06-11 06:19:01,287] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 660.48 | bwd_microstep: 1823.06 | bwd_inner_microstep: 1823.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-11 06:19:03,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.76 | bwd_microstep: 1450.07 | bwd_inner_microstep: 1450.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2098 [2024-06-11 06:19:04,593] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 351.90 | bwd_microstep: 948.81 | bwd_inner_microstep: 948.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3450 [2024-06-11 06:19:06,285] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 462.70 | bwd_microstep: 1220.97 | bwd_inner_microstep: 1220.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3838 [2024-06-11 06:19:08,305] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.11 | bwd_microstep: 1463.70 | bwd_inner_microstep: 1463.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3480 [2024-06-11 06:19:10,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.78 | bwd_microstep: 1284.31 | bwd_inner_microstep: 1284.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2080 [2024-06-11 06:19:11,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.56 | bwd_microstep: 821.96 | bwd_inner_microstep: 821.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 28, images per sample: 7.0, dynamic token length: 3731 [2024-06-11 06:19:13,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.55 | bwd_microstep: 1441.95 | bwd_inner_microstep: 1441.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3614 [2024-06-11 06:19:15,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.28 | bwd_microstep: 1414.57 | bwd_inner_microstep: 1414.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-11 06:19:16,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.45 | bwd_microstep: 1281.70 | bwd_inner_microstep: 1281.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 06:19:18,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.10 | bwd_microstep: 1282.41 | bwd_inner_microstep: 1282.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2150 [2024-06-11 06:19:20,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 352.79 | bwd_microstep: 949.22 | bwd_inner_microstep: 949.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2087 [2024-06-11 06:19:21,376] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.36 | bwd_microstep: 966.41 | bwd_inner_microstep: 966.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3584 [2024-06-11 06:19:23,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.22 | bwd_microstep: 1451.98 | bwd_inner_microstep: 1451.95 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.05 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3617 [2024-06-11 06:19:25,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.91 | bwd_microstep: 1544.18 | bwd_inner_microstep: 1544.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3781 [2024-06-11 06:19:27,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.62 | bwd_microstep: 1546.24 | bwd_inner_microstep: 1546.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3622 [2024-06-11 06:19:29,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1507.26 | bwd_inner_microstep: 1507.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-11 06:19:35,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.14 | optimizer_step: 6.58 [2024-06-11 06:19:35,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.60 | bwd_microstep: 4723.60 | bwd_inner_microstep: 1579.90 | bwd_allreduce_microstep: 3143.64 | step_microstep: 39.28 [2024-06-11 06:19:35,020] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15760.84 | bwd: 45451.58 | bwd_inner: 42306.90 | bwd_allreduce: 3143.93 | step: 40.93 {'loss': 1.1953, 'learning_rate': 4.070045996342975e-08, 'epoch': 0.98} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3483 [2024-06-11 06:19:37,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.18 | bwd_microstep: 1569.98 | bwd_inner_microstep: 1569.79 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.17 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4185 [2024-06-11 06:19:39,286] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.65 | bwd_microstep: 1510.49 | bwd_inner_microstep: 1510.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3395 [2024-06-11 06:19:41,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.28 | bwd_microstep: 1375.23 | bwd_inner_microstep: 1375.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3848 [2024-06-11 06:19:43,245] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.44 | bwd_microstep: 1493.31 | bwd_inner_microstep: 1493.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3843 [2024-06-11 06:19:45,531] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 614.72 | bwd_microstep: 1659.90 | bwd_inner_microstep: 1659.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 06:19:47,382] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.83 | bwd_microstep: 1341.13 | bwd_inner_microstep: 1341.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3537 [2024-06-11 06:19:49,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.27 | bwd_microstep: 1396.88 | bwd_inner_microstep: 1396.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 06:19:51,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.14 | bwd_microstep: 1386.07 | bwd_inner_microstep: 1386.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 
2179 [2024-06-11 06:19:52,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.95 | bwd_microstep: 950.37 | bwd_inner_microstep: 950.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3705 [2024-06-11 06:19:54,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.85 | bwd_microstep: 1524.15 | bwd_inner_microstep: 1524.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 06:19:56,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.39 | bwd_microstep: 1250.87 | bwd_inner_microstep: 1250.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1996 [2024-06-11 06:19:57,373] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.12 | bwd_microstep: 707.36 | bwd_inner_microstep: 707.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3430 [2024-06-11 06:19:59,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 461.00 | bwd_microstep: 1214.61 | bwd_inner_microstep: 1214.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1979 [2024-06-11 06:20:00,293] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 332.62 | bwd_microstep: 894.02 | bwd_inner_microstep: 893.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3503 [2024-06-11 06:20:02,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.51 | bwd_microstep: 1486.52 | bwd_inner_microstep: 1486.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per 
sample: 10.5, dynamic token length: 3513 [2024-06-11 06:20:04,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.20 | bwd_microstep: 1616.39 | bwd_inner_microstep: 1616.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3472 [2024-06-11 06:20:06,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.82 | bwd_microstep: 1375.60 | bwd_inner_microstep: 1375.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3399 [2024-06-11 06:20:08,189] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.78 | bwd_microstep: 1245.62 | bwd_inner_microstep: 1245.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3563 [2024-06-11 06:20:10,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.32 | bwd_microstep: 1396.91 | bwd_inner_microstep: 1396.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-11 06:20:11,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.80 | bwd_microstep: 1351.41 | bwd_inner_microstep: 1351.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 06:20:13,859] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.49 | bwd_microstep: 1356.82 | bwd_inner_microstep: 1356.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3456 [2024-06-11 06:20:15,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.57 | bwd_microstep: 1350.64 | bwd_inner_microstep: 1350.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3451 [2024-06-11 06:20:17,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 444.16 | bwd_microstep: 1168.12 | bwd_inner_microstep: 1168.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-11 06:20:19,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.01 | bwd_microstep: 1297.85 | bwd_inner_microstep: 1297.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-11 06:20:20,964] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.44 | bwd_microstep: 1317.02 | bwd_inner_microstep: 1317.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 06:20:22,833] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.86 | bwd_microstep: 1353.05 | bwd_inner_microstep: 1353.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2280 [2024-06-11 06:20:24,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.80 | bwd_microstep: 1003.01 | bwd_inner_microstep: 1002.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3823 [2024-06-11 06:20:26,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.06 | bwd_microstep: 1418.93 | bwd_inner_microstep: 1418.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3802 [2024-06-11 06:20:28,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.53 | bwd_microstep: 1643.04 | bwd_inner_microstep: 1643.01 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3055 [2024-06-11 06:20:30,278] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.44 | bwd_microstep: 1331.86 | bwd_inner_microstep: 1331.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3434 [2024-06-11 06:20:32,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.90 | bwd_microstep: 1311.14 | bwd_inner_microstep: 1311.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2048 [2024-06-11 06:20:35,973] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.34 | optimizer_step: 6.61 [2024-06-11 06:20:35,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 324.71 | bwd_microstep: 3514.27 | bwd_inner_microstep: 997.65 | bwd_allreduce_microstep: 2516.56 | step_microstep: 38.60 [2024-06-11 06:20:35,977] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15793.45 | bwd: 44812.62 | bwd_inner: 42295.00 | bwd_allreduce: 2516.88 | step: 40.22 1.88s/it] 98%|█████████▊| 1688/1726 [29:38:08<39:11, 61.88s/it] 98%|█████████▊| 1689/1726 [29:39:07<37:44, 61.20s/it] 98%|█████████▊| 1690/1726 [29:40:09<36:47, 61.32s/it] 98%|█████████▊| 1691/1726 [29:41:10<35:41, 61.17s/it] 98%|█████████▊| 1692/1726 [29:42:11<34:43, 61.29s/it] 98%|█████████▊| 1693/1726 [29:43:12<33{'loss': 1.1836, 'learning_rate': 3.834227729313966e-08, 'epoch': 0.98} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3414 [2024-06-11 06:20:37,830] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 499.99 | bwd_microstep: 1337.22 | bwd_inner_microstep: 1337.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 06:20:39,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.02 | bwd_microstep: 1375.28 | bwd_inner_microstep: 1375.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3416 [2024-06-11 06:20:41,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.29 | bwd_microstep: 1245.11 | bwd_inner_microstep: 1245.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-11 06:20:43,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.23 | bwd_microstep: 1289.18 | bwd_inner_microstep: 1289.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-11 06:20:44,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.14 | bwd_microstep: 789.49 | bwd_inner_microstep: 789.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1865 [2024-06-11 06:20:45,370] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.42 | bwd_microstep: 741.63 | bwd_inner_microstep: 741.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-11 06:20:46,478] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.64 | bwd_microstep: 798.65 | bwd_inner_microstep: 798.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3497 [2024-06-11 06:20:48,394] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.45 | bwd_microstep: 1390.06 | bwd_inner_microstep: 1390.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2180 [2024-06-11 06:20:49,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 353.94 | bwd_microstep: 951.70 | bwd_inner_microstep: 951.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3430 [2024-06-11 06:20:51,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.13 | bwd_microstep: 1346.90 | bwd_inner_microstep: 1346.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 06:20:53,483] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.35 | bwd_microstep: 1387.83 | bwd_inner_microstep: 1387.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3706 [2024-06-11 06:20:55,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 613.35 | bwd_microstep: 1671.13 | bwd_inner_microstep: 1671.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3561 [2024-06-11 06:20:57,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.65 | bwd_microstep: 1598.66 | bwd_inner_microstep: 1598.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3660 [2024-06-11 06:21:00,252] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.53 | bwd_microstep: 1654.93 | bwd_inner_microstep: 1654.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token 
length: 3421 [2024-06-11 06:21:01,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.96 | bwd_microstep: 1248.52 | bwd_inner_microstep: 1248.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2006 [2024-06-11 06:21:03,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 314.60 | bwd_microstep: 833.80 | bwd_inner_microstep: 833.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3415 [2024-06-11 06:21:04,996] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.71 | bwd_microstep: 1343.41 | bwd_inner_microstep: 1343.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3700 [2024-06-11 06:21:06,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1379.84 | bwd_inner_microstep: 1379.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3648 [2024-06-11 06:21:08,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1517.64 | bwd_inner_microstep: 1517.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1984 [2024-06-11 06:21:10,103] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.85 | bwd_microstep: 802.16 | bwd_inner_microstep: 802.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-11 06:21:12,036] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.23 | bwd_microstep: 1396.65 | bwd_inner_microstep: 1396.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, 
images per sample: 7.0, dynamic token length: 3539 [2024-06-11 06:21:13,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.14 | bwd_microstep: 1398.82 | bwd_inner_microstep: 1398.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3659 [2024-06-11 06:21:16,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.93 | bwd_microstep: 1722.70 | bwd_inner_microstep: 1722.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3597 [2024-06-11 06:21:18,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.96 | bwd_microstep: 1310.14 | bwd_inner_microstep: 1310.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-11 06:21:20,227] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.01 | bwd_microstep: 1510.38 | bwd_inner_microstep: 1510.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3762 [2024-06-11 06:21:22,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.24 | bwd_microstep: 1447.51 | bwd_inner_microstep: 1447.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2045 [2024-06-11 06:21:23,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.90 | bwd_microstep: 814.16 | bwd_inner_microstep: 814.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-11 06:21:25,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.91 | bwd_microstep: 1251.05 | bwd_inner_microstep: 1251.02 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-11 06:21:27,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.98 | bwd_microstep: 1439.58 | bwd_inner_microstep: 1439.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 06:21:29,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.87 | bwd_microstep: 1656.68 | bwd_inner_microstep: 1656.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764 [2024-06-11 06:21:31,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.44 | bwd_microstep: 1644.45 | bwd_inner_microstep: 1644.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-11 06:21:36,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-11 06:21:36,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.74 | bwd_microstep: 4766.18 | bwd_inner_microstep: 1702.91 | bwd_allreduce_microstep: 3063.22 | step_microstep: 38.88 [2024-06-11 06:21:36,979] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15597.12 | bwd: 45061.47 | bwd_inner: 41997.32 | bwd_allreduce: 3063.46 | step: 40.43 {'loss': 1.1871, 'learning_rate': 3.6054399477576384e-08, 'epoch': 0.98} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3477 [2024-06-11 06:21:38,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.56 | bwd_microstep: 1399.55 | bwd_inner_microstep: 1399.39 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.19 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3952 [2024-06-11 06:21:41,249] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.25 | bwd_microstep: 1694.77 | bwd_inner_microstep: 1694.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3784 [2024-06-11 06:21:43,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.58 | bwd_microstep: 1455.39 | bwd_inner_microstep: 1455.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3405 [2024-06-11 06:21:45,065] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.92 | bwd_microstep: 1307.90 | bwd_inner_microstep: 1307.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1953 [2024-06-11 06:21:46,164] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.38 | bwd_microstep: 792.96 | bwd_inner_microstep: 792.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3483 [2024-06-11 06:21:47,818] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.07 | bwd_microstep: 1187.72 | bwd_inner_microstep: 1187.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2079 [2024-06-11 06:21:48,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.40 | bwd_microstep: 726.30 | bwd_inner_microstep: 726.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3480 [2024-06-11 06:21:50,750] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.75 | bwd_microstep: 1387.05 | bwd_inner_microstep: 1387.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3486 
[2024-06-11 06:21:52,533] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.50 | bwd_microstep: 1284.90 | bwd_inner_microstep: 1284.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3954 [2024-06-11 06:21:54,609] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.59 | bwd_microstep: 1505.45 | bwd_inner_microstep: 1505.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3500 [2024-06-11 06:21:56,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.32 | bwd_microstep: 1416.80 | bwd_inner_microstep: 1416.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3663 [2024-06-11 06:21:58,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.43 | bwd_microstep: 1615.78 | bwd_inner_microstep: 1615.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3738 [2024-06-11 06:22:01,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.31 | bwd_microstep: 1731.02 | bwd_inner_microstep: 1730.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 06:22:03,082] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.34 | bwd_microstep: 1388.64 | bwd_inner_microstep: 1388.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3942 [2024-06-11 06:22:05,457] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 634.77 | bwd_microstep: 1729.69 | bwd_inner_microstep: 1729.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3404 [2024-06-11 06:22:07,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.49 | bwd_microstep: 1249.37 | bwd_inner_microstep: 1249.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3514 [2024-06-11 06:22:09,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.29 | bwd_microstep: 1416.25 | bwd_inner_microstep: 1416.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 06:22:11,044] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.48 | bwd_microstep: 1374.73 | bwd_inner_microstep: 1374.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594 [2024-06-11 06:22:13,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.89 | bwd_microstep: 1504.21 | bwd_inner_microstep: 1504.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3605 [2024-06-11 06:22:15,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.80 | bwd_microstep: 1430.98 | bwd_inner_microstep: 1430.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3828 [2024-06-11 06:22:17,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.32 | bwd_microstep: 1582.15 | bwd_inner_microstep: 1582.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3578 [2024-06-11 06:22:19,236] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.27 | bwd_microstep: 1422.53 | bwd_inner_microstep: 1422.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-11 06:22:21,372] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.31 | bwd_microstep: 1551.01 | bwd_inner_microstep: 1550.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3640 [2024-06-11 06:22:23,465] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.75 | bwd_microstep: 1520.65 | bwd_inner_microstep: 1520.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2018 [2024-06-11 06:22:24,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.52 | bwd_microstep: 809.60 | bwd_inner_microstep: 809.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3398 [2024-06-11 06:22:26,441] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.76 | bwd_microstep: 1342.55 | bwd_inner_microstep: 1342.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820 [2024-06-11 06:22:28,850] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.87 | bwd_microstep: 1757.32 | bwd_inner_microstep: 1757.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3548 [2024-06-11 06:22:30,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.38 | bwd_microstep: 1496.01 | bwd_inner_microstep: 1495.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-11 06:22:33,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.14 | bwd_microstep: 1553.14 | bwd_inner_microstep: 1553.12 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3594 [2024-06-11 06:22:34,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.39 | bwd_microstep: 1212.77 | bwd_inner_microstep: 1212.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3541 [2024-06-11 06:22:36,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.60 | bwd_microstep: 1399.95 | bwd_inner_microstep: 1399.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3554 [2024-06-11 06:22:38,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.07 | optimizer_step: 6.65 [2024-06-11 06:22:38,700] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.41 | bwd_microstep: 1444.52 | bwd_inner_microstep: 1436.52 | bwd_allreduce_microstep: 7.95 | step_microstep: 37.81 [2024-06-11 06:22:38,703] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16684.50 | bwd: 44691.70 | bwd_inner: 44682.73 | bwd_allreduce: 8.24 | step: 39.49 {'loss': 1.1749, 'learning_rate': 3.383683457463649e-08, 'epoch': 0.98} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3445 [2024-06-11 06:22:40,580] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.69 | bwd_microstep: 1354.45 | bwd_inner_microstep: 1354.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-11 06:22:41,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.91 | bwd_microstep: 699.83 | bwd_inner_microstep: 699.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3458 [2024-06-11 
06:22:43,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.67 | bwd_microstep: 1482.73 | bwd_inner_microstep: 1482.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3890 [2024-06-11 06:22:45,795] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.94 | bwd_microstep: 1590.96 | bwd_inner_microstep: 1590.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 06:22:47,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.67 | bwd_microstep: 1244.35 | bwd_inner_microstep: 1244.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-11 06:22:49,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.66 | bwd_microstep: 1455.61 | bwd_inner_microstep: 1455.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3418 [2024-06-11 06:22:51,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.37 | bwd_microstep: 1251.79 | bwd_inner_microstep: 1251.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3435 [2024-06-11 06:22:53,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.83 | bwd_microstep: 1254.24 | bwd_inner_microstep: 1254.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-11 06:22:55,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.53 | bwd_microstep: 1525.94 | bwd_inner_microstep: 1525.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3490 [2024-06-11 06:22:57,022] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.53 | bwd_microstep: 1388.56 | bwd_inner_microstep: 1388.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3485 [2024-06-11 06:22:58,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.95 | bwd_microstep: 1287.49 | bwd_inner_microstep: 1287.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-11 06:23:00,914] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.01 | bwd_microstep: 1527.33 | bwd_inner_microstep: 1527.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1908 [2024-06-11 06:23:01,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.86 | bwd_microstep: 779.24 | bwd_inner_microstep: 779.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1942 [2024-06-11 06:23:03,132] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.82 | bwd_microstep: 820.38 | bwd_inner_microstep: 820.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3423 [2024-06-11 06:23:04,992] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.56 | bwd_microstep: 1346.37 | bwd_inner_microstep: 1346.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3497 [2024-06-11 06:23:07,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.33 | bwd_microstep: 1483.78 | bwd_inner_microstep: 1483.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch 
size: 24, images per sample: 6.0, dynamic token length: 2407 [2024-06-11 06:23:08,461] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.46 | bwd_microstep: 1033.00 | bwd_inner_microstep: 1032.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3674 [2024-06-11 06:23:10,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.75 | bwd_microstep: 1724.43 | bwd_inner_microstep: 1724.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-11 06:23:12,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.32 | bwd_microstep: 1472.69 | bwd_inner_microstep: 1472.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2288 [2024-06-11 06:23:13,961] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.67 | bwd_microstep: 785.51 | bwd_inner_microstep: 785.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3525 [2024-06-11 06:23:15,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.78 | bwd_microstep: 1395.88 | bwd_inner_microstep: 1395.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3822 [2024-06-11 06:23:17,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.24 | bwd_microstep: 1460.71 | bwd_inner_microstep: 1460.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1921 [2024-06-11 06:23:18,885] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.86 | bwd_microstep: 695.24 | bwd_inner_microstep: 695.22 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3539 [2024-06-11 06:23:20,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.60 | bwd_microstep: 1393.04 | bwd_inner_microstep: 1393.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 06:23:22,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1378.79 | bwd_inner_microstep: 1378.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2285 [2024-06-11 06:23:24,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.46 | bwd_microstep: 975.46 | bwd_inner_microstep: 975.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3516 [2024-06-11 06:23:25,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.69 | bwd_microstep: 1292.62 | bwd_inner_microstep: 1292.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 06:23:27,809] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.56 | bwd_microstep: 1411.45 | bwd_inner_microstep: 1411.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3592 [2024-06-11 06:23:29,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.12 | bwd_microstep: 1431.21 | bwd_inner_microstep: 1431.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3767 [2024-06-11 06:23:31,807] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.10 | bwd_microstep: 1467.54 | bwd_inner_microstep: 
1467.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583 [2024-06-11 06:23:34,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.42 | bwd_microstep: 1698.78 | bwd_inner_microstep: 1698.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3428 [2024-06-11 06:23:39,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.14 | optimizer_step: 6.64 [2024-06-11 06:23:39,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.97 | bwd_microstep: 4448.89 | bwd_inner_microstep: 1648.67 | bwd_allreduce_microstep: 2800.15 | step_microstep: 39.37 [2024-06-11 06:23:39,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15570.36 | bwd: 44558.33 | bwd_inner: 41757.26 | bwd_allreduce: 2800.38 | step: 40.91 {'loss': 1.1932, 'learning_rate': 3.1689590394570204e-08, 'epoch': 0.98} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493 [2024-06-11 06:23:41,117] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.02 | bwd_microstep: 1400.03 | bwd_inner_microstep: 1400.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1857 [2024-06-11 06:23:42,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 261.43 | bwd_microstep: 674.32 | bwd_inner_microstep: 674.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3910 [2024-06-11 06:23:44,122] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1493.47 | bwd_inner_microstep: 1493.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4393 
[2024-06-11 06:23:46,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.29 | bwd_microstep: 1644.01 | bwd_inner_microstep: 1643.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1881 [2024-06-11 06:23:47,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.45 | bwd_microstep: 679.67 | bwd_inner_microstep: 679.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3478 [2024-06-11 06:23:49,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.21 | bwd_microstep: 1483.24 | bwd_inner_microstep: 1483.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3401 [2024-06-11 06:23:51,115] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.25 | bwd_microstep: 1245.98 | bwd_inner_microstep: 1245.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3712 [2024-06-11 06:23:53,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.44 | bwd_microstep: 1431.11 | bwd_inner_microstep: 1431.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3511 [2024-06-11 06:23:54,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.11 | bwd_microstep: 1290.45 | bwd_inner_microstep: 1290.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3619 [2024-06-11 06:23:56,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.17 | bwd_microstep: 1314.95 | bwd_inner_microstep: 1314.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per 
sample: 7.5, dynamic token length: 3402 [2024-06-11 06:23:58,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.44 | bwd_microstep: 1371.15 | bwd_inner_microstep: 1371.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675 [2024-06-11 06:24:00,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.11 | bwd_microstep: 1419.97 | bwd_inner_microstep: 1419.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-11 06:24:02,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.39 | bwd_microstep: 1345.43 | bwd_inner_microstep: 1345.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 06:24:04,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.79 | bwd_microstep: 1374.48 | bwd_inner_microstep: 1374.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3504 [2024-06-11 06:24:06,448] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.46 | bwd_microstep: 1553.03 | bwd_inner_microstep: 1553.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2509 [2024-06-11 06:24:07,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 397.12 | bwd_microstep: 1062.90 | bwd_inner_microstep: 1062.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-11 06:24:09,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.05 | bwd_microstep: 1252.92 | bwd_inner_microstep: 1252.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3429 [2024-06-11 06:24:11,306] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.34 | bwd_microstep: 1192.61 | bwd_inner_microstep: 1192.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3520 [2024-06-11 06:24:13,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 466.99 | bwd_microstep: 1226.29 | bwd_inner_microstep: 1226.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-11 06:24:15,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.23 | bwd_microstep: 1512.19 | bwd_inner_microstep: 1512.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3705 [2024-06-11 06:24:16,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.21 | bwd_microstep: 1334.09 | bwd_inner_microstep: 1334.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3532 [2024-06-11 06:24:18,870] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.68 | bwd_microstep: 1396.20 | bwd_inner_microstep: 1396.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3513 [2024-06-11 06:24:20,920] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.83 | bwd_microstep: 1487.60 | bwd_inner_microstep: 1487.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3528 [2024-06-11 06:24:22,720] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.54 | bwd_microstep: 1295.53 | bwd_inner_microstep: 1295.51 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1919 [2024-06-11 06:24:23,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.05 | bwd_microstep: 688.59 | bwd_inner_microstep: 688.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2349 [2024-06-11 06:24:25,091] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 378.81 | bwd_microstep: 1023.49 | bwd_inner_microstep: 1023.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3789 [2024-06-11 06:24:27,090] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.80 | bwd_microstep: 1446.23 | bwd_inner_microstep: 1446.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3813 [2024-06-11 06:24:29,272] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.44 | bwd_microstep: 1585.97 | bwd_inner_microstep: 1585.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2484 [2024-06-11 06:24:30,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 383.36 | bwd_microstep: 1026.83 | bwd_inner_microstep: 1026.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3570 [2024-06-11 06:24:32,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.45 | bwd_microstep: 1331.83 | bwd_inner_microstep: 1331.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3563 [2024-06-11 06:24:34,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.29 | bwd_microstep: 
1502.21 | bwd_inner_microstep: 1502.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2973 [2024-06-11 06:24:38,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.18 | optimizer_step: 6.59 [2024-06-11 06:24:38,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 417.37 | bwd_microstep: 3064.93 | bwd_inner_microstep: 1248.16 | bwd_allreduce_microstep: 1816.71 | step_microstep: 39.90 [2024-06-11 06:24:38,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15476.95 | bwd: 43151.77 | bwd_inner: 41333.29 | bwd_allreduce: 1816.93 | step: 41.51 {'loss': 1.1929, 'learning_rate': 2.9612674499961413e-08, 'epoch': 0.98} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1939 [2024-06-11 06:24:39,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 295.95 | bwd_microstep: 780.71 | bwd_inner_microstep: 780.52 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 06:24:41,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.37 | bwd_microstep: 1379.55 | bwd_inner_microstep: 1379.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3878 [2024-06-11 06:24:43,337] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.24 | bwd_microstep: 1581.23 | bwd_inner_microstep: 1581.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2295 [2024-06-11 06:24:44,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 333.44 | bwd_microstep: 877.67 | bwd_inner_microstep: 877.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 
3.25, dynamic token length: 2057 [2024-06-11 06:24:45,621] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.97 | bwd_microstep: 764.87 | bwd_inner_microstep: 764.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3811 [2024-06-11 06:24:47,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.33 | bwd_microstep: 1549.31 | bwd_inner_microstep: 1549.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3503 [2024-06-11 06:24:49,673] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.54 | bwd_microstep: 1390.85 | bwd_inner_microstep: 1390.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1916 [2024-06-11 06:24:50,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.64 | bwd_microstep: 719.38 | bwd_inner_microstep: 719.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-11 06:24:52,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.93 | bwd_microstep: 1281.60 | bwd_inner_microstep: 1281.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 11, images per sample: 2.75, dynamic token length: 1466 [2024-06-11 06:24:53,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 223.93 | bwd_microstep: 595.18 | bwd_inner_microstep: 595.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3753 [2024-06-11 06:24:55,300] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.96 | bwd_microstep: 1464.09 | bwd_inner_microstep: 1464.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 31, images per sample: 7.75, dynamic token length: 3613 [2024-06-11 06:24:57,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.49 | bwd_microstep: 1462.48 | bwd_inner_microstep: 1462.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3744 [2024-06-11 06:24:59,298] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.81 | bwd_microstep: 1435.33 | bwd_inner_microstep: 1435.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 06:25:01,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.44 | bwd_microstep: 1284.27 | bwd_inner_microstep: 1284.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3507 [2024-06-11 06:25:03,253] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.12 | bwd_microstep: 1581.40 | bwd_inner_microstep: 1581.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2123 [2024-06-11 06:25:04,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 325.53 | bwd_microstep: 860.08 | bwd_inner_microstep: 860.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-11 06:25:06,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.85 | bwd_microstep: 1473.26 | bwd_inner_microstep: 1473.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-11 06:25:08,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 1257.35 | bwd_inner_microstep: 1257.32 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3564 [2024-06-11 06:25:10,165] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.67 | bwd_microstep: 1402.72 | bwd_inner_microstep: 1402.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3832 [2024-06-11 06:25:12,309] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.98 | bwd_microstep: 1556.76 | bwd_inner_microstep: 1556.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2069 [2024-06-11 06:25:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.01 | bwd_microstep: 752.99 | bwd_inner_microstep: 752.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3559 [2024-06-11 06:25:15,079] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.95 | bwd_microstep: 1236.92 | bwd_inner_microstep: 1236.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3557 [2024-06-11 06:25:17,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.06 | bwd_microstep: 1501.07 | bwd_inner_microstep: 1501.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2282 [2024-06-11 06:25:18,500] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.81 | bwd_microstep: 976.57 | bwd_inner_microstep: 976.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3451 [2024-06-11 06:25:20,155] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.02 | bwd_microstep: 1192.72 | bwd_inner_microstep: 
1192.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3436 [2024-06-11 06:25:22,157] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.78 | bwd_microstep: 1455.58 | bwd_inner_microstep: 1455.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3593 [2024-06-11 06:25:24,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.23 | bwd_microstep: 1406.15 | bwd_inner_microstep: 1406.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3725 [2024-06-11 06:25:26,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 632.45 | bwd_microstep: 1728.93 | bwd_inner_microstep: 1728.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2229 [2024-06-11 06:25:27,798] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 358.33 | bwd_microstep: 960.31 | bwd_inner_microstep: 960.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2232 [2024-06-11 06:25:29,126] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.88 | bwd_microstep: 961.85 | bwd_inner_microstep: 961.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3579 [2024-06-11 06:25:31,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.98 | bwd_microstep: 1504.25 | bwd_inner_microstep: 1504.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3510 [2024-06-11 06:25:43,137] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | 
optimizer_gradients: 4.15 | optimizer_step: 6.62 [2024-06-11 06:25:43,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.25 | bwd_microstep: 11399.83 | bwd_inner_microstep: 1448.35 | bwd_allreduce_microstep: 9951.42 | step_microstep: 39.59 [2024-06-11 06:25:43,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14868.87 | bwd: 49775.28 | bwd_inner: 39822.81 | bwd_allreduce: 9951.72 | step: 41.32 :39, 61.19s/it] 98%|█████████▊| 1693/1726 [29:43:12<33:39, 61.19s/it] 98%|█████████▊| 1694/1726 [29:44:13<32:36, 61.13s/it] 98%|█████████▊| 1695/1726 [29:45:15<31:40, 61.31s/it] 98%|█████████▊| 1696/1726 [29:46:15<30:31, 61.06s/it] 98%|█████████▊| 1697/1726 [29:47:14<29:12, 60.43s/it] 98%|█████████▊| 1698/1726 [29:48{'loss': 1.1637, 'learning_rate': 2.760609420569882e-08, 'epoch': 0.98} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3463 [2024-06-11 06:25:45,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.01 | bwd_microstep: 1371.32 | bwd_inner_microstep: 1371.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2389 [2024-06-11 06:25:46,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.05 | bwd_microstep: 999.34 | bwd_inner_microstep: 999.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-11 06:25:48,704] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.24 | bwd_microstep: 1650.73 | bwd_inner_microstep: 1650.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3851 
[2024-06-11 06:25:50,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.26 | bwd_microstep: 1485.42 | bwd_inner_microstep: 1485.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3774 [2024-06-11 06:25:52,740] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.54 | bwd_microstep: 1438.37 | bwd_inner_microstep: 1438.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1941 [2024-06-11 06:25:53,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.90 | bwd_microstep: 791.06 | bwd_inner_microstep: 791.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 06:25:55,748] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1380.49 | bwd_inner_microstep: 1380.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3411 [2024-06-11 06:25:57,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 450.97 | bwd_microstep: 1180.41 | bwd_inner_microstep: 1180.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-11 06:25:59,435] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.25 | bwd_microstep: 1487.47 | bwd_inner_microstep: 1487.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-11 06:26:01,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.03 | bwd_microstep: 1253.18 | bwd_inner_microstep: 1253.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3504 [2024-06-11 06:26:02,953] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.91 | bwd_microstep: 1288.74 | bwd_inner_microstep: 1288.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3442 [2024-06-11 06:26:04,561] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.44 | bwd_microstep: 1157.93 | bwd_inner_microstep: 1157.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3668 [2024-06-11 06:26:06,396] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.51 | bwd_microstep: 1324.37 | bwd_inner_microstep: 1324.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3601 [2024-06-11 06:26:08,423] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.59 | bwd_microstep: 1469.04 | bwd_inner_microstep: 1469.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 27, images per sample: 6.75, dynamic token length: 3492 [2024-06-11 06:26:10,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.34 | bwd_microstep: 1369.26 | bwd_inner_microstep: 1369.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3645 [2024-06-11 06:26:12,664] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 624.03 | bwd_microstep: 1713.70 | bwd_inner_microstep: 1713.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3635 [2024-06-11 06:26:14,780] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.18 | bwd_microstep: 1535.56 | bwd_inner_microstep: 1535.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-11 06:26:16,712] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.69 | bwd_microstep: 1394.23 | bwd_inner_microstep: 1394.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2700 [2024-06-11 06:26:18,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 421.69 | bwd_microstep: 1130.58 | bwd_inner_microstep: 1130.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 06:26:20,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.66 | bwd_microstep: 1381.86 | bwd_inner_microstep: 1381.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 06:26:22,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.36 | bwd_microstep: 1379.37 | bwd_inner_microstep: 1379.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-11 06:26:24,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.63 | bwd_microstep: 1495.06 | bwd_inner_microstep: 1495.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3091 [2024-06-11 06:26:25,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.29 | bwd_microstep: 1064.01 | bwd_inner_microstep: 1063.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2001 [2024-06-11 06:26:26,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.18 | bwd_microstep: 898.41 | bwd_inner_microstep: 898.38 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551 [2024-06-11 06:26:28,865] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.09 | bwd_microstep: 1430.48 | bwd_inner_microstep: 1430.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1943 [2024-06-11 06:26:29,925] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 289.67 | bwd_microstep: 762.20 | bwd_inner_microstep: 762.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 06:26:31,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.62 | bwd_microstep: 1257.49 | bwd_inner_microstep: 1257.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 3004 [2024-06-11 06:26:33,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 391.05 | bwd_microstep: 1020.74 | bwd_inner_microstep: 1020.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3592 [2024-06-11 06:26:34,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.60 | bwd_microstep: 1337.92 | bwd_inner_microstep: 1337.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3729 [2024-06-11 06:26:36,792] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.54 | bwd_microstep: 1338.87 | bwd_inner_microstep: 1338.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3568 [2024-06-11 06:26:38,727] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.27 | bwd_microstep: 
1397.08 | bwd_inner_microstep: 1397.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3551 [2024-06-11 06:26:45,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.36 | optimizer_step: 6.58 [2024-06-11 06:26:45,402] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.42 | bwd_microstep: 6054.85 | bwd_inner_microstep: 1724.78 | bwd_allreduce_microstep: 4329.95 | step_microstep: 41.02 [2024-06-11 06:26:45,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15675.58 | bwd: 46239.56 | bwd_inner: 41908.63 | bwd_allreduce: 4330.22 | step: 42.63 {'loss': 1.1207, 'learning_rate': 2.566985657894483e-08, 'epoch': 0.98} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3399 [2024-06-11 06:26:47,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 481.30 | bwd_microstep: 1270.20 | bwd_inner_microstep: 1270.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1943 [2024-06-11 06:26:48,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.66 | bwd_microstep: 792.66 | bwd_inner_microstep: 792.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3839 [2024-06-11 06:26:50,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.21 | bwd_microstep: 1553.72 | bwd_inner_microstep: 1553.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 06:26:52,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.59 | bwd_microstep: 1344.00 | bwd_inner_microstep: 1343.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 
7.0, dynamic token length: 3810 [2024-06-11 06:26:54,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.99 | bwd_microstep: 1456.55 | bwd_inner_microstep: 1456.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3739 [2024-06-11 06:26:56,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.35 | bwd_microstep: 1530.59 | bwd_inner_microstep: 1530.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3494 [2024-06-11 06:26:58,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.80 | bwd_microstep: 1283.69 | bwd_inner_microstep: 1283.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3425 [2024-06-11 06:26:59,904] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1249.69 | bwd_inner_microstep: 1249.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.21 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-11 06:27:01,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.68 | bwd_microstep: 799.12 | bwd_inner_microstep: 799.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3447 [2024-06-11 06:27:02,687] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 456.97 | bwd_microstep: 1207.75 | bwd_inner_microstep: 1207.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2095 [2024-06-11 06:27:03,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.12 | bwd_microstep: 854.84 | bwd_inner_microstep: 854.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT 
batch size: 23, images per sample: 5.75, dynamic token length: 2666 [2024-06-11 06:27:05,369] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 404.85 | bwd_microstep: 1084.77 | bwd_inner_microstep: 1084.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3650 [2024-06-11 06:27:07,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 626.65 | bwd_microstep: 1716.05 | bwd_inner_microstep: 1716.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2963 [2024-06-11 06:27:09,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 448.01 | bwd_microstep: 1198.43 | bwd_inner_microstep: 1198.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3638 [2024-06-11 06:27:11,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.16 | bwd_microstep: 1434.80 | bwd_inner_microstep: 1434.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3837 [2024-06-11 06:27:13,505] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.10 | bwd_microstep: 1558.98 | bwd_inner_microstep: 1558.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-11 06:27:15,446] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.91 | bwd_microstep: 1401.81 | bwd_inner_microstep: 1401.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 611 [2024-06-11 06:27:15,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 102.83 | bwd_microstep: 258.83 | bwd_inner_microstep: 258.80 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-11 06:27:17,896] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1511.78 | bwd_inner_microstep: 1511.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3439 [2024-06-11 06:27:19,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.57 | bwd_microstep: 1256.49 | bwd_inner_microstep: 1256.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3822 [2024-06-11 06:27:21,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.39 | bwd_microstep: 1645.61 | bwd_inner_microstep: 1645.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3522 [2024-06-11 06:27:23,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.67 | bwd_microstep: 1518.81 | bwd_inner_microstep: 1518.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 06:27:25,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.16 | bwd_microstep: 1358.82 | bwd_inner_microstep: 1358.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3556 [2024-06-11 06:27:27,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.23 | bwd_microstep: 1260.21 | bwd_inner_microstep: 1260.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3574 [2024-06-11 06:27:29,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.83 | bwd_microstep: 1499.62 | bwd_inner_microstep: 
1499.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2288 [2024-06-11 06:27:30,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.91 | bwd_microstep: 877.39 | bwd_inner_microstep: 877.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2013 [2024-06-11 06:27:31,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 293.48 | bwd_microstep: 771.38 | bwd_inner_microstep: 771.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 06:27:33,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.06 | bwd_microstep: 1383.27 | bwd_inner_microstep: 1383.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1532 [2024-06-11 06:27:34,690] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 217.37 | bwd_microstep: 564.25 | bwd_inner_microstep: 564.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2952 [2024-06-11 06:27:36,212] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.11 | bwd_microstep: 1099.20 | bwd_inner_microstep: 1099.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3769 [2024-06-11 06:27:38,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.72 | bwd_microstep: 1551.12 | bwd_inner_microstep: 1551.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3464 [2024-06-11 06:27:46,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 18.36 | 
optimizer_gradients: 4.28 | optimizer_step: 6.61 [2024-06-11 06:27:46,427] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.35 | bwd_microstep: 7571.77 | bwd_inner_microstep: 1325.16 | bwd_allreduce_microstep: 6246.54 | step_microstep: 41.68 [2024-06-11 06:27:46,430] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 14806.85 | bwd: 45866.24 | bwd_inner: 39618.73 | bwd_allreduce: 6246.78 | step: 43.53 {'loss': 1.1581, 'learning_rate': 2.3803968439117807e-08, 'epoch': 0.98} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3413 [2024-06-11 06:27:48,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.59 | bwd_microstep: 1360.26 | bwd_inner_microstep: 1360.18 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3459 [2024-06-11 06:27:50,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.92 | bwd_microstep: 1472.75 | bwd_inner_microstep: 1472.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2290 [2024-06-11 06:27:51,692] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.06 | bwd_microstep: 972.30 | bwd_inner_microstep: 972.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3508 [2024-06-11 06:27:53,471] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.46 | bwd_microstep: 1285.27 | bwd_inner_microstep: 1285.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3789 [2024-06-11 06:27:55,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.53 | bwd_microstep: 1380.76 | bwd_inner_microstep: 1380.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 4079 [2024-06-11 06:27:57,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.38 | bwd_microstep: 1425.30 | bwd_inner_microstep: 1425.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3767 [2024-06-11 06:27:59,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.53 | bwd_microstep: 1543.13 | bwd_inner_microstep: 1543.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1935 [2024-06-11 06:28:00,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.57 | bwd_microstep: 788.62 | bwd_inner_microstep: 788.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-11 06:28:01,676] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.84 | bwd_microstep: 792.77 | bwd_inner_microstep: 792.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3546 [2024-06-11 06:28:03,477] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.12 | bwd_microstep: 1297.18 | bwd_inner_microstep: 1297.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1959 [2024-06-11 06:28:04,539] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 291.09 | bwd_microstep: 764.08 | bwd_inner_microstep: 764.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3653 [2024-06-11 06:28:06,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.33 | bwd_microstep: 1612.53 | bwd_inner_microstep: 1612.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 06:28:08,661] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.07 | bwd_microstep: 1379.32 | bwd_inner_microstep: 1379.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3540 [2024-06-11 06:28:10,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.02 | bwd_microstep: 1390.86 | bwd_inner_microstep: 1390.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-11 06:28:12,688] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.92 | bwd_microstep: 1523.15 | bwd_inner_microstep: 1523.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3634 [2024-06-11 06:28:14,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.32 | bwd_microstep: 1411.84 | bwd_inner_microstep: 1411.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 06:28:16,415] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.90 | bwd_microstep: 1281.36 | bwd_inner_microstep: 1281.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463 [2024-06-11 06:28:18,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.07 | bwd_microstep: 1312.53 | bwd_inner_microstep: 1312.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 06:28:20,109] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.20 | bwd_microstep: 1356.26 | bwd_inner_microstep: 1356.23 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1954 [2024-06-11 06:28:21,089] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.38 | bwd_microstep: 702.30 | bwd_inner_microstep: 702.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1935 [2024-06-11 06:28:22,066] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.97 | bwd_microstep: 699.73 | bwd_inner_microstep: 699.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 06:28:23,972] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.57 | bwd_microstep: 1377.74 | bwd_inner_microstep: 1377.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3789 [2024-06-11 06:28:26,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 640.37 | bwd_microstep: 1750.41 | bwd_inner_microstep: 1750.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3544 [2024-06-11 06:28:28,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.68 | bwd_microstep: 1356.99 | bwd_inner_microstep: 1356.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3809 [2024-06-11 06:28:30,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.79 | bwd_microstep: 1584.23 | bwd_inner_microstep: 1584.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3491 [2024-06-11 06:28:32,297] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.59 | bwd_microstep: 
1347.01 | bwd_inner_microstep: 1346.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2189 [2024-06-11 06:28:33,494] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.61 | bwd_microstep: 861.35 | bwd_inner_microstep: 861.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2181 [2024-06-11 06:28:34,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.87 | bwd_microstep: 955.50 | bwd_inner_microstep: 955.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-11 06:28:36,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.95 | bwd_microstep: 1461.09 | bwd_inner_microstep: 1461.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3471 [2024-06-11 06:28:38,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.68 | bwd_microstep: 1286.12 | bwd_inner_microstep: 1286.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2211 [2024-06-11 06:28:39,820] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.91 | bwd_microstep: 866.77 | bwd_inner_microstep: 866.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3755 [2024-06-11 06:28:50,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.30 | optimizer_step: 6.64 [2024-06-11 06:28:50,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.27 | bwd_microstep: 9560.74 | bwd_inner_microstep: 1860.89 | bwd_allreduce_microstep: 7699.78 | step_microstep: 40.01 
[2024-06-11 06:28:50,035] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15101.20 | bwd: 48160.27 | bwd_inner: 40459.50 | bwd_allreduce: 7700.06 | step: 41.59 {'loss': 1.1754, 'learning_rate': 2.2008436357869866e-08, 'epoch': 0.99} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1916 [2024-06-11 06:28:51,118] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.21 | bwd_microstep: 774.75 | bwd_inner_microstep: 774.59 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.11 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-11 06:28:53,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.89 | bwd_microstep: 1473.71 | bwd_inner_microstep: 1473.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.11 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3885 [2024-06-11 06:28:55,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.12 | bwd_microstep: 1479.03 | bwd_inner_microstep: 1479.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4186 [2024-06-11 06:28:57,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.45 | bwd_microstep: 1612.30 | bwd_inner_microstep: 1612.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1922 [2024-06-11 06:28:58,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.24 | bwd_microstep: 817.20 | bwd_inner_microstep: 817.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 06:29:00,466] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.53 | bwd_microstep: 1374.60 | bwd_inner_microstep: 1374.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 06:29:02,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.86 | bwd_microstep: 1247.11 | bwd_inner_microstep: 1247.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 06:29:04,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.40 | bwd_microstep: 1382.42 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3716 [2024-06-11 06:29:06,130] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.78 | bwd_microstep: 1465.12 | bwd_inner_microstep: 1465.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 06:29:07,997] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.86 | bwd_microstep: 1353.02 | bwd_inner_microstep: 1352.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3497 [2024-06-11 06:29:09,946] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.98 | bwd_microstep: 1413.11 | bwd_inner_microstep: 1413.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3418 [2024-06-11 06:29:11,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.25 | bwd_microstep: 1183.35 | bwd_inner_microstep: 1183.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 2145 [2024-06-11 06:29:12,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.61 | bwd_microstep: 1006.45 | bwd_inner_microstep: 1006.42 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3508 [2024-06-11 06:29:15,145] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.89 | bwd_microstep: 1579.68 | bwd_inner_microstep: 1579.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3500 [2024-06-11 06:29:17,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.64 | bwd_microstep: 1481.42 | bwd_inner_microstep: 1481.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2425 [2024-06-11 06:29:18,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 385.42 | bwd_microstep: 1032.63 | bwd_inner_microstep: 1032.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3498 [2024-06-11 06:29:20,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.40 | bwd_microstep: 1387.66 | bwd_inner_microstep: 1387.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-11 06:29:22,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.17 | bwd_microstep: 1509.02 | bwd_inner_microstep: 1508.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3606 [2024-06-11 06:29:24,333] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.85 | bwd_microstep: 1245.51 | bwd_inner_microstep: 1245.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-11 06:29:25,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.46 | bwd_microstep: 975.24 | 
bwd_inner_microstep: 975.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3623 [2024-06-11 06:29:27,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.75 | bwd_microstep: 1508.97 | bwd_inner_microstep: 1508.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3536 [2024-06-11 06:29:29,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.16 | bwd_microstep: 1294.59 | bwd_inner_microstep: 1294.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784 [2024-06-11 06:29:31,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.82 | bwd_microstep: 1549.49 | bwd_inner_microstep: 1549.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-11 06:29:33,597] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.84 | bwd_microstep: 1375.54 | bwd_inner_microstep: 1375.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1922 [2024-06-11 06:29:34,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 268.62 | bwd_microstep: 696.37 | bwd_inner_microstep: 696.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3752 [2024-06-11 06:29:36,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.89 | bwd_microstep: 1532.75 | bwd_inner_microstep: 1532.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-11 06:29:38,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
559.87 | bwd_microstep: 1496.77 | bwd_inner_microstep: 1496.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3526 [2024-06-11 06:29:40,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.53 | bwd_microstep: 1455.79 | bwd_inner_microstep: 1455.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089 [2024-06-11 06:29:42,029] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.69 | bwd_microstep: 917.75 | bwd_inner_microstep: 917.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3808 [2024-06-11 06:29:44,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.25 | bwd_microstep: 1548.41 | bwd_inner_microstep: 1548.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-11 06:29:45,891] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.67 | bwd_microstep: 1246.30 | bwd_inner_microstep: 1246.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3587 [2024-06-11 06:29:50,743] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.17 | optimizer_step: 6.61 [2024-06-11 06:29:50,744] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.29 | bwd_microstep: 4200.95 | bwd_inner_microstep: 1851.46 | bwd_allreduce_microstep: 2349.44 | step_microstep: 39.03 [2024-06-11 06:29:50,747] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15735.04 | bwd: 44617.03 | bwd_inner: 42266.56 | bwd_allreduce: 2349.73 | step: 40.71 {'loss': 1.1812, 'learning_rate': 2.0283266659051338e-08, 'epoch': 0.99} dynamic ViT batch size: 
28, images per sample: 7.0, dynamic token length: 3596 [2024-06-11 06:29:52,697] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.24 | bwd_microstep: 1408.20 | bwd_inner_microstep: 1408.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 06:29:54,606] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.41 | bwd_microstep: 1378.95 | bwd_inner_microstep: 1378.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2376 [2024-06-11 06:29:55,993] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 376.34 | bwd_microstep: 997.05 | bwd_inner_microstep: 997.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3399 [2024-06-11 06:29:57,842] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.64 | bwd_microstep: 1338.77 | bwd_inner_microstep: 1338.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3467 [2024-06-11 06:29:59,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.69 | bwd_microstep: 1380.78 | bwd_inner_microstep: 1380.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3724 [2024-06-11 06:30:01,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.36 | bwd_microstep: 1530.71 | bwd_inner_microstep: 1530.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3416 [2024-06-11 06:30:03,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1343.33 | bwd_inner_microstep: 1343.30 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2492 [2024-06-11 06:30:05,043] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 360.74 | bwd_microstep: 954.91 | bwd_inner_microstep: 954.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3482 [2024-06-11 06:30:06,830] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.26 | bwd_microstep: 1284.12 | bwd_inner_microstep: 1284.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3507 [2024-06-11 06:30:08,782] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.11 | bwd_microstep: 1416.73 | bwd_inner_microstep: 1416.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3493 [2024-06-11 06:30:10,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.09 | bwd_microstep: 1416.66 | bwd_inner_microstep: 1416.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1982 [2024-06-11 06:30:11,884] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.49 | bwd_microstep: 830.29 | bwd_inner_microstep: 830.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3508 [2024-06-11 06:30:13,879] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.82 | bwd_microstep: 1445.70 | bwd_inner_microstep: 1445.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2164 [2024-06-11 06:30:15,231] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.39 | bwd_microstep: 978.72 | bwd_inner_microstep: 978.70 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3485 [2024-06-11 06:30:17,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.72 | bwd_microstep: 1480.69 | bwd_inner_microstep: 1480.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3642 [2024-06-11 06:30:19,223] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.55 | bwd_microstep: 1411.31 | bwd_inner_microstep: 1411.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3415 [2024-06-11 06:30:21,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.61 | bwd_microstep: 1437.74 | bwd_inner_microstep: 1437.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3651 [2024-06-11 06:30:23,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.72 | bwd_microstep: 1324.38 | bwd_inner_microstep: 1324.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3632 [2024-06-11 06:30:25,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.25 | bwd_microstep: 1608.54 | bwd_inner_microstep: 1608.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-11 06:30:27,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.99 | bwd_microstep: 1413.75 | bwd_inner_microstep: 1413.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3821 [2024-06-11 06:30:29,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.69 | bwd_microstep: 
1556.67 | bwd_inner_microstep: 1556.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3470 [2024-06-11 06:30:31,125] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.88 | bwd_microstep: 1284.25 | bwd_inner_microstep: 1284.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2277 [2024-06-11 06:30:32,221] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.77 | bwd_microstep: 785.45 | bwd_inner_microstep: 785.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3742 [2024-06-11 06:30:34,134] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.88 | bwd_microstep: 1383.88 | bwd_inner_microstep: 1383.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-11 06:30:36,042] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1377.44 | bwd_inner_microstep: 1377.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3547 [2024-06-11 06:30:38,106] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.93 | bwd_microstep: 1494.79 | bwd_inner_microstep: 1494.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3496 [2024-06-11 06:30:40,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.04 | bwd_microstep: 1384.48 | bwd_inner_microstep: 1384.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1995 [2024-06-11 06:30:41,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 333.95 | bwd_microstep: 895.34 | bwd_inner_microstep: 895.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3810 [2024-06-11 06:30:43,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.25 | bwd_microstep: 1556.68 | bwd_inner_microstep: 1556.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 06:30:45,188] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.68 | bwd_microstep: 1287.33 | bwd_inner_microstep: 1287.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 06:30:46,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.83 | bwd_microstep: 1258.36 | bwd_inner_microstep: 1258.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3590 [2024-06-11 06:30:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.65 | optimizer_gradients: 4.19 | optimizer_step: 6.60 [2024-06-11 06:30:52,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.89 | bwd_microstep: 5369.62 | bwd_inner_microstep: 2252.83 | bwd_allreduce_microstep: 3116.73 | step_microstep: 39.23 [2024-06-11 06:30:52,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15860.49 | bwd: 46015.63 | bwd_inner: 42897.96 | bwd_allreduce: 3116.96 | step: 40.88 :19<28:50, 61.80s/it] 98%|█████████▊| 1698/1726 [29:48:19<28:50, 61.80s/it] 98%|█████████▊| 1699/1726 [29:49:22<27:52, 61.94s/it] 98%|█████████▊| 1699/1726 [29:49:22<27:52, 61.94s/it] 98%|█████████▊| 1700/1726 [29:50:23<26:43, 61.67s/it] 98%|█████████▊| 1700/1726 [29:50:23<26:43, 61.67s/it] 99%|█████████▊| 1701/1726 [29:51:26<25:56, 62.25s/it] 99%|█████████▊| 1701/1726 
[29:51:26<25:56, 62.25s/it] 99%|█████████▊| 1702/1726 [29:52:27<24:42, 61.79s/it] 99%|█████████▊| 1702/1726 [29:52:27<24:42, 61.79s/it] 99%|█████████▊| 1703/1726 {'loss': 1.1587, 'learning_rate': 1.862846541870633e-08, 'epoch': 0.99} dynamic ViT batch size: 8, images per sample: 2.0, dynamic token length: 1252 [2024-06-11 06:30:53,652] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 184.63 | bwd_microstep: 479.98 | bwd_inner_microstep: 479.83 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1866 [2024-06-11 06:30:54,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.06 | bwd_microstep: 740.31 | bwd_inner_microstep: 740.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3906 [2024-06-11 06:30:56,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.60 | bwd_microstep: 1589.35 | bwd_inner_microstep: 1589.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3486 [2024-06-11 06:30:58,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1479.66 | bwd_inner_microstep: 1479.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3769 [2024-06-11 06:31:00,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.60 | bwd_microstep: 1341.97 | bwd_inner_microstep: 1341.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3545 [2024-06-11 06:31:02,834] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.55 | bwd_microstep: 1492.21 | bwd_inner_microstep: 1492.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 
28, images per sample: 7.0, dynamic token length: 3418 [2024-06-11 06:31:04,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.64 | bwd_microstep: 1344.59 | bwd_inner_microstep: 1344.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1971 [2024-06-11 06:31:05,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.69 | bwd_microstep: 703.24 | bwd_inner_microstep: 703.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-11 06:31:07,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.68 | bwd_microstep: 1485.19 | bwd_inner_microstep: 1485.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3504 [2024-06-11 06:31:09,640] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1390.52 | bwd_inner_microstep: 1390.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3556 [2024-06-11 06:31:11,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.37 | bwd_microstep: 1505.74 | bwd_inner_microstep: 1505.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1990 [2024-06-11 06:31:12,908] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.73 | bwd_microstep: 860.56 | bwd_inner_microstep: 860.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3389 [2024-06-11 06:31:14,627] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.91 | bwd_microstep: 1240.87 | bwd_inner_microstep: 1240.85 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2132 [2024-06-11 06:31:15,860] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.34 | bwd_microstep: 890.41 | bwd_inner_microstep: 890.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3611 [2024-06-11 06:31:18,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 621.25 | bwd_microstep: 1702.44 | bwd_inner_microstep: 1702.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 50, images per sample: 12.5, dynamic token length: 3652 [2024-06-11 06:31:20,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 649.42 | bwd_microstep: 1780.96 | bwd_inner_microstep: 1780.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2131 [2024-06-11 06:31:22,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.26 | bwd_microstep: 1021.73 | bwd_inner_microstep: 1021.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2483 [2024-06-11 06:31:23,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 392.73 | bwd_microstep: 1052.34 | bwd_inner_microstep: 1052.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3524 [2024-06-11 06:31:25,553] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.31 | bwd_microstep: 1487.37 | bwd_inner_microstep: 1487.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2093 [2024-06-11 06:31:26,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.82 | bwd_microstep: 851.43 | bwd_inner_microstep: 
851.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3544 [2024-06-11 06:31:28,660] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.36 | bwd_microstep: 1391.87 | bwd_inner_microstep: 1391.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2986 [2024-06-11 06:31:30,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 429.47 | bwd_microstep: 1141.11 | bwd_inner_microstep: 1141.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3439 [2024-06-11 06:31:32,017] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.33 | bwd_microstep: 1283.80 | bwd_inner_microstep: 1283.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 06:31:34,154] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.41 | bwd_microstep: 1551.70 | bwd_inner_microstep: 1551.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 06:31:36,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.97 | bwd_microstep: 1383.38 | bwd_inner_microstep: 1383.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2006 [2024-06-11 06:31:37,183] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.87 | bwd_microstep: 805.11 | bwd_inner_microstep: 805.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3451 [2024-06-11 06:31:39,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.04 | 
bwd_microstep: 1413.27 | bwd_inner_microstep: 1413.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3719 [2024-06-11 06:31:40,984] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.99 | bwd_microstep: 1338.20 | bwd_inner_microstep: 1338.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3468 [2024-06-11 06:31:42,634] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.98 | bwd_microstep: 1182.96 | bwd_inner_microstep: 1182.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 06:31:44,779] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.00 | bwd_microstep: 1557.00 | bwd_inner_microstep: 1556.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3668 [2024-06-11 06:31:46,883] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.38 | bwd_microstep: 1528.04 | bwd_inner_microstep: 1528.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3817 [2024-06-11 06:32:21,542] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.31 | optimizer_step: 6.62 [2024-06-11 06:32:21,543] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.41 | bwd_microstep: 34067.15 | bwd_inner_microstep: 1882.79 | bwd_allreduce_microstep: 32184.29 | step_microstep: 40.12 [2024-06-11 06:32:21,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15144.00 | bwd: 73084.52 | bwd_inner: 40899.19 | bwd_allreduce: 32184.59 | step: 41.65 {'loss': 1.138, 'learning_rate': 1.7044038465030553e-08, 'epoch': 0.99} dynamic ViT batch size: 34, 
images per sample: 8.5, dynamic token length: 3418 [2024-06-11 06:32:23,530] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 532.15 | bwd_microstep: 1434.81 | bwd_inner_microstep: 1434.62 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.14 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3397 [2024-06-11 06:32:25,128] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.61 | bwd_microstep: 1149.99 | bwd_inner_microstep: 1149.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 06:32:27,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.74 | bwd_microstep: 1377.46 | bwd_inner_microstep: 1377.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3841 [2024-06-11 06:32:29,175] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.66 | bwd_microstep: 1553.96 | bwd_inner_microstep: 1553.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3767 [2024-06-11 06:32:31,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.76 | bwd_microstep: 1338.65 | bwd_inner_microstep: 1338.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2067 [2024-06-11 06:32:32,041] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.18 | bwd_microstep: 727.07 | bwd_inner_microstep: 727.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3422 [2024-06-11 06:32:33,772] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.47 | bwd_microstep: 1249.89 | bwd_inner_microstep: 1249.86 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3495 [2024-06-11 06:32:35,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.72 | bwd_microstep: 1382.65 | bwd_inner_microstep: 1382.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 06:32:37,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.10 | bwd_microstep: 1378.42 | bwd_inner_microstep: 1378.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3768 [2024-06-11 06:32:39,573] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.60 | bwd_microstep: 1436.83 | bwd_inner_microstep: 1436.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-11 06:32:40,680] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.88 | bwd_microstep: 796.98 | bwd_inner_microstep: 796.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3659 [2024-06-11 06:32:42,778] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.10 | bwd_microstep: 1523.73 | bwd_inner_microstep: 1523.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2159 [2024-06-11 06:32:43,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.31 | bwd_microstep: 852.84 | bwd_inner_microstep: 852.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1964 [2024-06-11 06:32:44,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 279.88 | bwd_microstep: 736.09 | bwd_inner_microstep: 736.07 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 2500 [2024-06-11 06:32:46,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 422.36 | bwd_microstep: 1147.41 | bwd_inner_microstep: 1147.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3534 [2024-06-11 06:32:48,624] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.33 | bwd_microstep: 1491.61 | bwd_inner_microstep: 1491.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3663 [2024-06-11 06:32:50,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.07 | bwd_microstep: 1425.23 | bwd_inner_microstep: 1425.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3833 [2024-06-11 06:32:52,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.49 | bwd_microstep: 1564.72 | bwd_inner_microstep: 1564.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1978 [2024-06-11 06:32:53,770] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.01 | bwd_microstep: 735.62 | bwd_inner_microstep: 735.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3534 [2024-06-11 06:32:55,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.30 | bwd_microstep: 1419.46 | bwd_inner_microstep: 1419.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3448 [2024-06-11 06:32:57,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.86 | bwd_microstep: 
1315.14 | bwd_inner_microstep: 1315.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3582 [2024-06-11 06:32:59,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.55 | bwd_microstep: 1502.06 | bwd_inner_microstep: 1502.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 06:33:01,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.15 | bwd_microstep: 1558.08 | bwd_inner_microstep: 1558.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-11 06:33:03,976] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.48 | bwd_microstep: 1606.24 | bwd_inner_microstep: 1606.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-11 06:33:05,971] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.24 | bwd_microstep: 1445.39 | bwd_inner_microstep: 1445.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-11 06:33:08,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.39 | bwd_microstep: 1543.60 | bwd_inner_microstep: 1543.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2035 [2024-06-11 06:33:09,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 335.20 | bwd_microstep: 903.28 | bwd_inner_microstep: 903.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3437 [2024-06-11 06:33:11,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 536.82 | bwd_microstep: 1447.00 | bwd_inner_microstep: 1446.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3593 [2024-06-11 06:33:13,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.16 | bwd_microstep: 1603.52 | bwd_inner_microstep: 1603.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3577 [2024-06-11 06:33:15,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.66 | bwd_microstep: 1594.53 | bwd_inner_microstep: 1594.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3585 [2024-06-11 06:33:17,808] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.58 | bwd_microstep: 1498.74 | bwd_inner_microstep: 1498.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3656 [2024-06-11 06:33:21,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-11 06:33:21,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.33 | bwd_microstep: 3277.81 | bwd_inner_microstep: 1675.45 | bwd_allreduce_microstep: 1602.31 | step_microstep: 38.78 [2024-06-11 06:33:21,689] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15767.78 | bwd: 44018.87 | bwd_inner: 42415.51 | bwd_allreduce: 1602.62 | step: 40.37 {'loss': 1.1392, 'learning_rate': 1.552999137836908e-08, 'epoch': 0.99} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3453 [2024-06-11 06:33:23,558] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.48 | bwd_microstep: 1348.02 | bwd_inner_microstep: 1347.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 
dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3967 [2024-06-11 06:33:25,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.62 | bwd_microstep: 1698.29 | bwd_inner_microstep: 1698.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4291 [2024-06-11 06:33:28,336] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 657.44 | bwd_microstep: 1777.52 | bwd_inner_microstep: 1777.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3466 [2024-06-11 06:33:30,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.06 | bwd_microstep: 1381.07 | bwd_inner_microstep: 1381.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3464 [2024-06-11 06:33:32,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.07 | bwd_microstep: 1277.54 | bwd_inner_microstep: 1277.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3751 [2024-06-11 06:33:33,962] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.09 | bwd_microstep: 1400.53 | bwd_inner_microstep: 1400.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3558 [2024-06-11 06:33:36,034] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.21 | bwd_microstep: 1500.68 | bwd_inner_microstep: 1500.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3442 [2024-06-11 06:33:37,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.13 | bwd_microstep: 1324.52 | bwd_inner_microstep: 1324.50 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3489 [2024-06-11 06:33:39,783] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.67 | bwd_microstep: 1388.89 | bwd_inner_microstep: 1388.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 06:33:41,568] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.97 | bwd_microstep: 1285.29 | bwd_inner_microstep: 1285.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1954 [2024-06-11 06:33:42,671] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.72 | bwd_microstep: 795.13 | bwd_inner_microstep: 795.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1940 [2024-06-11 06:33:43,767] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.04 | bwd_microstep: 790.80 | bwd_inner_microstep: 790.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1902 [2024-06-11 06:33:44,723] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.82 | bwd_microstep: 684.69 | bwd_inner_microstep: 684.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3673 [2024-06-11 06:33:46,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 541.15 | bwd_microstep: 1447.99 | bwd_inner_microstep: 1447.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1950 [2024-06-11 06:33:47,903] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.56 | bwd_microstep: 854.36 | 
bwd_inner_microstep: 854.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3434 [2024-06-11 06:33:49,512] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.94 | bwd_microstep: 1157.68 | bwd_inner_microstep: 1157.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3621 [2024-06-11 06:33:51,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.95 | bwd_microstep: 1511.49 | bwd_inner_microstep: 1511.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3663 [2024-06-11 06:33:53,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.53 | bwd_microstep: 1522.14 | bwd_inner_microstep: 1522.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3443 [2024-06-11 06:33:55,428] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.96 | bwd_microstep: 1253.43 | bwd_inner_microstep: 1253.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1976 [2024-06-11 06:33:56,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.55 | bwd_microstep: 797.21 | bwd_inner_microstep: 797.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-11 06:33:58,274] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.79 | bwd_microstep: 1256.04 | bwd_inner_microstep: 1256.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3527 [2024-06-11 06:34:00,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
526.83 | bwd_microstep: 1395.71 | bwd_inner_microstep: 1395.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3531 [2024-06-11 06:34:02,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.55 | bwd_microstep: 1490.70 | bwd_inner_microstep: 1490.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1979 [2024-06-11 06:34:03,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 302.09 | bwd_microstep: 800.57 | bwd_inner_microstep: 800.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3442 [2024-06-11 06:34:05,110] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.01 | bwd_microstep: 1253.26 | bwd_inner_microstep: 1253.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 2276 [2024-06-11 06:34:06,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 374.26 | bwd_microstep: 1002.72 | bwd_inner_microstep: 1002.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3422 [2024-06-11 06:34:08,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 512.33 | bwd_microstep: 1374.04 | bwd_inner_microstep: 1374.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 06:34:10,296] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.26 | bwd_microstep: 1377.76 | bwd_inner_microstep: 1377.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3548 [2024-06-11 06:34:12,484] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 589.39 | bwd_microstep: 1587.08 | bwd_inner_microstep: 1587.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3580 [2024-06-11 06:34:14,686] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.02 | bwd_microstep: 1598.77 | bwd_inner_microstep: 1598.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3725 [2024-06-11 06:34:16,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.27 | bwd_microstep: 1515.42 | bwd_inner_microstep: 1515.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3814 [2024-06-11 06:34:21,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 17.10 | optimizer_gradients: 4.14 | optimizer_step: 6.59 [2024-06-11 06:34:21,722] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.09 | bwd_microstep: 4258.16 | bwd_inner_microstep: 2310.15 | bwd_allreduce_microstep: 1947.94 | step_microstep: 40.34 [2024-06-11 06:34:21,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15576.46 | bwd: 44107.49 | bwd_inner: 42158.63 | bwd_allreduce: 1948.18 | step: 41.95 {'loss': 1.196, 'learning_rate': 1.408632949118971e-08, 'epoch': 0.99} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3495 [2024-06-11 06:34:23,763] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.21 | bwd_microstep: 1474.94 | bwd_inner_microstep: 1474.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3467 [2024-06-11 06:34:25,537] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.67 | bwd_microstep: 1275.23 | bwd_inner_microstep: 1275.20 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3509 [2024-06-11 06:34:27,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.37 | bwd_microstep: 1386.69 | bwd_inner_microstep: 1386.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3843 [2024-06-11 06:34:29,601] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.69 | bwd_microstep: 1557.19 | bwd_inner_microstep: 1557.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3794 [2024-06-11 06:34:31,682] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.00 | bwd_microstep: 1507.53 | bwd_inner_microstep: 1507.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-11 06:34:33,588] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.41 | bwd_microstep: 1381.01 | bwd_inner_microstep: 1380.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1439 [2024-06-11 06:34:34,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 218.00 | bwd_microstep: 567.42 | bwd_inner_microstep: 567.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1971 [2024-06-11 06:34:35,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.85 | bwd_microstep: 794.00 | bwd_inner_microstep: 793.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3412 [2024-06-11 06:34:37,081] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.87 | bwd_microstep: 1152.53 | bwd_inner_microstep: 1152.50 
| bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1948 [2024-06-11 06:34:38,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.19 | bwd_microstep: 700.57 | bwd_inner_microstep: 700.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3684 [2024-06-11 06:34:40,158] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.88 | bwd_microstep: 1522.11 | bwd_inner_microstep: 1522.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3666 [2024-06-11 06:34:42,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.96 | bwd_microstep: 1512.46 | bwd_inner_microstep: 1512.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3708 [2024-06-11 06:34:44,467] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 595.71 | bwd_microstep: 1617.71 | bwd_inner_microstep: 1617.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3649 [2024-06-11 06:34:46,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.35 | bwd_microstep: 1607.50 | bwd_inner_microstep: 1607.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3646 [2024-06-11 06:34:48,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.95 | bwd_microstep: 1676.62 | bwd_inner_microstep: 1676.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3520 [2024-06-11 06:34:50,931] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.65 | bwd_microstep: 
1417.74 | bwd_inner_microstep: 1417.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2098 [2024-06-11 06:34:52,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 320.91 | bwd_microstep: 854.44 | bwd_inner_microstep: 854.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3488 [2024-06-11 06:34:53,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.65 | bwd_microstep: 1187.00 | bwd_inner_microstep: 1186.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3463 [2024-06-11 06:34:55,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.84 | bwd_microstep: 1311.61 | bwd_inner_microstep: 1311.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-11 06:34:57,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.27 | bwd_microstep: 1413.67 | bwd_inner_microstep: 1413.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3616 [2024-06-11 06:34:59,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.99 | bwd_microstep: 1611.16 | bwd_inner_microstep: 1611.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2151 [2024-06-11 06:35:00,805] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.36 | bwd_microstep: 759.90 | bwd_inner_microstep: 759.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3475 [2024-06-11 06:35:02,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 489.83 | bwd_microstep: 1281.38 | bwd_inner_microstep: 1281.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 4330 [2024-06-11 06:35:05,205] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 697.77 | bwd_microstep: 1904.53 | bwd_inner_microstep: 1904.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 1.38 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3819 [2024-06-11 06:35:07,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.65 | bwd_microstep: 1389.89 | bwd_inner_microstep: 1389.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3600 [2024-06-11 06:35:09,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.06 | bwd_microstep: 1505.97 | bwd_inner_microstep: 1505.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2012 [2024-06-11 06:35:10,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.63 | bwd_microstep: 833.56 | bwd_inner_microstep: 833.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2010 [2024-06-11 06:35:11,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.64 | bwd_microstep: 853.91 | bwd_inner_microstep: 853.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3807 [2024-06-11 06:35:14,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 673.16 | bwd_microstep: 1855.24 | bwd_inner_microstep: 1855.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3449 [2024-06-11 06:35:16,031] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.54 | bwd_microstep: 1417.49 | bwd_inner_microstep: 1417.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3582 [2024-06-11 06:35:18,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 594.75 | bwd_microstep: 1605.49 | bwd_inner_microstep: 1605.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2661 [2024-06-11 06:35:23,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.18 | optimizer_step: 6.63 [2024-06-11 06:35:23,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 415.25 | bwd_microstep: 4472.87 | bwd_inner_microstep: 1259.61 | bwd_allreduce_microstep: 3213.20 | step_microstep: 38.31 [2024-06-11 06:35:23,181] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15689.68 | bwd: 45409.38 | bwd_inner: 42195.27 | bwd_allreduce: 3213.42 | step: 41.17 {'loss': 1.1484, 'learning_rate': 1.2713057888060764e-08, 'epoch': 0.99} dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3540 [2024-06-11 06:35:24,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 463.87 | bwd_microstep: 1199.40 | bwd_inner_microstep: 1199.29 | bwd_allreduce_microstep: 0.07 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3414 [2024-06-11 06:35:26,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.26 | bwd_microstep: 1245.56 | bwd_inner_microstep: 1245.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-11 06:35:28,381] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.18 | bwd_microstep: 1295.45 | bwd_inner_microstep: 1295.42 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 4150 [2024-06-11 06:35:30,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.75 | bwd_microstep: 1504.35 | bwd_inner_microstep: 1504.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.16 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 06:35:32,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.90 | bwd_microstep: 1389.71 | bwd_inner_microstep: 1389.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1923 [2024-06-11 06:35:33,475] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.26 | bwd_microstep: 787.72 | bwd_inner_microstep: 787.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3774 [2024-06-11 06:35:36,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 669.71 | bwd_microstep: 1851.43 | bwd_inner_microstep: 1851.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-11 06:35:37,994] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.37 | bwd_microstep: 1438.37 | bwd_inner_microstep: 1438.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3471 [2024-06-11 06:35:39,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.87 | bwd_microstep: 1312.93 | bwd_inner_microstep: 1312.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-11 06:35:41,721] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.14 | bwd_microstep: 
1382.03 | bwd_inner_microstep: 1382.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1954 [2024-06-11 06:35:42,901] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.22 | bwd_microstep: 851.61 | bwd_inner_microstep: 851.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-11 06:35:44,764] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.08 | bwd_microstep: 1350.48 | bwd_inner_microstep: 1350.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1906 [2024-06-11 06:35:45,969] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.85 | bwd_microstep: 874.39 | bwd_inner_microstep: 874.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3446 [2024-06-11 06:35:47,838] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.95 | bwd_microstep: 1353.96 | bwd_inner_microstep: 1353.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3631 [2024-06-11 06:35:49,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.08 | bwd_microstep: 1538.46 | bwd_inner_microstep: 1538.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1923 [2024-06-11 06:35:50,928] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.05 | bwd_microstep: 696.30 | bwd_inner_microstep: 696.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3630 [2024-06-11 06:35:53,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 559.38 | bwd_microstep: 1513.06 | bwd_inner_microstep: 1513.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1937 [2024-06-11 06:35:54,026] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.45 | bwd_microstep: 728.38 | bwd_inner_microstep: 728.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3826 [2024-06-11 06:35:56,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.71 | bwd_microstep: 1558.97 | bwd_inner_microstep: 1558.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3664 [2024-06-11 06:35:58,257] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.46 | bwd_microstep: 1510.56 | bwd_inner_microstep: 1510.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3529 [2024-06-11 06:36:00,185] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.85 | bwd_microstep: 1393.10 | bwd_inner_microstep: 1393.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2286 [2024-06-11 06:36:01,599] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 379.75 | bwd_microstep: 1023.78 | bwd_inner_microstep: 1023.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3610 [2024-06-11 06:36:03,679] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.59 | bwd_microstep: 1512.76 | bwd_inner_microstep: 1512.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 06:36:05,818] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.77 | bwd_microstep: 1552.85 | bwd_inner_microstep: 1552.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3464 [2024-06-11 06:36:07,756] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1401.69 | bwd_inner_microstep: 1401.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2289 [2024-06-11 06:36:09,101] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.25 | bwd_microstep: 972.00 | bwd_inner_microstep: 971.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2274 [2024-06-11 06:36:10,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.53 | bwd_microstep: 976.93 | bwd_inner_microstep: 976.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3809 [2024-06-11 06:36:12,364] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.53 | bwd_microstep: 1384.83 | bwd_inner_microstep: 1384.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2273 [2024-06-11 06:36:13,619] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.81 | bwd_microstep: 905.68 | bwd_inner_microstep: 905.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722 [2024-06-11 06:36:15,873] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.83 | bwd_microstep: 1638.47 | bwd_inner_microstep: 1638.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3820 
[2024-06-11 06:36:18,416] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 674.47 | bwd_microstep: 1857.93 | bwd_inner_microstep: 1857.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3751 [2024-06-11 06:36:26,303] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.29 | optimizer_step: 6.56 [2024-06-11 06:36:26,304] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.86 | bwd_microstep: 7236.60 | bwd_inner_microstep: 1857.53 | bwd_allreduce_microstep: 5379.02 | step_microstep: 38.86 [2024-06-11 06:36:26,307] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15535.61 | bwd: 47239.76 | bwd_inner: 41859.74 | bwd_allreduce: 5379.31 | step: 40.66 [29:53:29<23:44, 61.92s/it] 99%|█████████▊| 1703/1726 [29:53:29<23:44, 61.92s/it] 99%|█████████▊| 1704/1726 [29:54:58<25:38, 69.91s/it] 99%|█████████▉| 1705/1726 [29:55:58<23:26, 66.98s/it] 99%|█████████▉| 1706/1726 [29:56:58<21:37, 64.90s/it] 99%|█████████▉| 1707/1726 [29:57:59<20:13, 63.87s/it] 99%|█████████▉| 1708 {'loss': 1.1576, 'learning_rate': 1.1410181405639986e-08, 'epoch': 0.99} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 06:36:28,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 550.58 | bwd_microstep: 1466.20 | bwd_inner_microstep: 1466.12 | bwd_allreduce_microstep: 0.03 | step_microstep: 0.09 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3474 [2024-06-11 06:36:30,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 508.06 | bwd_microstep: 1339.46 | bwd_inner_microstep: 1339.43 | bwd_allreduce_microstep: 
0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 06:36:32,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.29 | bwd_microstep: 1550.42 | bwd_inner_microstep: 1550.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3813 [2024-06-11 06:36:34,598] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.62 | bwd_microstep: 1651.15 | bwd_inner_microstep: 1651.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3404 [2024-06-11 06:36:36,449] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.91 | bwd_microstep: 1341.04 | bwd_inner_microstep: 1341.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3491 [2024-06-11 06:36:38,360] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.74 | bwd_microstep: 1385.96 | bwd_inner_microstep: 1385.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2275 [2024-06-11 06:36:39,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.21 | bwd_microstep: 905.18 | bwd_inner_microstep: 905.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 06:36:41,346] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.75 | bwd_microstep: 1249.15 | bwd_inner_microstep: 1249.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2011 [2024-06-11 06:36:42,462] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.07 | bwd_microstep: 803.92 | bwd_inner_microstep: 
803.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-11 06:36:44,240] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.73 | bwd_microstep: 1286.01 | bwd_inner_microstep: 1285.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3408 [2024-06-11 06:36:46,222] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.23 | bwd_microstep: 1438.32 | bwd_inner_microstep: 1438.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3501 [2024-06-11 06:36:48,263] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.37 | bwd_microstep: 1483.59 | bwd_inner_microstep: 1483.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3489 [2024-06-11 06:36:50,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.85 | bwd_microstep: 1487.38 | bwd_inner_microstep: 1487.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3434 [2024-06-11 06:36:52,170] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.73 | bwd_microstep: 1347.25 | bwd_inner_microstep: 1347.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3655 [2024-06-11 06:36:54,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.83 | bwd_microstep: 1451.70 | bwd_inner_microstep: 1451.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3417 [2024-06-11 06:36:56,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.44 | 
bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3630 [2024-06-11 06:36:58,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.81 | bwd_microstep: 1473.70 | bwd_inner_microstep: 1473.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3521 [2024-06-11 06:36:59,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.30 | bwd_microstep: 1322.88 | bwd_inner_microstep: 1322.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3457 [2024-06-11 06:37:01,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.18 | bwd_microstep: 1183.49 | bwd_inner_microstep: 1183.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3527 [2024-06-11 06:37:03,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.82 | bwd_microstep: 1293.16 | bwd_inner_microstep: 1293.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3704 [2024-06-11 06:37:05,361] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.82 | bwd_microstep: 1430.82 | bwd_inner_microstep: 1430.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 06:37:07,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.87 | bwd_microstep: 1394.49 | bwd_inner_microstep: 1394.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3832 [2024-06-11 06:37:09,562] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 609.46 | bwd_microstep: 1658.81 | bwd_inner_microstep: 1658.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3442 [2024-06-11 06:37:11,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.62 | bwd_microstep: 1286.72 | bwd_inner_microstep: 1286.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3557 [2024-06-11 06:37:13,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.52 | bwd_microstep: 1424.12 | bwd_inner_microstep: 1424.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3809 [2024-06-11 06:37:15,451] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.14 | bwd_microstep: 1551.61 | bwd_inner_microstep: 1551.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3491 [2024-06-11 06:37:17,237] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.74 | bwd_microstep: 1291.23 | bwd_inner_microstep: 1291.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3812 [2024-06-11 06:37:19,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.35 | bwd_microstep: 1455.78 | bwd_inner_microstep: 1455.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3577 [2024-06-11 06:37:21,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.20 | bwd_microstep: 1550.70 | bwd_inner_microstep: 1550.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1889 [2024-06-11 06:37:22,503] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.97 | bwd_microstep: 806.79 | bwd_inner_microstep: 806.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3820 [2024-06-11 06:37:24,678] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.95 | bwd_microstep: 1580.28 | bwd_inner_microstep: 1580.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3773 [2024-06-11 06:37:27,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.75 | optimizer_gradients: 4.08 | optimizer_step: 6.62 [2024-06-11 06:37:27,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 547.39 | bwd_microstep: 2391.36 | bwd_inner_microstep: 1771.35 | bwd_allreduce_microstep: 619.97 | step_microstep: 37.76 [2024-06-11 06:37:27,669] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16358.21 | bwd: 44659.98 | bwd_inner: 44039.05 | bwd_allreduce: 620.23 | step: 39.30 {'loss': 1.1595, 'learning_rate': 1.017770463264789e-08, 'epoch': 0.99} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3413 [2024-06-11 06:37:29,397] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.66 | bwd_microstep: 1241.98 | bwd_inner_microstep: 1241.83 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 1939 [2024-06-11 06:37:30,577] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 318.17 | bwd_microstep: 852.29 | bwd_inner_microstep: 852.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 06:37:32,352] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.38 | bwd_microstep: 1278.64 | bwd_inner_microstep: 1278.61 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3478 [2024-06-11 06:37:34,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.72 | bwd_microstep: 1278.29 | bwd_inner_microstep: 1278.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1933 [2024-06-11 06:37:35,224] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 297.80 | bwd_microstep: 790.48 | bwd_inner_microstep: 790.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3487 [2024-06-11 06:37:37,001] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.00 | bwd_microstep: 1280.61 | bwd_inner_microstep: 1280.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2069 [2024-06-11 06:37:38,136] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 308.24 | bwd_microstep: 818.68 | bwd_inner_microstep: 818.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 06:37:39,912] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.73 | bwd_microstep: 1283.51 | bwd_inner_microstep: 1283.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3475 [2024-06-11 06:37:41,955] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.02 | bwd_microstep: 1479.06 | bwd_inner_microstep: 1479.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 23, images per sample: 5.75, dynamic token length: 2675 [2024-06-11 06:37:43,453] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 405.95 | bwd_microstep: 
1084.27 | bwd_inner_microstep: 1084.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3683 [2024-06-11 06:37:45,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 630.77 | bwd_microstep: 1719.30 | bwd_inner_microstep: 1719.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3481 [2024-06-11 06:37:47,899] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 563.60 | bwd_microstep: 1510.04 | bwd_inner_microstep: 1510.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 33, images per sample: 8.25, dynamic token length: 3636 [2024-06-11 06:37:49,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.48 | bwd_microstep: 1489.06 | bwd_inner_microstep: 1489.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3654 [2024-06-11 06:37:51,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.26 | bwd_microstep: 1382.05 | bwd_inner_microstep: 1382.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3390 [2024-06-11 06:37:53,708] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.70 | bwd_microstep: 1337.22 | bwd_inner_microstep: 1337.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3519 [2024-06-11 06:37:55,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.25 | bwd_microstep: 1388.95 | bwd_inner_microstep: 1388.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.15 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3460 [2024-06-11 06:37:57,450] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 500.92 | bwd_microstep: 1314.37 | bwd_inner_microstep: 1314.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3590 [2024-06-11 06:37:59,525] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.87 | bwd_microstep: 1506.82 | bwd_inner_microstep: 1506.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3513 [2024-06-11 06:38:01,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.87 | bwd_microstep: 1323.16 | bwd_inner_microstep: 1323.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3813 [2024-06-11 06:38:03,498] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.09 | bwd_microstep: 1556.05 | bwd_inner_microstep: 1556.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452 [2024-06-11 06:38:05,280] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.39 | bwd_microstep: 1287.73 | bwd_inner_microstep: 1287.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 06:38:07,063] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.61 | bwd_microstep: 1284.14 | bwd_inner_microstep: 1284.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3575 [2024-06-11 06:38:08,872] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.36 | bwd_microstep: 1302.50 | bwd_inner_microstep: 1302.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3623 [2024-06-11 06:38:10,733] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.35 | bwd_microstep: 1346.40 | bwd_inner_microstep: 1346.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3475 [2024-06-11 06:38:12,389] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.60 | bwd_microstep: 1189.24 | bwd_inner_microstep: 1189.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1935 [2024-06-11 06:38:13,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 278.18 | bwd_microstep: 728.18 | bwd_inner_microstep: 728.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3473 [2024-06-11 06:38:15,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.22 | bwd_microstep: 1473.29 | bwd_inner_microstep: 1473.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3581 [2024-06-11 06:38:17,454] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 549.05 | bwd_microstep: 1455.14 | bwd_inner_microstep: 1455.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2614 [2024-06-11 06:38:18,986] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 413.35 | bwd_microstep: 1110.17 | bwd_inner_microstep: 1110.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3762 [2024-06-11 06:38:21,147] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.47 | bwd_microstep: 1572.29 | bwd_inner_microstep: 1572.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3379 [2024-06-11 06:38:23,120] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 531.06 | bwd_microstep: 1432.17 | bwd_inner_microstep: 1432.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3802 [2024-06-11 06:38:31,173] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.29 | optimizer_step: 6.63 [2024-06-11 06:38:31,174] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.56 | bwd_microstep: 7461.81 | bwd_inner_microstep: 1640.33 | bwd_allreduce_microstep: 5821.41 | step_microstep: 39.93 [2024-06-11 06:38:31,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15607.31 | bwd: 47557.93 | bwd_inner: 41735.48 | bwd_allreduce: 5821.70 | step: 41.64 {'loss': 1.1589, 'learning_rate': 9.015631909863321e-09, 'epoch': 0.99} dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411 [2024-06-11 06:38:32,941] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 480.16 | bwd_microstep: 1267.39 | bwd_inner_microstep: 1267.21 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.08 dynamic ViT batch size: 17, images per sample: 4.25, dynamic token length: 2713 [2024-06-11 06:38:34,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 380.75 | bwd_microstep: 999.61 | bwd_inner_microstep: 999.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3900 [2024-06-11 06:38:36,648] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 619.70 | bwd_microstep: 1688.79 | bwd_inner_microstep: 1688.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3430 [2024-06-11 06:38:38,378] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.04 | bwd_microstep: 1248.80 | 
bwd_inner_microstep: 1248.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2295 [2024-06-11 06:38:39,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.99 | bwd_microstep: 973.07 | bwd_inner_microstep: 973.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3502 [2024-06-11 06:38:41,644] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.83 | bwd_microstep: 1392.66 | bwd_inner_microstep: 1392.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1955 [2024-06-11 06:38:42,626] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 272.70 | bwd_microstep: 701.36 | bwd_inner_microstep: 701.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3426 [2024-06-11 06:38:44,226] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 441.16 | bwd_microstep: 1152.05 | bwd_inner_microstep: 1152.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3629 [2024-06-11 06:38:45,918] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.20 | bwd_microstep: 1218.16 | bwd_inner_microstep: 1218.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1968 [2024-06-11 06:38:47,021] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.16 | bwd_microstep: 795.75 | bwd_inner_microstep: 795.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 06:38:48,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 
472.05 | bwd_microstep: 1251.45 | bwd_inner_microstep: 1251.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 6, images per sample: 1.5, dynamic token length: 932 [2024-06-11 06:38:49,281] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 145.24 | bwd_microstep: 376.64 | bwd_inner_microstep: 376.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 06:38:51,143] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.50 | bwd_microstep: 1349.60 | bwd_inner_microstep: 1349.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 2978 [2024-06-11 06:38:52,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 479.63 | bwd_microstep: 1295.68 | bwd_inner_microstep: 1295.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3737 [2024-06-11 06:38:55,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.49 | bwd_microstep: 1627.29 | bwd_inner_microstep: 1627.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3519 [2024-06-11 06:38:57,123] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.01 | bwd_microstep: 1420.98 | bwd_inner_microstep: 1420.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1975 [2024-06-11 06:38:58,146] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.21 | bwd_microstep: 733.62 | bwd_inner_microstep: 733.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3385 [2024-06-11 06:39:00,125] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 531.95 | bwd_microstep: 1437.96 | bwd_inner_microstep: 1437.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2091 [2024-06-11 06:39:01,398] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 341.70 | bwd_microstep: 923.01 | bwd_inner_microstep: 922.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3834 [2024-06-11 06:39:03,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 562.50 | bwd_microstep: 1510.65 | bwd_inner_microstep: 1510.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 06:39:05,390] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.23 | bwd_microstep: 1379.34 | bwd_inner_microstep: 1379.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3721 [2024-06-11 06:39:07,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.47 | bwd_microstep: 1441.42 | bwd_inner_microstep: 1441.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3490 [2024-06-11 06:39:09,032] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 454.00 | bwd_microstep: 1190.36 | bwd_inner_microstep: 1190.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722 [2024-06-11 06:39:11,282] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.48 | bwd_microstep: 1639.25 | bwd_inner_microstep: 1639.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3669 [2024-06-11 
06:39:13,121] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.63 | bwd_microstep: 1327.47 | bwd_inner_microstep: 1327.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3566 [2024-06-11 06:39:15,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.43 | bwd_microstep: 1504.31 | bwd_inner_microstep: 1504.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3726 [2024-06-11 06:39:17,316] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.09 | bwd_microstep: 1539.27 | bwd_inner_microstep: 1539.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3567 [2024-06-11 06:39:19,261] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.83 | bwd_microstep: 1406.02 | bwd_inner_microstep: 1405.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3819 [2024-06-11 06:39:21,357] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.09 | bwd_microstep: 1519.99 | bwd_inner_microstep: 1519.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3479 [2024-06-11 06:39:23,267] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.43 | bwd_microstep: 1380.37 | bwd_inner_microstep: 1380.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3473 [2024-06-11 06:39:25,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.51 | bwd_microstep: 1429.78 | bwd_inner_microstep: 1429.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 2194 [2024-06-11 06:39:32,074] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-11 06:39:32,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 388.33 | bwd_microstep: 6396.14 | bwd_inner_microstep: 1196.71 | bwd_allreduce_microstep: 5199.37 | step_microstep: 39.48 [2024-06-11 06:39:32,078] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15043.12 | bwd: 45518.23 | bwd_inner: 40317.84 | bwd_allreduce: 5199.66 | step: 41.10 {'loss': 1.1577, 'learning_rate': 7.923967330099036e-09, 'epoch': 0.99} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 [2024-06-11 06:39:34,119] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.86 | bwd_microstep: 1472.61 | bwd_inner_microstep: 1472.51 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.11 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4009 [2024-06-11 06:39:36,248] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.95 | bwd_microstep: 1542.92 | bwd_inner_microstep: 1542.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3898 [2024-06-11 06:39:38,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.15 | bwd_microstep: 1585.66 | bwd_inner_microstep: 1585.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 4272 [2024-06-11 06:39:40,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.93 | bwd_microstep: 1667.17 | bwd_inner_microstep: 1667.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3772 [2024-06-11 06:39:42,985] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 601.15 | 
bwd_microstep: 1637.88 | bwd_inner_microstep: 1637.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3405 [2024-06-11 06:39:44,707] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.60 | bwd_microstep: 1241.77 | bwd_inner_microstep: 1241.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3607 [2024-06-11 06:39:46,657] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.82 | bwd_microstep: 1413.40 | bwd_inner_microstep: 1413.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1916 [2024-06-11 06:39:47,614] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 262.77 | bwd_microstep: 686.31 | bwd_inner_microstep: 686.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3469 [2024-06-11 06:39:49,387] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.69 | bwd_microstep: 1277.07 | bwd_inner_microstep: 1277.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3487 [2024-06-11 06:39:51,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1383.34 | bwd_inner_microstep: 1383.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3927 [2024-06-11 06:39:53,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.65 | bwd_microstep: 1686.31 | bwd_inner_microstep: 1686.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3421 [2024-06-11 06:39:55,477] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 501.94 | bwd_microstep: 1341.99 | bwd_inner_microstep: 1341.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3515 [2024-06-11 06:39:57,563] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.80 | bwd_microstep: 1517.34 | bwd_inner_microstep: 1517.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3632 [2024-06-11 06:39:59,632] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 556.56 | bwd_microstep: 1502.62 | bwd_inner_microstep: 1502.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3443 [2024-06-11 06:40:01,630] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.98 | bwd_microstep: 1448.48 | bwd_inner_microstep: 1448.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3480 [2024-06-11 06:40:03,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 572.36 | bwd_microstep: 1537.77 | bwd_inner_microstep: 1537.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3817 [2024-06-11 06:40:05,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.59 | bwd_microstep: 1550.38 | bwd_inner_microstep: 1550.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2089 [2024-06-11 06:40:07,153] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.31 | bwd_microstep: 914.51 | bwd_inner_microstep: 914.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-11 06:40:09,415] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.45 | bwd_microstep: 1646.71 | bwd_inner_microstep: 1646.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3827 [2024-06-11 06:40:11,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.14 | bwd_microstep: 1648.61 | bwd_inner_microstep: 1648.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 06:40:13,418] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.58 | bwd_microstep: 1254.65 | bwd_inner_microstep: 1254.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3830 [2024-06-11 06:40:15,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.86 | bwd_microstep: 1388.98 | bwd_inner_microstep: 1388.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2004 [2024-06-11 06:40:16,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.17 | bwd_microstep: 708.42 | bwd_inner_microstep: 708.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3485 [2024-06-11 06:40:18,238] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.88 | bwd_microstep: 1379.52 | bwd_inner_microstep: 1379.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 2658 [2024-06-11 06:40:19,524] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 354.11 | bwd_microstep: 924.49 | bwd_inner_microstep: 924.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token 
length: 3809 [2024-06-11 06:40:21,693] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 584.06 | bwd_microstep: 1574.92 | bwd_inner_microstep: 1574.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3594 [2024-06-11 06:40:23,635] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.00 | bwd_microstep: 1405.81 | bwd_inner_microstep: 1405.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3779 [2024-06-11 06:40:25,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.73 | bwd_microstep: 1549.34 | bwd_inner_microstep: 1549.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3553 [2024-06-11 06:40:27,616] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.29 | bwd_microstep: 1331.11 | bwd_inner_microstep: 1331.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3827 [2024-06-11 06:40:29,762] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.77 | bwd_microstep: 1558.55 | bwd_inner_microstep: 1558.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3796 [2024-06-11 06:40:31,900] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.79 | bwd_microstep: 1552.76 | bwd_inner_microstep: 1552.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3489 [2024-06-11 06:40:33,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.71 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-11 06:40:33,858] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 469.27 | bwd_microstep: 1444.30 | bwd_inner_microstep: 1345.52 | bwd_allreduce_microstep: 98.73 | step_microstep: 37.48 [2024-06-11 06:40:33,861] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16657.48 | bwd: 44775.71 | bwd_inner: 44675.99 | bwd_allreduce: 99.01 | step: 39.11 {'loss': 1.1325, 'learning_rate': 6.902714738192817e-09, 'epoch': 0.99} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3402 [2024-06-11 06:40:35,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.84 | bwd_microstep: 1338.53 | bwd_inner_microstep: 1338.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.10 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3954 [2024-06-11 06:40:37,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.40 | bwd_microstep: 1401.63 | bwd_inner_microstep: 1401.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3417 [2024-06-11 06:40:39,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.16 | bwd_microstep: 1247.85 | bwd_inner_microstep: 1247.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2301 [2024-06-11 06:40:40,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 349.22 | bwd_microstep: 929.10 | bwd_inner_microstep: 929.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3775 [2024-06-11 06:40:42,702] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.48 | bwd_microstep: 1472.96 | bwd_inner_microstep: 1472.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3734 [2024-06-11 06:40:44,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
fwd_microstep: 600.80 | bwd_microstep: 1634.10 | bwd_inner_microstep: 1634.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3407 [2024-06-11 06:40:46,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.83 | bwd_microstep: 1311.34 | bwd_inner_microstep: 1311.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-11 06:40:48,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.54 | bwd_microstep: 1288.24 | bwd_inner_microstep: 1288.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 06:40:50,328] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.36 | bwd_microstep: 1284.22 | bwd_inner_microstep: 1284.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1985 [2024-06-11 06:40:51,358] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.97 | bwd_microstep: 740.27 | bwd_inner_microstep: 740.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3763 [2024-06-11 06:40:53,345] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.07 | bwd_microstep: 1440.00 | bwd_inner_microstep: 1439.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504 [2024-06-11 06:40:55,177] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.65 | bwd_microstep: 1324.49 | bwd_inner_microstep: 1324.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3485 [2024-06-11 06:40:57,002] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.82 | bwd_microstep: 1316.31 | bwd_inner_microstep: 1316.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2080 [2024-06-11 06:40:58,270] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 340.88 | bwd_microstep: 919.07 | bwd_inner_microstep: 919.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3428 [2024-06-11 06:41:00,067] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.63 | bwd_microstep: 1299.84 | bwd_inner_microstep: 1299.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 [2024-06-11 06:41:02,069] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.27 | bwd_microstep: 1453.54 | bwd_inner_microstep: 1453.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-11 06:41:03,978] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.95 | bwd_microstep: 1384.24 | bwd_inner_microstep: 1384.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3650 [2024-06-11 06:41:05,937] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.18 | bwd_microstep: 1419.28 | bwd_inner_microstep: 1419.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3635 [2024-06-11 06:41:08,104] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.45 | bwd_microstep: 1576.93 | bwd_inner_microstep: 1576.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 
3513 [2024-06-11 06:41:10,244] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.68 | bwd_microstep: 1558.25 | bwd_inner_microstep: 1558.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3828 [2024-06-11 06:41:12,527] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.64 | bwd_microstep: 1662.18 | bwd_inner_microstep: 1662.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 06:41:14,269] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.57 | bwd_microstep: 1257.91 | bwd_inner_microstep: 1257.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3437 [2024-06-11 06:41:16,004] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.39 | bwd_microstep: 1254.03 | bwd_inner_microstep: 1254.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3565 [2024-06-11 06:41:17,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 530.73 | bwd_microstep: 1404.80 | bwd_inner_microstep: 1404.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2033 [2024-06-11 06:41:19,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.64 | bwd_microstep: 840.05 | bwd_inner_microstep: 840.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3432 [2024-06-11 06:41:21,197] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.19 | bwd_microstep: 1515.16 | bwd_inner_microstep: 1515.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images 
per sample: 8.5, dynamic token length: 3815 [2024-06-11 06:41:23,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1558.62 | bwd_inner_microstep: 1558.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3496 [2024-06-11 06:41:25,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.70 | bwd_microstep: 1290.48 | bwd_inner_microstep: 1290.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3455 [2024-06-11 06:41:27,064] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.89 | bwd_microstep: 1404.17 | bwd_inner_microstep: 1404.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2074 [2024-06-11 06:41:28,458] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 371.52 | bwd_microstep: 1011.98 | bwd_inner_microstep: 1011.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2047 [2024-06-11 06:41:29,839] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.90 | bwd_microstep: 1006.09 | bwd_inner_microstep: 1006.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2283 [2024-06-11 06:41:35,379] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.69 | optimizer_gradients: 4.11 | optimizer_step: 6.59 [2024-06-11 06:41:35,380] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 364.32 | bwd_microstep: 5130.93 | bwd_inner_microstep: 1104.33 | bwd_allreduce_microstep: 4026.54 | step_microstep: 38.34 [2024-06-11 06:41:35,383] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15499.50 | bwd: 45676.61 | bwd_inner: 
41649.13 | bwd_allreduce: 4026.77 | step: 39.96 99%|█████████▉| 1708/1726 [29:59:03<19:05, 63.64s/it] 99%|█████████▉| 1709/1726 [30:00:04<17:50, 62.96s/it] 99%|█████████▉| 1710/1726 [30:01:07<16:49, 63.12s/it] 99%|█████████▉| 1711/1726 [30:02:08<15:36, 62.46s/it] 99%|█████████▉| 1712/1726 [30:03:10<14:31, 62.25s/it] {'loss': 1.1937, 'learning_rate': 5.951877730991928e-09, 'epoch': 0.99} dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3492 [2024-06-11 06:41:37,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.07 | bwd_microstep: 1575.53 | bwd_inner_microstep: 1575.31 | bwd_allreduce_microstep: 0.09 | step_microstep: 0.16 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3542 [2024-06-11 06:41:39,344] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 493.16 | bwd_microstep: 1290.18 | bwd_inner_microstep: 1290.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 06:41:41,385] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.87 | bwd_microstep: 1478.00 | bwd_inner_microstep: 1477.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3578 [2024-06-11 06:41:43,195] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.67 | bwd_microstep: 1303.80 | bwd_inner_microstep: 1303.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3483 [2024-06-11 06:41:45,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.29
| bwd_microstep: 1384.91 | bwd_inner_microstep: 1384.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 4141 [2024-06-11 06:41:47,277] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.06 | bwd_microstep: 1569.36 | bwd_inner_microstep: 1569.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3488 [2024-06-11 06:41:49,060] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.86 | bwd_microstep: 1284.60 | bwd_inner_microstep: 1284.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3984 [2024-06-11 06:41:51,139] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 564.59 | bwd_microstep: 1505.28 | bwd_inner_microstep: 1505.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3600 [2024-06-11 06:41:52,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.76 | bwd_microstep: 1314.44 | bwd_inner_microstep: 1314.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3482 [2024-06-11 06:41:55,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.42 | bwd_microstep: 1530.50 | bwd_inner_microstep: 1530.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 06:41:56,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.56 | bwd_microstep: 1248.64 | bwd_inner_microstep: 1248.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3451 [2024-06-11 06:41:58,672] [INFO] [logging.py:96:log_dist] 
[Rank 0] time (ms) | fwd_microstep: 506.67 | bwd_microstep: 1357.91 | bwd_inner_microstep: 1357.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3544 [2024-06-11 06:42:00,735] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.29 | bwd_microstep: 1493.13 | bwd_inner_microstep: 1493.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2077 [2024-06-11 06:42:01,877] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.51 | bwd_microstep: 823.43 | bwd_inner_microstep: 823.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3454 [2024-06-11 06:42:03,739] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.46 | bwd_microstep: 1349.15 | bwd_inner_microstep: 1349.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3679 [2024-06-11 06:42:05,843] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.83 | bwd_microstep: 1527.97 | bwd_inner_microstep: 1527.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2102 [2024-06-11 06:42:07,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 342.11 | bwd_microstep: 921.12 | bwd_inner_microstep: 921.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3446 [2024-06-11 06:42:08,854] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.95 | bwd_microstep: 1255.98 | bwd_inner_microstep: 1255.96 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3448 [2024-06-11 
06:42:10,716] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 504.81 | bwd_microstep: 1348.45 | bwd_inner_microstep: 1348.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2091 [2024-06-11 06:42:12,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 372.80 | bwd_microstep: 1014.70 | bwd_inner_microstep: 1014.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3818 [2024-06-11 06:42:14,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 593.38 | bwd_microstep: 1602.34 | bwd_inner_microstep: 1602.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 06:42:16,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.41 | bwd_microstep: 1256.31 | bwd_inner_microstep: 1256.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3774 [2024-06-11 06:42:18,184] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.65 | bwd_microstep: 1545.40 | bwd_inner_microstep: 1545.37 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3814 [2024-06-11 06:42:20,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.31 | bwd_microstep: 1545.18 | bwd_inner_microstep: 1545.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3827 [2024-06-11 06:42:22,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 599.85 | bwd_microstep: 1622.45 | bwd_inner_microstep: 1622.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, 
dynamic token length: 3580 [2024-06-11 06:42:24,438] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.81 | bwd_microstep: 1364.17 | bwd_inner_microstep: 1364.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3551 [2024-06-11 06:42:26,404] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1421.54 | bwd_inner_microstep: 1421.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2242 [2024-06-11 06:42:27,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.70 | bwd_microstep: 969.72 | bwd_inner_microstep: 969.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2008 [2024-06-11 06:42:28,752] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.52 | bwd_microstep: 721.14 | bwd_inner_microstep: 721.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3722 [2024-06-11 06:42:31,005] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 602.45 | bwd_microstep: 1640.48 | bwd_inner_microstep: 1640.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3771 [2024-06-11 06:42:33,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.16 | bwd_microstep: 1637.27 | bwd_inner_microstep: 1637.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 2706 [2024-06-11 06:42:39,401] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.16 | optimizer_step: 6.62 [2024-06-11 06:42:39,402] [INFO] [logging.py:96:log_dist] [Rank 0] 
time (ms) | fwd_microstep: 393.97 | bwd_microstep: 5703.13 | bwd_inner_microstep: 1185.79 | bwd_allreduce_microstep: 4517.28 | step_microstep: 38.99 [2024-06-11 06:42:39,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16071.09 | bwd: 47606.22 | bwd_inner: 43087.85 | bwd_allreduce: 4517.60 | step: 40.56 {'loss': 1.1631, 'learning_rate': 5.071459657339794e-09, 'epoch': 0.99} dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3457 [2024-06-11 06:42:41,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.26 | bwd_microstep: 1464.87 | bwd_inner_microstep: 1464.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3476 [2024-06-11 06:42:43,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.27 | bwd_microstep: 1281.23 | bwd_inner_microstep: 1281.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3390 [2024-06-11 06:42:44,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.72 | bwd_microstep: 1238.11 | bwd_inner_microstep: 1238.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2396 [2024-06-11 06:42:46,314] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 377.04 | bwd_microstep: 999.49 | bwd_inner_microstep: 999.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 06:42:48,039] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.50 | bwd_microstep: 1244.95 | bwd_inner_microstep: 1244.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 06:42:49,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) 
| fwd_microstep: 521.33 | bwd_microstep: 1382.42 | bwd_inner_microstep: 1382.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3733 [2024-06-11 06:42:51,927] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.76 | bwd_microstep: 1428.75 | bwd_inner_microstep: 1428.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 06:42:53,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.91 | bwd_microstep: 1277.31 | bwd_inner_microstep: 1277.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1881 [2024-06-11 06:42:54,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 281.39 | bwd_microstep: 742.33 | bwd_inner_microstep: 742.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 1900 [2024-06-11 06:42:55,932] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 321.34 | bwd_microstep: 870.68 | bwd_inner_microstep: 870.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3606 [2024-06-11 06:42:57,871] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.55 | bwd_microstep: 1404.41 | bwd_inner_microstep: 1404.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2558 [2024-06-11 06:42:59,085] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.61 | bwd_microstep: 872.63 | bwd_inner_microstep: 872.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3455 [2024-06-11 06:43:00,865] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.32 | bwd_microstep: 1287.26 | bwd_inner_microstep: 1287.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3504 [2024-06-11 06:43:02,684] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 494.64 | bwd_microstep: 1315.09 | bwd_inner_microstep: 1315.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3655 [2024-06-11 06:43:04,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.71 | bwd_microstep: 1651.66 | bwd_inner_microstep: 1651.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3824 [2024-06-11 06:43:07,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.19 | bwd_microstep: 1555.20 | bwd_inner_microstep: 1555.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3505 [2024-06-11 06:43:08,749] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 455.48 | bwd_microstep: 1191.50 | bwd_inner_microstep: 1191.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3518 [2024-06-11 06:43:10,672] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.04 | bwd_microstep: 1394.53 | bwd_inner_microstep: 1394.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 06:43:12,411] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.11 | bwd_microstep: 1256.39 | bwd_inner_microstep: 1256.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 
3831 [2024-06-11 06:43:14,557] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.49 | bwd_microstep: 1559.23 | bwd_inner_microstep: 1559.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1978 [2024-06-11 06:43:15,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.46 | bwd_microstep: 797.35 | bwd_inner_microstep: 797.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3893 [2024-06-11 06:43:17,987] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 623.00 | bwd_microstep: 1691.66 | bwd_inner_microstep: 1691.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3607 [2024-06-11 06:43:20,070] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.79 | bwd_microstep: 1511.18 | bwd_inner_microstep: 1511.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3570 [2024-06-11 06:43:21,880] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.75 | bwd_microstep: 1302.28 | bwd_inner_microstep: 1302.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3824 [2024-06-11 06:43:24,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.69 | bwd_microstep: 1580.67 | bwd_inner_microstep: 1580.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3601 [2024-06-11 06:43:25,949] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.31 | bwd_microstep: 1369.27 | bwd_inner_microstep: 1369.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images 
per sample: 10.0, dynamic token length: 3823 [2024-06-11 06:43:28,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 609.69 | bwd_microstep: 1649.33 | bwd_inner_microstep: 1649.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3620 [2024-06-11 06:43:30,424] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1606.19 | bwd_inner_microstep: 1606.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2260 [2024-06-11 06:43:31,766] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 362.81 | bwd_microstep: 970.55 | bwd_inner_microstep: 970.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3820 [2024-06-11 06:43:34,172] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 639.96 | bwd_microstep: 1755.96 | bwd_inner_microstep: 1755.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 2220 [2024-06-11 06:43:35,567] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 373.91 | bwd_microstep: 1011.74 | bwd_inner_microstep: 1011.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2212 [2024-06-11 06:43:40,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.66 | optimizer_gradients: 4.28 | optimizer_step: 6.61 [2024-06-11 06:43:40,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.75 | bwd_microstep: 4153.89 | bwd_inner_microstep: 982.58 | bwd_allreduce_microstep: 3171.23 | step_microstep: 40.21 [2024-06-11 06:43:40,098] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15532.95 | bwd: 44818.14 | bwd_inner: 
41645.97 | bwd_allreduce: 3171.47 | step: 41.72 {'loss': 1.2006, 'learning_rate': 4.261463618062678e-09, 'epoch': 0.99} dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3576 [2024-06-11 06:43:41,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.29 | bwd_microstep: 1351.32 | bwd_inner_microstep: 1351.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3915 [2024-06-11 06:43:44,045] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.63 | bwd_microstep: 1494.53 | bwd_inner_microstep: 1494.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3470 [2024-06-11 06:43:46,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.04 | bwd_microstep: 1480.06 | bwd_inner_microstep: 1480.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 4107 [2024-06-11 06:43:48,302] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.01 | bwd_microstep: 1599.83 | bwd_inner_microstep: 1599.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3782 [2024-06-11 06:43:50,434] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 575.62 | bwd_microstep: 1545.79 | bwd_inner_microstep: 1545.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3517 [2024-06-11 06:43:52,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.65 | bwd_microstep: 1392.03 | bwd_inner_microstep: 1392.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-11 06:43:53,464] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.23 | bwd_microstep: 800.61 | bwd_inner_microstep: 800.58 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1960 [2024-06-11 06:43:54,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 301.04 | bwd_microstep: 796.35 | bwd_inner_microstep: 796.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3504 [2024-06-11 06:43:56,354] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 485.76 | bwd_microstep: 1287.81 | bwd_inner_microstep: 1287.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3599 [2024-06-11 06:43:58,463] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.55 | bwd_microstep: 1532.81 | bwd_inner_microstep: 1532.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3674 [2024-06-11 06:44:00,507] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.72 | bwd_microstep: 1481.38 | bwd_inner_microstep: 1481.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 1910 [2024-06-11 06:44:01,506] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 273.32 | bwd_microstep: 717.88 | bwd_inner_microstep: 717.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3513 [2024-06-11 06:44:03,414] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.44 | bwd_microstep: 1382.54 | bwd_inner_microstep: 1382.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3649 
[2024-06-11 06:44:05,572] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 580.19 | bwd_microstep: 1566.12 | bwd_inner_microstep: 1566.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3384 [2024-06-11 06:44:07,422] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.49 | bwd_microstep: 1341.33 | bwd_inner_microstep: 1341.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3944 [2024-06-11 06:44:09,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 610.26 | bwd_microstep: 1656.41 | bwd_inner_microstep: 1656.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3825 [2024-06-11 06:44:11,841] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.62 | bwd_microstep: 1554.58 | bwd_inner_microstep: 1554.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3523 [2024-06-11 06:44:13,774] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.07 | bwd_microstep: 1395.66 | bwd_inner_microstep: 1395.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3469 [2024-06-11 06:44:15,681] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1377.27 | bwd_inner_microstep: 1377.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3682 [2024-06-11 06:44:17,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 469.56 | bwd_microstep: 1234.49 | bwd_inner_microstep: 1234.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per 
sample: 4.0, dynamic token length: 2276 [2024-06-11 06:44:18,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.85 | bwd_microstep: 879.41 | bwd_inner_microstep: 879.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3675 [2024-06-11 06:44:20,583] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.75 | bwd_microstep: 1427.36 | bwd_inner_microstep: 1427.34 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2012 [2024-06-11 06:44:21,576] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 274.28 | bwd_microstep: 711.32 | bwd_inner_microstep: 711.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3604 [2024-06-11 06:44:23,655] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.03 | bwd_microstep: 1511.22 | bwd_inner_microstep: 1511.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 2062 [2024-06-11 06:44:24,668] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.25 | bwd_microstep: 724.82 | bwd_inner_microstep: 724.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2046 [2024-06-11 06:44:25,793] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 305.12 | bwd_microstep: 812.62 | bwd_inner_microstep: 812.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3421 [2024-06-11 06:44:27,439] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 452.42 | bwd_microstep: 1184.59 | bwd_inner_microstep: 1184.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic 
ViT batch size: 14, images per sample: 3.5, dynamic token length: 2273 [2024-06-11 06:44:28,615] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.45 | bwd_microstep: 845.13 | bwd_inner_microstep: 845.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3436 [2024-06-11 06:44:30,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.08 | bwd_microstep: 1354.92 | bwd_inner_microstep: 1354.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3559 [2024-06-11 06:44:32,417] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.20 | bwd_microstep: 1396.11 | bwd_inner_microstep: 1396.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3583 [2024-06-11 06:44:34,753] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 625.96 | bwd_microstep: 1699.27 | bwd_inner_microstep: 1699.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3465 [2024-06-11 06:44:39,950] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.22 | optimizer_step: 6.58 [2024-06-11 06:44:39,951] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.03 | bwd_microstep: 4660.72 | bwd_inner_microstep: 1435.58 | bwd_allreduce_microstep: 3225.07 | step_microstep: 39.16 [2024-06-11 06:44:39,954] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15316.14 | bwd: 44196.30 | bwd_inner: 40970.30 | bwd_allreduce: 3225.31 | step: 40.70 {'loss': 1.168, 'learning_rate': 3.5218924659607966e-09, 'epoch': 0.99} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3464 [2024-06-11 06:44:41,850] [INFO] [logging.py:96:log_dist] [Rank 0] time 
(ms) | fwd_microstep: 516.63 | bwd_microstep: 1363.31 | bwd_inner_microstep: 1363.13 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3484 [2024-06-11 06:44:43,892] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.51 | bwd_microstep: 1478.52 | bwd_inner_microstep: 1478.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3899 [2024-06-11 06:44:46,075] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.12 | bwd_microstep: 1585.41 | bwd_inner_microstep: 1585.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3417 [2024-06-11 06:44:47,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.55 | bwd_microstep: 1294.27 | bwd_inner_microstep: 1294.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3750 [2024-06-11 06:44:49,726] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.84 | bwd_microstep: 1343.46 | bwd_inner_microstep: 1343.43 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3488 [2024-06-11 06:44:51,775] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.25 | bwd_microstep: 1485.29 | bwd_inner_microstep: 1485.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-11 06:44:53,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.68 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3720 [2024-06-11 06:44:55,571] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.48 | bwd_microstep: 1494.91 | bwd_inner_microstep: 1494.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-11 06:44:56,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.88 | bwd_microstep: 698.98 | bwd_inner_microstep: 698.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 06:44:58,472] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.47 | bwd_microstep: 1391.38 | bwd_inner_microstep: 1391.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3449 [2024-06-11 06:45:00,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.70 | bwd_microstep: 1359.82 | bwd_inner_microstep: 1359.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3680 [2024-06-11 06:45:02,250] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.51 | bwd_microstep: 1374.25 | bwd_inner_microstep: 1374.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3464 [2024-06-11 06:45:04,219] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.77 | bwd_microstep: 1420.52 | bwd_inner_microstep: 1420.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3431 [2024-06-11 06:45:06,207] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.50 | bwd_microstep: 1443.50 | bwd_inner_microstep: 1443.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 
3646 [2024-06-11 06:45:08,331] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.05 | bwd_microstep: 1545.33 | bwd_inner_microstep: 1545.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2088 [2024-06-11 06:45:09,473] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 311.38 | bwd_microstep: 822.35 | bwd_inner_microstep: 822.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3652 [2024-06-11 06:45:11,443] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 533.79 | bwd_microstep: 1426.32 | bwd_inner_microstep: 1426.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-11 06:45:12,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.92 | bwd_microstep: 796.35 | bwd_inner_microstep: 796.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2072 [2024-06-11 06:45:13,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 339.69 | bwd_microstep: 919.51 | bwd_inner_microstep: 919.48 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3591 [2024-06-11 06:45:15,887] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 557.61 | bwd_microstep: 1505.04 | bwd_inner_microstep: 1505.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2122 [2024-06-11 06:45:17,084] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.63 | bwd_microstep: 862.62 | bwd_inner_microstep: 862.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3615 [2024-06-11 06:45:18,902] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.18 | bwd_microstep: 1313.36 | bwd_inner_microstep: 1313.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3676 [2024-06-11 06:45:21,040] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 574.44 | bwd_microstep: 1553.90 | bwd_inner_microstep: 1553.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2280 [2024-06-11 06:45:22,301] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 343.67 | bwd_microstep: 907.63 | bwd_inner_microstep: 907.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 2166 [2024-06-11 06:45:23,742] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 384.03 | bwd_microstep: 1047.52 | bwd_inner_microstep: 1047.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3432 [2024-06-11 06:45:25,602] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.69 | bwd_microstep: 1348.71 | bwd_inner_microstep: 1348.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3561 [2024-06-11 06:45:27,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.69 | bwd_microstep: 1396.13 | bwd_inner_microstep: 1396.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3551 [2024-06-11 06:45:29,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.43 | bwd_microstep: 1447.43 | bwd_inner_microstep: 1447.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 
dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3579 [2024-06-11 06:45:31,642] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.11 | bwd_microstep: 1526.98 | bwd_inner_microstep: 1526.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2020 [2024-06-11 06:45:32,803] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 315.38 | bwd_microstep: 837.57 | bwd_inner_microstep: 837.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3767 [2024-06-11 06:45:35,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 603.45 | bwd_microstep: 1644.27 | bwd_inner_microstep: 1644.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3458 [2024-06-11 06:45:40,481] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.72 | optimizer_gradients: 4.31 | optimizer_step: 6.60 [2024-06-11 06:45:40,482] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 457.12 | bwd_microstep: 4916.54 | bwd_inner_microstep: 1324.99 | bwd_allreduce_microstep: 3591.49 | step_microstep: 39.71 [2024-06-11 06:45:40,485] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15385.75 | bwd: 44801.06 | bwd_inner: 41208.50 | bwd_allreduce: 3591.80 | step: 41.26 {'loss': 1.1652, 'learning_rate': 2.8527488058038844e-09, 'epoch': 0.99} dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1412 [2024-06-11 06:45:41,273] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 215.96 | bwd_microstep: 558.41 | bwd_inner_microstep: 558.29 | bwd_allreduce_microstep: 0.05 | step_microstep: 0.09 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3462 [2024-06-11 06:45:43,179] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 520.67 | bwd_microstep: 1376.17 | bwd_inner_microstep: 1376.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3471 [2024-06-11 06:45:45,095] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.71 | bwd_microstep: 1384.60 | bwd_inner_microstep: 1384.57 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3468 [2024-06-11 06:45:47,000] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.62 | bwd_microstep: 1374.68 | bwd_inner_microstep: 1374.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3855 [2024-06-11 06:45:48,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.59 | bwd_microstep: 1367.22 | bwd_inner_microstep: 1367.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3484 [2024-06-11 06:45:50,675] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.09 | bwd_microstep: 1279.77 | bwd_inner_microstep: 1279.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3560 [2024-06-11 06:45:52,607] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.48 | bwd_microstep: 1395.77 | bwd_inner_microstep: 1395.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3726 [2024-06-11 06:45:54,591] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.70 | bwd_microstep: 1438.64 | bwd_inner_microstep: 1438.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3449 [2024-06-11 06:45:56,331] 
[INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.93 | bwd_microstep: 1257.95 | bwd_inner_microstep: 1257.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3707 [2024-06-11 06:45:58,440] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 569.34 | bwd_microstep: 1530.56 | bwd_inner_microstep: 1530.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 9, images per sample: 2.25, dynamic token length: 1253 [2024-06-11 06:45:59,141] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 191.20 | bwd_microstep: 502.23 | bwd_inner_microstep: 502.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3672 [2024-06-11 06:46:01,499] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.06 | bwd_microstep: 1720.66 | bwd_inner_microstep: 1720.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3642 [2024-06-11 06:46:03,714] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 591.29 | bwd_microstep: 1611.41 | bwd_inner_microstep: 1611.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3690 [2024-06-11 06:46:05,816] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.40 | bwd_microstep: 1526.28 | bwd_inner_microstep: 1526.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1994 [2024-06-11 06:46:06,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 312.72 | bwd_microstep: 829.00 | bwd_inner_microstep: 828.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token 
length: 3850 [2024-06-11 06:46:09,521] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 679.31 | bwd_microstep: 1864.55 | bwd_inner_microstep: 1864.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-11 06:46:11,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.14 | bwd_microstep: 1494.55 | bwd_inner_microstep: 1494.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3936 [2024-06-11 06:46:13,587] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.20 | bwd_microstep: 1446.44 | bwd_inner_microstep: 1446.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3489 [2024-06-11 06:46:15,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.25 | bwd_microstep: 1334.15 | bwd_inner_microstep: 1334.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 37, images per sample: 9.25, dynamic token length: 3661 [2024-06-11 06:46:17,590] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 579.63 | bwd_microstep: 1568.81 | bwd_inner_microstep: 1568.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3864 [2024-06-11 06:46:19,939] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 629.78 | bwd_microstep: 1709.11 | bwd_inner_microstep: 1709.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3444 [2024-06-11 06:46:21,886] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.36 | bwd_microstep: 1412.14 | bwd_inner_microstep: 1412.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 
10, images per sample: 2.5, dynamic token length: 1955 [2024-06-11 06:46:22,862] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.30 | bwd_microstep: 698.86 | bwd_inner_microstep: 698.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3553 [2024-06-11 06:46:24,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 497.55 | bwd_microstep: 1298.42 | bwd_inner_microstep: 1298.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 06:46:26,814] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.15 | bwd_microstep: 1559.77 | bwd_inner_microstep: 1559.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2190 [2024-06-11 06:46:28,008] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 326.27 | bwd_microstep: 860.38 | bwd_inner_microstep: 860.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3575 [2024-06-11 06:46:29,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.61 | bwd_microstep: 1402.02 | bwd_inner_microstep: 1401.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3568 [2024-06-11 06:46:31,758] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.73 | bwd_microstep: 1301.88 | bwd_inner_microstep: 1301.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3459 [2024-06-11 06:46:33,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.82 | bwd_microstep: 1278.28 | bwd_inner_microstep: 1278.25 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575 [2024-06-11 06:46:35,594] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.26 | bwd_microstep: 1491.39 | bwd_inner_microstep: 1491.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3812 [2024-06-11 06:46:37,737] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.70 | bwd_microstep: 1555.30 | bwd_inner_microstep: 1555.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3497 [2024-06-11 06:46:43,010] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.64 | optimizer_gradients: 4.29 | optimizer_step: 6.61 [2024-06-11 06:46:43,011] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.69 | bwd_microstep: 4729.49 | bwd_inner_microstep: 1478.90 | bwd_allreduce_microstep: 3250.52 | step_microstep: 39.92 [2024-06-11 06:46:43,014] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16028.14 | bwd: 46158.89 | bwd_inner: 42907.35 | bwd_allreduce: 3250.80 | step: 41.56 99%|█████████▉| 1713/1726 [30:04:12<13:26, 62.03s/it] 99%|█████████▉| 1714/1726 [30:05:16<12:31, 62.63s/it] 99%|█████████▉| 1715/1726 [30:06:16<11:22, 62.05s/it] 99%|█████████▉| 1716/1726 [30:07:16<10:13, 61.39s/it] 99%|█████████▉| 1717/1726 [30:08:17<09:10, 61.13s/it] 100%|████████{'loss': 1.1529, 'learning_rate': 2.2540349943089844e-09, 'epoch': 1.0} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3461 [2024-06-11 06:46:44,910] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) |
fwd_microstep: 517.22 | bwd_microstep: 1361.87 | bwd_inner_microstep: 1361.78 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.12 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3415 [2024-06-11 06:46:46,552] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.63 | bwd_microstep: 1181.19 | bwd_inner_microstep: 1181.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1961 [2024-06-11 06:46:47,649] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.41 | bwd_microstep: 791.63 | bwd_inner_microstep: 791.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3576 [2024-06-11 06:46:49,584] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.87 | bwd_microstep: 1398.48 | bwd_inner_microstep: 1398.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3801 [2024-06-11 06:46:51,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.45 | bwd_microstep: 1649.52 | bwd_inner_microstep: 1649.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1897 [2024-06-11 06:46:52,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.98 | bwd_microstep: 777.02 | bwd_inner_microstep: 776.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2018 [2024-06-11 06:46:54,006] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 294.22 | bwd_microstep: 775.35 | bwd_inner_microstep: 775.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 4039 [2024-06-11 06:46:56,102] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.13 | bwd_microstep: 1517.65 | bwd_inner_microstep: 1517.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1955 [2024-06-11 06:46:57,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.60 | bwd_microstep: 798.58 | bwd_inner_microstep: 798.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3076 [2024-06-11 06:46:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.99 | bwd_microstep: 1309.03 | bwd_inner_microstep: 1309.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3016 [2024-06-11 06:47:00,592] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 432.13 | bwd_microstep: 1133.73 | bwd_inner_microstep: 1133.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3649 [2024-06-11 06:47:02,637] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.52 | bwd_microstep: 1482.26 | bwd_inner_microstep: 1482.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3455 [2024-06-11 06:47:04,384] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 475.18 | bwd_microstep: 1263.26 | bwd_inner_microstep: 1263.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3519 [2024-06-11 06:47:06,249] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.00 | bwd_microstep: 1351.58 | bwd_inner_microstep: 1351.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 49, images per sample: 12.25, dynamic token length: 
3646 [2024-06-11 06:47:08,663] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 638.67 | bwd_microstep: 1763.81 | bwd_inner_microstep: 1763.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 06:47:10,575] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.73 | bwd_microstep: 1381.96 | bwd_inner_microstep: 1381.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3456 [2024-06-11 06:47:12,351] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.50 | bwd_microstep: 1284.19 | bwd_inner_microstep: 1284.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3386 [2024-06-11 06:47:14,150] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 488.45 | bwd_microstep: 1302.19 | bwd_inner_microstep: 1302.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3436 [2024-06-11 06:47:15,959] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.41 | bwd_microstep: 1309.47 | bwd_inner_microstep: 1309.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3458 [2024-06-11 06:47:17,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.67 | bwd_microstep: 1374.94 | bwd_inner_microstep: 1374.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2014 [2024-06-11 06:47:18,981] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.74 | bwd_microstep: 805.82 | bwd_inner_microstep: 805.79 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3715 [2024-06-11 06:47:20,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 505.24 | bwd_microstep: 1337.46 | bwd_inner_microstep: 1337.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3823 [2024-06-11 06:47:22,754] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.27 | bwd_microstep: 1389.30 | bwd_inner_microstep: 1389.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2437 [2024-06-11 06:47:24,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.41 | bwd_microstep: 952.77 | bwd_inner_microstep: 952.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3593 [2024-06-11 06:47:26,191] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.94 | bwd_microstep: 1534.86 | bwd_inner_microstep: 1534.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2017 [2024-06-11 06:47:27,315] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.66 | bwd_microstep: 810.74 | bwd_inner_microstep: 810.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3502 [2024-06-11 06:47:29,102] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.41 | bwd_microstep: 1292.30 | bwd_inner_microstep: 1292.27 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3595 [2024-06-11 06:47:31,050] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.45 | bwd_microstep: 1411.09 | bwd_inner_microstep: 1411.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3594 [2024-06-11 06:47:33,127] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.10 | bwd_microstep: 1507.62 | bwd_inner_microstep: 1507.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3769 [2024-06-11 06:47:35,198] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.25 | bwd_microstep: 1502.13 | bwd_inner_microstep: 1502.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3602 [2024-06-11 06:47:37,138] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.81 | bwd_microstep: 1404.86 | bwd_inner_microstep: 1404.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3816 [2024-06-11 06:47:45,694] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.67 | optimizer_gradients: 4.14 | optimizer_step: 6.62 [2024-06-11 06:47:45,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 576.68 | bwd_microstep: 7932.29 | bwd_inner_microstep: 1756.14 | bwd_allreduce_microstep: 6176.09 | step_microstep: 38.99 [2024-06-11 06:47:45,698] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15263.35 | bwd: 47088.98 | bwd_inner: 40911.90 | bwd_allreduce: 6176.37 | step: 40.56 {'loss': 1.21, 'learning_rate': 1.7257531401448924e-09, 'epoch': 1.0} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3398 [2024-06-11 06:47:47,419] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.21 | bwd_microstep: 1235.29 | bwd_inner_microstep: 1235.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.07 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2505 [2024-06-11 06:47:48,831] [INFO] [logging.py:96:log_dist] [Rank 
0] time (ms) | fwd_microstep: 382.23 | bwd_microstep: 1021.47 | bwd_inner_microstep: 1021.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3413 [2024-06-11 06:47:50,683] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.29 | bwd_microstep: 1342.32 | bwd_inner_microstep: 1342.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3764 [2024-06-11 06:47:52,929] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 600.79 | bwd_microstep: 1635.39 | bwd_inner_microstep: 1635.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3397 [2024-06-11 06:47:54,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.48 | bwd_microstep: 1244.53 | bwd_inner_microstep: 1244.50 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3489 [2024-06-11 06:47:56,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.03 | bwd_microstep: 1283.26 | bwd_inner_microstep: 1283.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3735 [2024-06-11 06:47:58,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.65 | bwd_microstep: 1528.24 | bwd_inner_microstep: 1528.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3483 [2024-06-11 06:48:00,228] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.47 | bwd_microstep: 1215.12 | bwd_inner_microstep: 1215.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 4047 [2024-06-11 
06:48:02,586] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 633.33 | bwd_microstep: 1714.31 | bwd_inner_microstep: 1714.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 15, images per sample: 3.75, dynamic token length: 2633 [2024-06-11 06:48:03,911] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.94 | bwd_microstep: 953.48 | bwd_inner_microstep: 953.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3404 [2024-06-11 06:48:05,636] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.43 | bwd_microstep: 1245.77 | bwd_inner_microstep: 1245.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3405 [2024-06-11 06:48:07,486] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.47 | bwd_microstep: 1340.52 | bwd_inner_microstep: 1340.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3434 [2024-06-11 06:48:09,256] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.86 | bwd_microstep: 1278.39 | bwd_inner_microstep: 1278.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2344 [2024-06-11 06:48:10,535] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 347.23 | bwd_microstep: 924.07 | bwd_inner_microstep: 924.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3909 [2024-06-11 06:48:12,724] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 592.41 | bwd_microstep: 1587.66 | bwd_inner_microstep: 1587.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3386 [2024-06-11 06:48:14,569] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.44 | bwd_microstep: 1336.82 | bwd_inner_microstep: 1336.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3497 [2024-06-11 06:48:16,701] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.16 | bwd_microstep: 1551.35 | bwd_inner_microstep: 1551.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3837 [2024-06-11 06:48:18,878] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 586.38 | bwd_microstep: 1580.06 | bwd_inner_microstep: 1580.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3463 [2024-06-11 06:48:20,571] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 467.83 | bwd_microstep: 1217.89 | bwd_inner_microstep: 1217.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3686 [2024-06-11 06:48:22,556] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.49 | bwd_microstep: 1438.17 | bwd_inner_microstep: 1438.14 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3596 [2024-06-11 06:48:24,638] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.30 | bwd_microstep: 1512.67 | bwd_inner_microstep: 1512.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3838 [2024-06-11 06:48:26,828] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.56 | bwd_microstep: 1591.55 | bwd_inner_microstep: 1591.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 28, images per sample: 7.0, dynamic token length: 3658 [2024-06-11 06:48:28,801] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.60 | bwd_microstep: 1429.26 | bwd_inner_microstep: 1429.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3620 [2024-06-11 06:48:30,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.93 | bwd_microstep: 1318.44 | bwd_inner_microstep: 1318.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3657 [2024-06-11 06:48:32,718] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 565.42 | bwd_microstep: 1519.88 | bwd_inner_microstep: 1519.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2140 [2024-06-11 06:48:33,837] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 307.31 | bwd_microstep: 804.01 | bwd_inner_microstep: 803.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3832 [2024-06-11 06:48:35,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.37 | bwd_microstep: 1360.67 | bwd_inner_microstep: 1360.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3781 [2024-06-11 06:48:37,768] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.19 | bwd_microstep: 1486.20 | bwd_inner_microstep: 1486.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3611 [2024-06-11 06:48:39,851] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.95 | bwd_microstep: 1514.08 | bwd_inner_microstep: 1514.05 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2036 [2024-06-11 06:48:40,974] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 304.43 | bwd_microstep: 810.32 | bwd_inner_microstep: 810.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1913 [2024-06-11 06:48:42,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 292.98 | bwd_microstep: 780.71 | bwd_inner_microstep: 780.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3454 [2024-06-11 06:48:46,546] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.62 | optimizer_gradients: 4.15 | optimizer_step: 6.60 [2024-06-11 06:48:46,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.33 | bwd_microstep: 3971.02 | bwd_inner_microstep: 1421.96 | bwd_allreduce_microstep: 2549.00 | step_microstep: 39.11 [2024-06-11 06:48:46,549] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15750.16 | bwd: 44772.92 | bwd_inner: 42223.02 | bwd_allreduce: 2549.23 | step: 40.58 {'loss': 1.1683, 'learning_rate': 1.2679051039188317e-09, 'epoch': 1.0} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 06:48:48,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 519.42 | bwd_microstep: 1374.83 | bwd_inner_microstep: 1374.68 | bwd_allreduce_microstep: 0.06 | step_microstep: 0.11 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3419 [2024-06-11 06:48:50,056] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.34 | bwd_microstep: 1148.20 | bwd_inner_microstep: 1148.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3481 [2024-06-11 06:48:52,050] [INFO] 
[logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.22 | bwd_microstep: 1441.53 | bwd_inner_microstep: 1441.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3405 [2024-06-11 06:48:53,645] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 439.67 | bwd_microstep: 1146.41 | bwd_inner_microstep: 1146.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3413 [2024-06-11 06:48:55,284] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.25 | bwd_microstep: 1179.94 | bwd_inner_microstep: 1179.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-11 06:48:57,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.41 | bwd_microstep: 1282.08 | bwd_inner_microstep: 1282.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3505 [2024-06-11 06:48:58,836] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.02 | bwd_microstep: 1286.02 | bwd_inner_microstep: 1285.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-11 06:48:59,940] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 298.88 | bwd_microstep: 797.65 | bwd_inner_microstep: 797.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1969 [2024-06-11 06:49:01,086] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 310.78 | bwd_microstep: 827.44 | bwd_inner_microstep: 827.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3487 
[2024-06-11 06:49:03,131] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 553.79 | bwd_microstep: 1481.27 | bwd_inner_microstep: 1481.25 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3482 [2024-06-11 06:49:05,038] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.04 | bwd_microstep: 1377.24 | bwd_inner_microstep: 1377.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3419 [2024-06-11 06:49:06,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.71 | bwd_microstep: 1249.71 | bwd_inner_microstep: 1249.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3715 [2024-06-11 06:49:08,882] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.71 | bwd_microstep: 1530.13 | bwd_inner_microstep: 1530.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3653 [2024-06-11 06:49:11,194] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 617.44 | bwd_microstep: 1683.00 | bwd_inner_microstep: 1682.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3467 [2024-06-11 06:49:13,359] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 582.39 | bwd_microstep: 1573.43 | bwd_inner_microstep: 1573.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3536 [2024-06-11 06:49:15,202] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 507.15 | bwd_microstep: 1326.36 | bwd_inner_microstep: 1326.33 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per 
sample: 7.0, dynamic token length: 3451 [2024-06-11 06:49:17,072] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.21 | bwd_microstep: 1353.68 | bwd_inner_microstep: 1353.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.35 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3502 [2024-06-11 06:49:19,061] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.15 | bwd_microstep: 1441.96 | bwd_inner_microstep: 1441.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3446 [2024-06-11 06:49:20,965] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 514.11 | bwd_microstep: 1379.69 | bwd_inner_microstep: 1379.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3716 [2024-06-11 06:49:22,952] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 539.02 | bwd_microstep: 1439.09 | bwd_inner_microstep: 1439.06 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3818 [2024-06-11 06:49:24,968] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 546.57 | bwd_microstep: 1459.15 | bwd_inner_microstep: 1459.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 2065 [2024-06-11 06:49:26,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.12 | bwd_microstep: 785.72 | bwd_inner_microstep: 785.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3463 [2024-06-11 06:49:27,717] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 459.09 | bwd_microstep: 1187.15 | bwd_inner_microstep: 1187.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 
dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3814 [2024-06-11 06:49:29,595] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.51 | bwd_microstep: 1356.12 | bwd_inner_microstep: 1356.09 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3557 [2024-06-11 06:49:31,407] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 499.92 | bwd_microstep: 1304.46 | bwd_inner_microstep: 1304.44 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3513 [2024-06-11 06:49:33,271] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.03 | bwd_microstep: 1349.07 | bwd_inner_microstep: 1349.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3563 [2024-06-11 06:49:35,162] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.84 | bwd_microstep: 1363.96 | bwd_inner_microstep: 1363.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3580 [2024-06-11 06:49:36,975] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 498.76 | bwd_microstep: 1304.13 | bwd_inner_microstep: 1304.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2162 [2024-06-11 06:49:38,159] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 322.45 | bwd_microstep: 853.63 | bwd_inner_microstep: 853.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3808 [2024-06-11 06:49:40,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.69 | bwd_microstep: 1453.39 | bwd_inner_microstep: 1453.37 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3812 [2024-06-11 06:49:42,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 604.66 | bwd_microstep: 1644.94 | bwd_inner_microstep: 1644.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3594 [2024-06-11 06:49:47,107] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | optimizer_gradients: 4.28 | optimizer_step: 6.58 [2024-06-11 06:49:47,108] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 587.38 | bwd_microstep: 4044.51 | bwd_inner_microstep: 1810.32 | bwd_allreduce_microstep: 2234.11 | step_microstep: 39.79 [2024-06-11 06:49:47,111] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15787.35 | bwd: 44425.97 | bwd_inner: 42190.79 | bwd_allreduce: 2234.40 | step: 41.67 {'loss': 1.1316, 'learning_rate': 8.804924981653529e-10, 'epoch': 1.0} dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3463 [2024-06-11 06:49:49,052] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 527.34 | bwd_microstep: 1396.19 | bwd_inner_microstep: 1396.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3860 [2024-06-11 06:49:51,203] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 581.32 | bwd_microstep: 1559.90 | bwd_inner_microstep: 1559.87 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2331 [2024-06-11 06:49:52,560] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 365.90 | bwd_microstep: 981.43 | bwd_inner_microstep: 981.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3826 
[2024-06-11 06:49:54,736] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 583.90 | bwd_microstep: 1582.52 | bwd_inner_microstep: 1582.49 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3794 [2024-06-11 06:49:56,734] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 542.78 | bwd_microstep: 1445.65 | bwd_inner_microstep: 1445.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.18 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3483 [2024-06-11 06:49:58,518] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.19 | bwd_microstep: 1283.48 | bwd_inner_microstep: 1283.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3400 [2024-06-11 06:50:00,374] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.88 | bwd_microstep: 1343.10 | bwd_inner_microstep: 1343.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 19, images per sample: 4.75, dynamic token length: 3497 [2024-06-11 06:50:02,088] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 468.75 | bwd_microstep: 1236.90 | bwd_inner_microstep: 1236.88 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3411 [2024-06-11 06:50:03,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 482.56 | bwd_microstep: 1282.01 | bwd_inner_microstep: 1281.98 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3537 [2024-06-11 06:50:05,915] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.06 | bwd_microstep: 1482.88 | bwd_inner_microstep: 1482.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 22, images per 
sample: 5.5, dynamic token length: 3621 [2024-06-11 06:50:07,733] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.89 | bwd_microstep: 1313.81 | bwd_inner_microstep: 1313.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-11 06:50:09,589] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 503.06 | bwd_microstep: 1343.41 | bwd_inner_microstep: 1343.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3489 [2024-06-11 06:50:11,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.97 | bwd_microstep: 1314.58 | bwd_inner_microstep: 1314.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3412 [2024-06-11 06:50:13,264] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.73 | bwd_microstep: 1341.16 | bwd_inner_microstep: 1341.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3633 [2024-06-11 06:50:15,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.21 | bwd_microstep: 1604.18 | bwd_inner_microstep: 1604.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 52, images per sample: 13.0, dynamic token length: 3649 [2024-06-11 06:50:17,960] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 659.78 | bwd_microstep: 1820.77 | bwd_inner_microstep: 1820.74 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3901 [2024-06-11 06:50:20,276] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 620.05 | bwd_microstep: 1685.10 | bwd_inner_microstep: 1685.07 | bwd_allreduce_microstep: 0.01 | step_microstep: 
0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 2310 [2024-06-11 06:50:21,508] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.05 | bwd_microstep: 887.34 | bwd_inner_microstep: 887.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1998 [2024-06-11 06:50:22,623] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 303.21 | bwd_microstep: 803.95 | bwd_inner_microstep: 803.92 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3473 [2024-06-11 06:50:24,405] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.88 | bwd_microstep: 1283.08 | bwd_inner_microstep: 1283.05 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3456 [2024-06-11 06:50:26,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 453.93 | bwd_microstep: 1190.85 | bwd_inner_microstep: 1190.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 12, images per sample: 3.0, dynamic token length: 2292 [2024-06-11 06:50:27,196] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 313.47 | bwd_microstep: 816.49 | bwd_inner_microstep: 816.47 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1906 [2024-06-11 06:50:28,152] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 263.16 | bwd_microstep: 685.26 | bwd_inner_microstep: 685.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3566 [2024-06-11 06:50:30,094] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.56 | bwd_microstep: 1404.56 | bwd_inner_microstep: 1404.54 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3600 [2024-06-11 06:50:32,047] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.06 | bwd_microstep: 1415.32 | bwd_inner_microstep: 1415.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3548 [2024-06-11 06:50:33,980] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.85 | bwd_microstep: 1395.03 | bwd_inner_microstep: 1395.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3571 [2024-06-11 06:50:35,919] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.12 | bwd_microstep: 1401.24 | bwd_inner_microstep: 1401.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3541 [2024-06-11 06:50:37,725] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 496.32 | bwd_microstep: 1301.73 | bwd_inner_microstep: 1301.70 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2194 [2024-06-11 06:50:39,048] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 357.01 | bwd_microstep: 956.53 | bwd_inner_microstep: 956.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3575 [2024-06-11 06:50:41,112] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.31 | bwd_microstep: 1494.81 | bwd_inner_microstep: 1494.78 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3820 [2024-06-11 06:50:43,213] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 567.09 | bwd_microstep: 
1524.19 | bwd_inner_microstep: 1524.17 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3433 [2024-06-11 06:50:49,210] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.63 | optimizer_gradients: 4.13 | optimizer_step: 6.61 [2024-06-11 06:50:49,211] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 502.62 | bwd_microstep: 5446.65 | bwd_inner_microstep: 1523.35 | bwd_allreduce_microstep: 3923.24 | step_microstep: 39.10 [2024-06-11 06:50:49,214] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15731.62 | bwd: 46024.13 | bwd_inner: 42099.97 | bwd_allreduce: 3923.48 | step: 40.83 {'loss': 1.1186, 'learning_rate': 5.63516687352994e-10, 'epoch': 1.0} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 06:50:50,991] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 486.85 | bwd_microstep: 1274.74 | bwd_inner_microstep: 1274.67 | bwd_allreduce_microstep: 0.02 | step_microstep: 0.09 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3914 [2024-06-11 06:50:53,178] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 589.75 | bwd_microstep: 1587.74 | bwd_inner_microstep: 1587.71 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2137 [2024-06-11 06:50:54,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.50 | bwd_microstep: 891.41 | bwd_inner_microstep: 891.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3474 [2024-06-11 06:50:56,140] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 476.00 | bwd_microstep: 1243.00 | bwd_inner_microstep: 1242.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, 
dynamic token length: 3742 [2024-06-11 06:50:58,114] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.67 | bwd_microstep: 1429.07 | bwd_inner_microstep: 1429.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3491 [2024-06-11 06:50:59,761] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.67 | bwd_microstep: 1186.83 | bwd_inner_microstep: 1186.80 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3425 [2024-06-11 06:51:01,746] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.99 | bwd_microstep: 1440.43 | bwd_inner_microstep: 1440.40 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3717 [2024-06-11 06:51:03,856] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 570.91 | bwd_microstep: 1529.70 | bwd_inner_microstep: 1529.67 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 25, images per sample: 6.25, dynamic token length: 3426 [2024-06-11 06:51:05,650] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.89 | bwd_microstep: 1296.12 | bwd_inner_microstep: 1296.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3501 [2024-06-11 06:51:07,429] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.08 | bwd_microstep: 1285.64 | bwd_inner_microstep: 1285.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 26, images per sample: 6.5, dynamic token length: 3450 [2024-06-11 06:51:09,243] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 492.51 | bwd_microstep: 1312.47 | bwd_inner_microstep: 1312.45 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3410 [2024-06-11 06:51:10,966] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 470.82 | bwd_microstep: 1243.67 | bwd_inner_microstep: 1243.64 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3381 [2024-06-11 06:51:12,844] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 506.92 | bwd_microstep: 1361.72 | bwd_inner_microstep: 1361.69 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3503 [2024-06-11 06:51:14,625] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 484.83 | bwd_microstep: 1287.41 | bwd_inner_microstep: 1287.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 4, images per sample: 1.0, dynamic token length: 640 [2024-06-11 06:51:15,002] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 105.41 | bwd_microstep: 265.28 | bwd_inner_microstep: 265.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2301 [2024-06-11 06:51:16,348] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 363.17 | bwd_microstep: 975.25 | bwd_inner_microstep: 975.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3525 [2024-06-11 06:51:18,410] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.81 | bwd_microstep: 1493.01 | bwd_inner_microstep: 1492.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1946 [2024-06-11 06:51:19,386] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.82 | bwd_microstep: 698.39 | bwd_inner_microstep: 698.36 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3827 [2024-06-11 06:51:21,268] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 513.03 | bwd_microstep: 1361.21 | bwd_inner_microstep: 1361.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3615 [2024-06-11 06:51:23,349] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.20 | bwd_microstep: 1510.58 | bwd_inner_microstep: 1510.56 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3620 [2024-06-11 06:51:25,431] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.57 | bwd_microstep: 1511.85 | bwd_inner_microstep: 1511.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3535 [2024-06-11 06:51:27,495] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.47 | bwd_microstep: 1495.54 | bwd_inner_microstep: 1495.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3693 [2024-06-11 06:51:29,732] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 598.24 | bwd_microstep: 1627.98 | bwd_inner_microstep: 1627.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3484 [2024-06-11 06:51:31,643] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.63 | bwd_microstep: 1379.44 | bwd_inner_microstep: 1379.42 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3615 [2024-06-11 06:51:33,755] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 566.88 | bwd_microstep: 1535.26 | bwd_inner_microstep: 
1535.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 2061 [2024-06-11 06:51:34,930] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 319.00 | bwd_microstep: 848.37 | bwd_inner_microstep: 848.35 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3422 [2024-06-11 06:51:36,547] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 443.65 | bwd_microstep: 1165.69 | bwd_inner_microstep: 1165.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 2937 [2024-06-11 06:51:38,375] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 491.68 | bwd_microstep: 1325.25 | bwd_inner_microstep: 1325.22 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3551 [2024-06-11 06:51:40,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 560.43 | bwd_microstep: 1498.85 | bwd_inner_microstep: 1498.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2019 [2024-06-11 06:51:41,691] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 334.63 | bwd_microstep: 903.62 | bwd_inner_microstep: 903.60 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 44, images per sample: 11.0, dynamic token length: 3586 [2024-06-11 06:51:43,983] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.38 | bwd_microstep: 1670.34 | bwd_inner_microstep: 1670.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3816 [2024-06-11 06:51:50,666] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.73 | 
optimizer_gradients: 4.15 | optimizer_step: 6.59 [2024-06-11 06:51:50,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 605.18 | bwd_microstep: 6029.43 | bwd_inner_microstep: 1868.68 | bwd_allreduce_microstep: 4160.69 | step_microstep: 38.99 [2024-06-11 06:51:50,670] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15450.25 | bwd: 45665.32 | bwd_inner: 41503.66 | bwd_allreduce: 4160.95 | step: 40.55 100%|█████████▉| 1718/1726 [30:09:19<08:12, 61.55s/it] 100%|█████████▉| 1719/1726 [30:10:22<07:13, 61.89s/it] 100%|█████████▉| 1720/1726 [30:11:23<06:09, 61.58s/it] 100%|█████████▉| 1721/1726 [30:12:23<05:06, 61.27s/it] 100%|█████████▉| 1722/1726 [30:13:25<04:06, 61.52s/it] {'loss': 1.153, 'learning_rate': 3.1697878786873804e-10, 'epoch': 1.0} dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3551 [2024-06-11 06:51:53,890] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.64 | bwd_microstep: 2679.75 | bwd_inner_microstep: 2679.66 | bwd_allreduce_microstep: 0.04 | step_microstep: 0.13 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3463 [2024-06-11 06:51:55,665] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 487.29 | bwd_microstep: 1278.57 | bwd_inner_microstep: 1278.55 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3844 [2024-06-11 06:51:57,943] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 611.99 | bwd_microstep: 1656.27 | bwd_inner_microstep: 1656.24 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.14 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token
length: 3753 [2024-06-11 06:51:59,934] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.76 | bwd_microstep: 1440.88 | bwd_inner_microstep: 1440.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3444 [2024-06-11 06:52:01,538] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 442.19 | bwd_microstep: 1153.18 | bwd_inner_microstep: 1153.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3784 [2024-06-11 06:52:03,667] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.60 | bwd_microstep: 1545.84 | bwd_inner_microstep: 1545.82 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3477 [2024-06-11 06:52:05,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 489.92 | bwd_microstep: 1279.13 | bwd_inner_microstep: 1279.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3478 [2024-06-11 06:52:07,362] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.49 | bwd_microstep: 1385.14 | bwd_inner_microstep: 1385.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3423 [2024-06-11 06:52:09,097] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.32 | bwd_microstep: 1252.26 | bwd_inner_microstep: 1252.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3509 [2024-06-11 06:52:11,080] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.71 | bwd_microstep: 1438.56 | bwd_inner_microstep: 1438.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 20, 
images per sample: 5.0, dynamic token length: 3400 [2024-06-11 06:52:12,759] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 460.10 | bwd_microstep: 1211.23 | bwd_inner_microstep: 1211.21 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 dynamic ViT batch size: 31, images per sample: 7.75, dynamic token length: 3610 [2024-06-11 06:52:14,785] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 545.39 | bwd_microstep: 1467.64 | bwd_inner_microstep: 1467.61 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 42, images per sample: 10.5, dynamic token length: 3699 [2024-06-11 06:52:17,058] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.23 | bwd_microstep: 1653.62 | bwd_inner_microstep: 1653.59 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3515 [2024-06-11 06:52:19,055] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 538.93 | bwd_microstep: 1447.12 | bwd_inner_microstep: 1447.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3611 [2024-06-11 06:52:20,995] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.68 | bwd_microstep: 1404.07 | bwd_inner_microstep: 1404.04 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3486 [2024-06-11 06:52:22,913] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 522.27 | bwd_microstep: 1386.32 | bwd_inner_microstep: 1386.29 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3452 [2024-06-11 06:52:24,653] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 474.81 | bwd_microstep: 1256.65 | bwd_inner_microstep: 1256.63 | bwd_allreduce_microstep: 0.01 | 
step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3502 [2024-06-11 06:52:26,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 548.46 | bwd_microstep: 1484.29 | bwd_inner_microstep: 1484.26 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3456 [2024-06-11 06:52:28,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 473.54 | bwd_microstep: 1255.92 | bwd_inner_microstep: 1255.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3723 [2024-06-11 06:52:30,550] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 573.26 | bwd_microstep: 1533.78 | bwd_inner_microstep: 1533.76 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3824 [2024-06-11 06:52:32,824] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.06 | bwd_microstep: 1655.06 | bwd_inner_microstep: 1655.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3520 [2024-06-11 06:52:34,738] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 517.22 | bwd_microstep: 1388.33 | bwd_inner_microstep: 1388.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3539 [2024-06-11 06:52:36,799] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.29 | bwd_microstep: 1492.65 | bwd_inner_microstep: 1492.62 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3806 [2024-06-11 06:52:38,944] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 578.25 | bwd_microstep: 1557.04 | bwd_inner_microstep: 
1557.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3656 [2024-06-11 06:52:40,781] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.14 | bwd_microstep: 1326.71 | bwd_inner_microstep: 1326.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3476 [2024-06-11 06:52:42,436] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 458.06 | bwd_microstep: 1189.41 | bwd_inner_microstep: 1189.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3691 [2024-06-11 06:52:44,317] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 511.92 | bwd_microstep: 1361.02 | bwd_inner_microstep: 1361.00 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3454 [2024-06-11 06:52:46,312] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 535.99 | bwd_microstep: 1448.89 | bwd_inner_microstep: 1448.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2024 [2024-06-11 06:52:47,564] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 336.29 | bwd_microstep: 906.33 | bwd_inner_microstep: 906.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3631 [2024-06-11 06:52:49,769] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.51 | bwd_microstep: 1606.31 | bwd_inner_microstep: 1606.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3559 [2024-06-11 06:52:51,868] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.89 | 
bwd_microstep: 1520.05 | bwd_inner_microstep: 1520.02 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3741 [2024-06-11 06:52:54,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.84 | optimizer_gradients: 4.08 | optimizer_step: 6.58 [2024-06-11 06:52:54,484] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.22 | bwd_microstep: 1976.12 | bwd_inner_microstep: 1722.97 | bwd_allreduce_microstep: 253.09 | step_microstep: 39.23 [2024-06-11 06:52:54,487] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16821.09 | bwd: 46638.15 | bwd_inner: 46384.08 | bwd_allreduce: 253.37 | step: 41.04 {'loss': 1.1706, 'learning_rate': 1.4087966801579201e-10, 'epoch': 1.0} dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 3403 [2024-06-11 06:52:56,129] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 451.03 | bwd_microstep: 1175.44 | bwd_inner_microstep: 1175.41 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.08 dynamic ViT batch size: 30, images per sample: 7.5, dynamic token length: 3915 [2024-06-11 06:52:58,229] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.71 | bwd_microstep: 1521.95 | bwd_inner_microstep: 1521.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3796 [2024-06-11 06:53:00,241] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 544.94 | bwd_microstep: 1456.06 | bwd_inner_microstep: 1456.03 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3834 [2024-06-11 06:53:02,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 543.06 | bwd_microstep: 1453.49 | bwd_inner_microstep: 1453.46 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images 
per sample: 7.0, dynamic token length: 3762 [2024-06-11 06:53:04,242] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 540.19 | bwd_microstep: 1445.99 | bwd_inner_microstep: 1445.97 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1948 [2024-06-11 06:53:05,342] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.05 | bwd_microstep: 792.71 | bwd_inner_microstep: 792.68 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3480 [2024-06-11 06:53:07,392] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 555.34 | bwd_microstep: 1485.41 | bwd_inner_microstep: 1485.38 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 13, images per sample: 3.25, dynamic token length: 1900 [2024-06-11 06:53:08,413] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 277.22 | bwd_microstep: 734.98 | bwd_inner_microstep: 734.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3469 [2024-06-11 06:53:10,459] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 554.05 | bwd_microstep: 1482.17 | bwd_inner_microstep: 1482.15 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3607 [2024-06-11 06:53:12,662] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 588.31 | bwd_microstep: 1604.87 | bwd_inner_microstep: 1604.84 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3377 [2024-06-11 06:53:14,511] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 500.73 | bwd_microstep: 1338.33 | bwd_inner_microstep: 1338.30 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.05 
dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1966 [2024-06-11 06:53:15,618] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 299.86 | bwd_microstep: 798.41 | bwd_inner_microstep: 798.39 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3765 [2024-06-11 06:53:17,604] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 536.79 | bwd_microstep: 1440.01 | bwd_inner_microstep: 1439.99 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3402 [2024-06-11 06:53:19,330] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.58 | bwd_microstep: 1245.88 | bwd_inner_microstep: 1245.86 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3504 [2024-06-11 06:53:21,327] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.66 | bwd_microstep: 1448.80 | bwd_inner_microstep: 1448.77 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3528 [2024-06-11 06:53:23,393] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 558.89 | bwd_microstep: 1496.35 | bwd_inner_microstep: 1496.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3656 [2024-06-11 06:53:25,437] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 552.39 | bwd_microstep: 1481.98 | bwd_inner_microstep: 1481.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3452 [2024-06-11 06:53:27,217] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 483.88 | bwd_microstep: 1286.32 | bwd_inner_microstep: 1286.29 | 
bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 36, images per sample: 9.0, dynamic token length: 3634 [2024-06-11 06:53:29,340] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.70 | bwd_microstep: 1543.66 | bwd_inner_microstep: 1543.63 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.12 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3460 [2024-06-11 06:53:31,247] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 520.90 | bwd_microstep: 1375.96 | bwd_inner_microstep: 1375.93 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3822 [2024-06-11 06:53:33,171] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 523.30 | bwd_microstep: 1391.21 | bwd_inner_microstep: 1391.18 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3620 [2024-06-11 06:53:35,116] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.88 | bwd_microstep: 1410.53 | bwd_inner_microstep: 1410.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3475 [2024-06-11 06:53:37,028] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 521.30 | bwd_microstep: 1382.10 | bwd_inner_microstep: 1382.08 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3550 [2024-06-11 06:53:38,832] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 495.68 | bwd_microstep: 1299.68 | bwd_inner_microstep: 1299.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1958 [2024-06-11 06:53:39,815] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 270.97 | bwd_microstep: 
703.54 | bwd_inner_microstep: 703.52 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 3549 [2024-06-11 06:53:41,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 464.36 | bwd_microstep: 1201.15 | bwd_inner_microstep: 1201.12 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3553 [2024-06-11 06:53:43,559] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.59 | bwd_microstep: 1499.92 | bwd_inner_microstep: 1499.89 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781 [2024-06-11 06:53:45,826] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 608.59 | bwd_microstep: 1647.23 | bwd_inner_microstep: 1647.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.19 dynamic ViT batch size: 38, images per sample: 9.5, dynamic token length: 3820 [2024-06-11 06:53:48,057] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 597.87 | bwd_microstep: 1619.76 | bwd_inner_microstep: 1619.73 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3574 [2024-06-11 06:53:49,998] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 529.09 | bwd_microstep: 1402.13 | bwd_inner_microstep: 1402.10 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3781 [2024-06-11 06:53:52,265] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 607.72 | bwd_microstep: 1648.68 | bwd_inner_microstep: 1648.65 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 40, images per sample: 10.0, dynamic token length: 3581 [2024-06-11 06:53:57,186] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | 
optimizer_allgather: 16.67 | optimizer_gradients: 4.25 | optimizer_step: 6.61 [2024-06-11 06:53:57,187] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 590.81 | bwd_microstep: 4280.50 | bwd_inner_microstep: 2090.37 | bwd_allreduce_microstep: 2190.06 | step_microstep: 39.58 [2024-06-11 06:53:57,190] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 16250.07 | bwd: 46095.21 | bwd_inner: 43904.23 | bwd_allreduce: 2190.30 | step: 41.38 {'loss': 1.165, 'learning_rate': 3.521994801580775e-11, 'epoch': 1.0} dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2014 [2024-06-11 06:53:58,426] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 331.93 | bwd_microstep: 889.54 | bwd_inner_microstep: 889.35 | bwd_allreduce_microstep: 0.08 | step_microstep: 0.13 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3961 [2024-06-11 06:54:00,489] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 559.59 | bwd_microstep: 1493.34 | bwd_inner_microstep: 1493.31 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 14, images per sample: 3.5, dynamic token length: 1878 [2024-06-11 06:54:01,517] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 280.29 | bwd_microstep: 739.98 | bwd_inner_microstep: 739.95 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3923 [2024-06-11 06:54:03,444] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 526.26 | bwd_microstep: 1392.19 | bwd_inner_microstep: 1392.16 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 18, images per sample: 4.5, dynamic token length: 1956 [2024-06-11 06:54:04,585] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 309.19 | bwd_microstep: 824.83 | bwd_inner_microstep: 824.81 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT 
batch size: 22, images per sample: 5.5, dynamic token length: 3403 [2024-06-11 06:54:06,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 471.11 | bwd_microstep: 1245.22 | bwd_inner_microstep: 1245.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3759 [2024-06-11 06:54:08,433] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 571.95 | bwd_microstep: 1541.22 | bwd_inner_microstep: 1541.19 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 24, images per sample: 6.0, dynamic token length: 3733 [2024-06-11 06:54:10,326] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 515.90 | bwd_microstep: 1367.23 | bwd_inner_microstep: 1367.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3444 [2024-06-11 06:54:12,062] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.98 | bwd_microstep: 1254.13 | bwd_inner_microstep: 1254.11 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3617 [2024-06-11 06:54:14,013] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 528.17 | bwd_microstep: 1414.69 | bwd_inner_microstep: 1414.66 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 3101 [2024-06-11 06:54:15,613] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 438.95 | bwd_microstep: 1151.97 | bwd_inner_microstep: 1151.94 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 16, images per sample: 4.0, dynamic token length: 1975 [2024-06-11 06:54:16,719] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 300.76 | bwd_microstep: 797.61 | bwd_inner_microstep: 797.58 | bwd_allreduce_microstep: 0.01 
| step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3676 [2024-06-11 06:54:18,695] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 534.43 | bwd_microstep: 1432.88 | bwd_inner_microstep: 1432.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.06 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3508 [2024-06-11 06:54:20,617] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 518.63 | bwd_microstep: 1393.26 | bwd_inner_microstep: 1393.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3664 [2024-06-11 06:54:22,656] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 551.05 | bwd_microstep: 1478.34 | bwd_inner_microstep: 1478.32 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3628 [2024-06-11 06:54:24,741] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 561.19 | bwd_microstep: 1513.25 | bwd_inner_microstep: 1513.23 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3415 [2024-06-11 06:54:26,474] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 472.38 | bwd_microstep: 1249.85 | bwd_inner_microstep: 1249.83 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 32, images per sample: 8.0, dynamic token length: 3499 [2024-06-11 06:54:28,468] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 537.39 | bwd_microstep: 1446.93 | bwd_inner_microstep: 1446.90 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 43, images per sample: 10.75, dynamic token length: 3631 [2024-06-11 06:54:30,745] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 606.32 | bwd_microstep: 1658.39 | bwd_inner_microstep: 
1658.36 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 20, images per sample: 5.0, dynamic token length: 2080 [2024-06-11 06:54:31,967] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 330.43 | bwd_microstep: 883.31 | bwd_inner_microstep: 883.28 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3618 [2024-06-11 06:54:34,310] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 622.51 | bwd_microstep: 1709.57 | bwd_inner_microstep: 1709.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3397 [2024-06-11 06:54:36,166] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 501.34 | bwd_microstep: 1345.16 | bwd_inner_microstep: 1345.13 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 46, images per sample: 11.5, dynamic token length: 3653 [2024-06-11 06:54:38,522] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 627.62 | bwd_microstep: 1716.04 | bwd_inner_microstep: 1716.01 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3712 [2024-06-11 06:54:40,629] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 568.09 | bwd_microstep: 1529.54 | bwd_inner_microstep: 1529.51 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 2182 [2024-06-11 06:54:41,948] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 356.32 | bwd_microstep: 953.88 | bwd_inner_microstep: 953.85 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 10, images per sample: 2.5, dynamic token length: 1926 [2024-06-11 06:54:42,922] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 269.25 | 
bwd_microstep: 696.56 | bwd_inner_microstep: 696.54 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3590 [2024-06-11 06:54:44,897] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.58 | bwd_microstep: 1407.55 | bwd_inner_microstep: 1407.53 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3500 [2024-06-11 06:54:46,811] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 516.56 | bwd_microstep: 1387.94 | bwd_inner_microstep: 1387.91 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3596 [2024-06-11 06:54:48,751] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 524.91 | bwd_microstep: 1406.75 | bwd_inner_microstep: 1406.72 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.03 dynamic ViT batch size: 22, images per sample: 5.5, dynamic token length: 3468 [2024-06-11 06:54:50,534] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 490.77 | bwd_microstep: 1283.22 | bwd_inner_microstep: 1283.20 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 34, images per sample: 8.5, dynamic token length: 3800 [2024-06-11 06:54:52,674] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 577.38 | bwd_microstep: 1552.78 | bwd_inner_microstep: 1552.75 | bwd_allreduce_microstep: 0.01 | step_microstep: 0.04 dynamic ViT batch size: 28, images per sample: 7.0, dynamic token length: 3612 [2024-06-11 06:54:59,863] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | optimizer_allgather: 16.70 | optimizer_gradients: 4.27 | optimizer_step: 6.57 [2024-06-11 06:54:59,864] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd_microstep: 525.58 | bwd_microstep: 6616.12 | bwd_inner_microstep: 1579.57 | bwd_allreduce_microstep: 5036.47 | 
step_microstep: 39.78 [2024-06-11 06:54:59,867] [INFO] [logging.py:96:log_dist] [Rank 0] time (ms) | fwd: 15523.38 | bwd: 46773.30 | bwd_inner: 41735.75 | bwd_allreduce: 5036.80 | step: 41.48 {'loss': 1.1662, 'learning_rate': 0.0, 'epoch': 1.0} Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fa0769bf9f0> Failed to load image: playground/data/ocr_vqa/images/60170921.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0768c3900> Failed to load image: playground/data/ocr_vqa/images/157851231X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0768aeb30> Failed to load image: playground/data/ocr_vqa/images/930031571.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0769346d0> Failed to load image: playground/data/ocr_vqa/images/771573936.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa072b7bef0> Failed to load image: playground/data/ocr_vqa/images/60164255.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0769d5f90> Failed to load image: playground/data/ocr_vqa/images/316142778.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0769569a0> Failed to load image: playground/data/ocr_vqa/images/715308904.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0769a3cc0> Failed to load image: playground/data/ocr_vqa/images/312244452.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify 
image file <_io.BytesIO object at 0x7fa072b97270> Failed to load image: playground/data/ocr_vqa/images/201112973.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0768c0270> Failed to load image: playground/data/ocr_vqa/images/521506743.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0769ec2c0> Failed to load image: playground/data/ocr_vqa/images/2067009559.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa0769f8540> Failed to load image: playground/data/ocr_vqa/images/425099369.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fbe7ecaa2c0> Failed to load image: playground/data/ocr_vqa/images/816512019.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7ecaaf40> Failed to load image: playground/data/ocr_vqa/images/883880075.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7d005c70> Failed to load image: playground/data/ocr_vqa/images/966424603.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe9745b1d0> Failed to load image: playground/data/ocr_vqa/images/739704516.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7ee35540> Failed to load image: playground/data/ocr_vqa/images/292796048.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe80449770> Failed 
to load image: playground/data/ocr_vqa/images/037550267X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7ecb77c0> Failed to load image: playground/data/ocr_vqa/images/3928819232.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7ecb76d0> Failed to load image: playground/data/ocr_vqa/images/878576991.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7edb6360> Failed to load image: playground/data/ocr_vqa/images/786405511.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7ed5f7c0> Failed to load image: playground/data/ocr_vqa/images/553069403.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe9745b450> Failed to load image: playground/data/ocr_vqa/images/800756614.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbe7ecb7cc0> Failed to load image: playground/data/ocr_vqa/images/679445765.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
cannot identify image file <_io.BytesIO object at 0x7f690f301540> Failed to load image: playground/data/ocr_vqa/images/930016238.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f690f2e3c70> Failed to load image: playground/data/ocr_vqa/images/763115509.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f691108c0e0> Failed to load image: playground/data/ocr_vqa/images/1576730867.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f691101f6d0> Failed to load image: playground/data/ocr_vqa/images/393701719.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6a4c76bd10> Failed to load image: playground/data/ocr_vqa/images/937274461.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f69110de900> Failed to load image: playground/data/ocr_vqa/images/761500413.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6910f0aa40> Failed to load image: playground/data/ocr_vqa/images/914625179.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f691101f3b0> Failed to load image: playground/data/ocr_vqa/images/28628594.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
cannot identify image file <_io.BytesIO object at 0x7f7ba50a14a0> Failed to load image: playground/data/ocr_vqa/images/830816011.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba5101ea0> Failed to load image: playground/data/ocr_vqa/images/899330266.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba5090f90> Failed to load image: playground/data/ocr_vqa/images/674035275.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6dbc4f0> Failed to load image: playground/data/ocr_vqa/images/785282394.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6df4c70> Failed to load image: playground/data/ocr_vqa/images/965776611.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6ebcef0> Failed to load image: playground/data/ocr_vqa/images/B00XIZWWNC.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
string index out of range Failed to load image: playground/data/coco/train2017/000000178275.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fedd2299c20> Failed to load image: playground/data/ocr_vqa/images/471243787.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fedd2251a90> Failed to load image: playground/data/ocr_vqa/images/912111364.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fedd21ceea0> Failed to load image: playground/data/ocr_vqa/images/688116191.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fedd21219a0> Failed to load image: playground/data/ocr_vqa/images/671025627.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
cannot identify image file <_io.BytesIO object at 0x7f07462bf720> Failed to load image: playground/data/ocr_vqa/images/1881174034.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07462ab090> Failed to load image: playground/data/ocr_vqa/images/898620996.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07462bf270> Failed to load image: playground/data/ocr_vqa/images/393090027.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0746364720> Failed to load image: playground/data/ocr_vqa/images/1560523573.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07463db9f0> Failed to load image: playground/data/ocr_vqa/images/892390263.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07462aacc0> Failed to load image: playground/data/ocr_vqa/images/060960323X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07463c55e0> Failed to load image: playground/data/ocr_vqa/images/517884283.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f074255ee50> Failed to load image: playground/data/ocr_vqa/images/394532643.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07462ab9f0> Failed to load image: playground/data/ocr_vqa/images/871563932.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0742587e50> Failed to load image: playground/data/ocr_vqa/images/789401592.jpg, the dataset is: 
sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f074623e5e0> Failed to load image: playground/data/ocr_vqa/images/28624084.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07463a8cc0> Failed to load image: playground/data/ocr_vqa/images/446387355.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07463a8860> Failed to load image: playground/data/ocr_vqa/images/810940183.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f07462aab30> Failed to load image: playground/data/ocr_vqa/images/1569750882.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6db6310> Failed to load image: playground/data/ocr_vqa/images/080740604X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6e20a40> Failed to load image: playground/data/ocr_vqa/images/966542606.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6e20ea0> Failed to load image: playground/data/ocr_vqa/images/412132710.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6db69f0> Failed to load image: playground/data/ocr_vqa/images/1566866391.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6d87ae0> Failed to load image: playground/data/ocr_vqa/images/067173363X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6e91540> Failed to load image: 
playground/data/ocr_vqa/images/953735702.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7ba6e20900> Failed to load image: playground/data/ocr_vqa/images/465069347.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fbf83c090e0> Failed to load image: playground/data/ocr_vqa/images/739714864.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83cf19f0> Failed to load image: playground/data/ocr_vqa/images/038794740X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83bb65e0> Failed to load image: playground/data/ocr_vqa/images/812931432.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83c75a40> Failed to load image: playground/data/ocr_vqa/images/1860340660.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83c75b30> Failed to load image: playground/data/ocr_vqa/images/1574301012.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83c75ae0> Failed to load image: playground/data/ocr_vqa/images/789401509.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83c59e00> Failed to load image: playground/data/ocr_vqa/images/1564965112.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83c20e00> Failed to load image: playground/data/ocr_vqa/images/078710339X.jpg, the 
dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83b16bd0> Failed to load image: playground/data/ocr_vqa/images/688118127.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf8c093b30> Failed to load image: playground/data/ocr_vqa/images/188673206X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fbf83cf1040> Failed to load image: playground/data/ocr_vqa/images/785807209.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f1258f51b30> Failed to load image: playground/data/ocr_vqa/images/938076140.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258f37d10> Failed to load image: playground/data/ocr_vqa/images/078942049X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258fb3400> Failed to load image: playground/data/ocr_vqa/images/1566250420.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258f51b80> Failed to load image: playground/data/ocr_vqa/images/395539331.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258e3dea0> Failed to load image: playground/data/ocr_vqa/images/967695104.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1265d899f0> Failed to load image: playground/data/ocr_vqa/images/1557987203.jpg, the dataset is: 
sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258eb4c20> Failed to load image: playground/data/ocr_vqa/images/64462013.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258f1d130> Failed to load image: playground/data/ocr_vqa/images/1579771009.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f1ffb2b1590> Failed to load image: playground/data/ocr_vqa/images/134412052.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f2023de50e0> Failed to load image: playground/data/ocr_vqa/images/1570280762.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1ffb181720> Failed to load image: playground/data/ocr_vqa/images/785805516.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1ffb2aa2c0> Failed to load image: playground/data/ocr_vqa/images/893148423.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1ffb21c450> Failed to load image: playground/data/ocr_vqa/images/671567918.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1ffb205270> Failed to load image: playground/data/ocr_vqa/images/960536205.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1ffb0c5590> Failed to load image: playground/data/ocr_vqa/images/761522751.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k 
cannot identify image file <_io.BytesIO object at 0x7f1ffb0c5c70> Failed to load image: playground/data/ocr_vqa/images/789408554.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1ffb0c5810> Failed to load image: playground/data/ocr_vqa/images/185343342X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1ffb2ac130> Failed to load image: playground/data/ocr_vqa/images/785808841.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f0a8c2ddcc0> Failed to load image: playground/data/ocr_vqa/images/1571971459.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c2218b0> Failed to load image: playground/data/ocr_vqa/images/073970477X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8a5b2ef0> Failed to load image: playground/data/ocr_vqa/images/185230863X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c244040> Failed to load image: playground/data/ocr_vqa/images/933261004.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c3b2d10> Failed to load image: playground/data/ocr_vqa/images/933821131.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c3b2ae0> Failed to load image: playground/data/ocr_vqa/images/1855326779.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 
0x7f0a8c293f40> Failed to load image: playground/data/ocr_vqa/images/670886939.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c355590> Failed to load image: playground/data/ocr_vqa/images/345414810.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c2fc400> Failed to load image: playground/data/ocr_vqa/images/679844023.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fd8c7bbbd10> Failed to load image: playground/data/ocr_vqa/images/1580170536.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c3dbf400> Failed to load image: playground/data/ocr_vqa/images/312187114.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c7c117c0> Failed to load image: playground/data/ocr_vqa/images/1559920696.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c3dbf6d0> Failed to load image: playground/data/ocr_vqa/images/1572240466.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c7b41130> Failed to load image: playground/data/ocr_vqa/images/1886947694.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c3da4d10> Failed to load image: playground/data/ocr_vqa/images/375400664.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c7a1a7c0> Failed to load image: 
playground/data/ocr_vqa/images/786884061.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c6bc7180> Failed to load image: playground/data/ocr_vqa/images/133099156.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c3dcb360> Failed to load image: playground/data/ocr_vqa/images/668053984.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fd8c3da42c0> Failed to load image: playground/data/ocr_vqa/images/805034676.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f121e5d4900> Failed to load image: playground/data/ocr_vqa/images/739702505.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f121e5d49a0> Failed to load image: playground/data/ocr_vqa/images/812903390.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f121e4f0220> Failed to load image: playground/data/ocr_vqa/images/936783109.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f121e55fcc0> Failed to load image: playground/data/ocr_vqa/images/761307842.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f121e54a450> Failed to load image: playground/data/ocr_vqa/images/1559920734.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f121c7ae9a0> Failed to load image: playground/data/ocr_vqa/images/1555951295.jpg, the 
dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f121e4ff8b0> Failed to load image: playground/data/ocr_vqa/images/1580621783.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f121e4afef0> Failed to load image: playground/data/ocr_vqa/images/958315434.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fc137a29450> Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fee4314f3b0> Failed to load image: playground/data/ocr_vqa/images/762704519.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee430e3270> Failed to load image: playground/data/ocr_vqa/images/125476604.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee431414a0> Failed to load image: playground/data/ocr_vqa/images/821411896.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee43156a40> Failed to load image: playground/data/ocr_vqa/images/870331612.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee43156400> Failed to load image: playground/data/ocr_vqa/images/739715100.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee431721d0> Failed to load image: playground/data/ocr_vqa/images/894550225.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image 
file <_io.BytesIO object at 0x7fee40398b30> Failed to load image: playground/data/ocr_vqa/images/962726303.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee403b7a40> Failed to load image: playground/data/ocr_vqa/images/860208656.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee431bc450> Failed to load image: playground/data/ocr_vqa/images/156347185X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fee43054310> Failed to load image: playground/data/ocr_vqa/images/966355903.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f6c551cbea0> Failed to load image: playground/data/ocr_vqa/images/1862045852.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c56f7dc70> Failed to load image: playground/data/ocr_vqa/images/156458321X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c56fd5860> Failed to load image: playground/data/ocr_vqa/images/870675656.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c56f7cdb0> Failed to load image: playground/data/ocr_vqa/images/809235269.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c551cb680> Failed to load image: playground/data/ocr_vqa/images/810928949.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c56ffc130> Failed to 
load image: playground/data/ocr_vqa/images/1568811012.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c551cb720> Failed to load image: playground/data/ocr_vqa/images/78821185.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c551ba680> Failed to load image: playground/data/ocr_vqa/images/870695916.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f6c56f12540> Failed to load image: playground/data/ocr_vqa/images/B011M9LHUO.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f84909a3400> Failed to load image: playground/data/ocr_vqa/images/1885928017.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f8490979680> Failed to load image: playground/data/ocr_vqa/images/60191341.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f849370c310> Failed to load image: playground/data/ocr_vqa/images/739715593.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f84909a3360> Failed to load image: playground/data/ocr_vqa/images/1560445513.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f849370c860> Failed to load image: playground/data/ocr_vqa/images/931580390.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f84937adae0> Failed to load image: 
playground/data/ocr_vqa/images/849930987.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f8493807c20> Failed to load image: playground/data/ocr_vqa/images/679761799.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f84909944f0> Failed to load image: playground/data/ocr_vqa/images/664220789.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f76939792c0> Failed to load image: playground/data/ocr_vqa/images/877792356.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f76938c1e50> Failed to load image: playground/data/ocr_vqa/images/739704745.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f76938b8540> Failed to load image: playground/data/ocr_vqa/images/1564582914.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7693876e00> Failed to load image: playground/data/ocr_vqa/images/1568590644.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7693836d10> Failed to load image: playground/data/ocr_vqa/images/679888268.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f76938b1900> Failed to load image: playground/data/ocr_vqa/images/B00WTKH3HC.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f76939a4270> Failed to load image: playground/data/ocr_vqa/images/1574770225.jpg, the 
dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f76939a44f0> Failed to load image: playground/data/ocr_vqa/images/915801841.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f769395e450> Failed to load image: playground/data/ocr_vqa/images/965150739.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258f5dd10> Failed to load image: playground/data/ocr_vqa/images/289800900.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258f51720> Failed to load image: playground/data/ocr_vqa/images/761511253.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f1258fb39a0> Failed to load image: playground/data/ocr_vqa/images/1557488789.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f125ab30d60> Failed to load image: playground/data/ocr_vqa/images/553062204.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
cannot identify image file <_io.BytesIO object at 0x7ff1db14d090> Failed to load image: playground/data/ocr_vqa/images/739714600.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7ff1db1f6a40> Failed to load image: playground/data/ocr_vqa/images/299130002.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7ff1db252040> Failed to load image: playground/data/ocr_vqa/images/1570611912.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7ff1db0bff90> Failed to load image: playground/data/ocr_vqa/images/B00XLZW19O.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7ff1db1f6220> Failed to load image: playground/data/ocr_vqa/images/877936293.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7ff1dc5459a0> Failed to load image: playground/data/ocr_vqa/images/B007K53FQ4.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
cannot identify image file <_io.BytesIO object at 0x7f3c79659e00> Failed to load image: playground/data/ocr_vqa/images/525941290.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c795bb5e0> Failed to load image: playground/data/ocr_vqa/images/739714023.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c795e44f0> Failed to load image: playground/data/ocr_vqa/images/785270965.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c794ff3b0> Failed to load image: playground/data/ocr_vqa/images/706377273.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c7a971a90> Failed to load image: playground/data/ocr_vqa/images/780800370.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c795d0630> Failed to load image: playground/data/ocr_vqa/images/553353500.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c77608040> Failed to load image: playground/data/ocr_vqa/images/1564583031.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c776088b0> Failed to load image: playground/data/ocr_vqa/images/70359148.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f3c795e40e0> Failed to load image: playground/data/ocr_vqa/images/3797306210.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c3cfae0> Failed to load image: playground/data/ocr_vqa/images/1556507488.jpg, the dataset is: 
sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f0a8c39fb30> Failed to load image: playground/data/ocr_vqa/images/966586611.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fa73ad14310> Failed to load image: playground/data/ocr_vqa/images/673384772.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a427810> Failed to load image: playground/data/ocr_vqa/images/819183482.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a3e29a0> Failed to load image: playground/data/ocr_vqa/images/139642625.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73ad31860> Failed to load image: playground/data/ocr_vqa/images/295968265.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a404860> Failed to load image: playground/data/ocr_vqa/images/823412385.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73ad1fdb0> Failed to load image: playground/data/ocr_vqa/images/818405988.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a404e00> Failed to load image: playground/data/ocr_vqa/images/29344506.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a3305e0> Failed to load image: playground/data/ocr_vqa/images/882894293.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k 
cannot identify image file <_io.BytesIO object at 0x7fa73a3fd9f0> Failed to load image: playground/data/ocr_vqa/images/345351452.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a47ef90> Failed to load image: playground/data/ocr_vqa/images/231072430.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a3f4f90> Failed to load image: playground/data/ocr_vqa/images/093727464X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa73a37c090> Failed to load image: playground/data/ocr_vqa/images/051763547X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fe0e79bdcc0> Failed to load image: playground/data/ocr_vqa/images/3980621146.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0e7926f90> Failed to load image: playground/data/ocr_vqa/images/1580910068.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0e7934630> Failed to load image: playground/data/ocr_vqa/images/1564583015.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0fffd5540> Failed to load image: playground/data/ocr_vqa/images/471148288.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0e78779f0> Failed to load image: playground/data/ocr_vqa/images/1570761116.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Failed to load image: 
playground/data/ocr_vqa/images/472084798.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc137a25540> Failed to load image: playground/data/ocr_vqa/images/823929493.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc137a1e9a0> Failed to load image: playground/data/ocr_vqa/images/739704133.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc1379cf7c0> Failed to load image: playground/data/ocr_vqa/images/1570282358.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc13790b810> Failed to load image: playground/data/ocr_vqa/images/877793417.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc137956270> Failed to load image: playground/data/ocr_vqa/images/807085707.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc1379ea900> Failed to load image: playground/data/ocr_vqa/images/006270110X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc1349ca3b0> Failed to load image: playground/data/ocr_vqa/images/835607240.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc13909a3b0> Failed to load image: playground/data/ocr_vqa/images/942627458.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fc13799d3b0> Failed to load image: playground/data/ocr_vqa/images/1882419065.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO 
object at 0x7fc137a25f40> Failed to load image: playground/data/ocr_vqa/images/912818034.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f88b76541d0> Failed to load image: playground/data/ocr_vqa/images/934710171.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b1c0d0e0> Failed to load image: playground/data/ocr_vqa/images/268016860.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b5808bd0> Failed to load image: playground/data/ocr_vqa/images/749517735.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b756aa40> Failed to load image: playground/data/ocr_vqa/images/71351817.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b75f1c20> Failed to load image: playground/data/ocr_vqa/images/375501983.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b1c2aa90> Failed to load image: playground/data/ocr_vqa/images/810955563.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b1c2a6d0> Failed to load image: playground/data/ocr_vqa/images/1563705176.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b76541d0> Failed to load image: playground/data/ocr_vqa/images/030724055X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b1c0aea0> Discovered 
apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f7344952b80> Failed to load image: playground/data/ocr_vqa/images/832904651.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7344871220> Failed to load image: playground/data/ocr_vqa/images/875730434.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7344882ea0> Failed to load image: playground/data/ocr_vqa/images/789404427.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7342a824f0> Failed to load image: playground/data/ocr_vqa/images/812015320.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73447cbd10> Failed to load image: playground/data/ocr_vqa/images/737303360.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7344880950> Failed to load image: playground/data/ocr_vqa/images/750219378.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f734475c450> Failed to load image: playground/data/ocr_vqa/images/688151175.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7344880cc0> Failed to load image: playground/data/ocr_vqa/images/1579901387.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7342a82900> Failed to load image: playground/data/ocr_vqa/images/205260780.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7344882270> 
Failed to load image: playground/data/ocr_vqa/images/750223391.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7344882ae0> Failed to load image: playground/data/ocr_vqa/images/1560984503.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f7344859d60> Failed to load image: playground/data/ocr_vqa/images/811726819.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f734467f7c0> Failed to load image: playground/data/ocr_vqa/images/1890838004.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fb319419040> Failed to load image: playground/data/ocr_vqa/images/3980621154.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31b1ae1d0> Failed to load image: playground/data/ocr_vqa/images/3884452762.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31b1202c0> Failed to load image: playground/data/ocr_vqa/images/930410629.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31943c5e0> Failed to load image: playground/data/ocr_vqa/images/1568362021.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31b118f90> Failed to load image: playground/data/ocr_vqa/images/1890916196.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31b11b3b0> Failed to load image: 
playground/data/ocr_vqa/images/031476271X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7efe10cf8130> Failed to load image: playground/data/ocr_vqa/images/1883323703.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe0debb400> Failed to load image: playground/data/ocr_vqa/images/717281434.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe0de9ed10> Failed to load image: playground/data/ocr_vqa/images/1566864941.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe10c01e50> Failed to load image: playground/data/ocr_vqa/images/1879505460.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe187b8d10> Failed to load image: playground/data/web-celebrity/images/Lee_Byung-hun2.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k cannot identify image file <_io.BytesIO object at 0x7efe10b44e50> Failed to load image: playground/data/ocr_vqa/images/391040952.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe10c27c20> Failed to load image: playground/data/ocr_vqa/images/1561701289.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe10cb19f0> Failed to load image: playground/data/ocr_vqa/images/2061514022.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe10cac9a0> Failed to load image: playground/data/ocr_vqa/images/739705539.jpg, the 
dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe10cb1f40> Failed to load image: playground/data/ocr_vqa/images/1878239589.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7efe10cac590> Failed to load image: playground/data/ocr_vqa/images/1887089160.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fcfb9126f40> Failed to load image: playground/data/ocr_vqa/images/471178705.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfba0d6360> Failed to load image: playground/data/ocr_vqa/images/1564582922.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfc42a2c20> Failed to load image: playground/data/web-celebrity/images/Lee_Byung-hun.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k cannot identify image file <_io.BytesIO object at 0x7fcfba0e2ae0> Failed to load image: playground/data/ocr_vqa/images/155850835X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfba161590> Failed to load image: playground/data/ocr_vqa/images/B013RVJ7KW.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfba073f40> Failed to load image: playground/data/ocr_vqa/images/939302322.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfba141ef0> Failed to load image: playground/data/ocr_vqa/images/1562614479.jpg, the dataset is: 
sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfba1441d0> Failed to load image: playground/data/ocr_vqa/images/531202852.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe2136a0680> Failed to load image: playground/data/ocr_vqa/images/1567615317.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0e79341d0> Failed to load image: playground/data/ocr_vqa/images/093062596X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0e784b310> Failed to load image: playground/data/ocr_vqa/images/393314286.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0e78c25e0> Failed to load image: playground/data/ocr_vqa/images/968297072.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fe0e77964f0> Failed to load image: playground/data/ocr_vqa/images/067944680X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Failed to load image: playground/data/ocr_vqa/images/1555951120.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f88b75d7e00> Failed to load image: playground/data/ocr_vqa/images/B005FOFNA8.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb3193c1e00> Failed to load image: playground/data/ocr_vqa/images/412076217.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31b19ea90> Failed to load image: playground/data/ocr_vqa/images/055305340X.jpg, the dataset is: 
sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31b19f040> Failed to load image: playground/data/ocr_vqa/images/1575662698.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fb31b0e5d60> Failed to load image: playground/data/ocr_vqa/images/749521643.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfb9fce9a0> Failed to load image: playground/data/ocr_vqa/images/739715534.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fcfba19d810> Failed to load image: playground/data/ocr_vqa/images/471024961.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7f178a669cc0> Failed to load image: playground/data/ocr_vqa/images/1564588963.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k string index out of range Failed to load image: playground/data/coco/train2017/000000047952.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a595950> Failed to load image: playground/data/ocr_vqa/images/968557902.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a536a90> Failed to load image: playground/data/ocr_vqa/images/471550833.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178863df90> Failed to load image: playground/data/ocr_vqa/images/1574320793.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file 
<_io.BytesIO object at 0x7f178a5ad0e0> Failed to load image: playground/data/ocr_vqa/images/1878239376.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a692180> Failed to load image: playground/data/ocr_vqa/images/1562613480.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178863df90> Failed to load image: playground/data/ocr_vqa/images/892133252.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a692e00> Failed to load image: playground/data/ocr_vqa/images/28608194.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a5ad630> Failed to load image: playground/data/ocr_vqa/images/1570761493.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a554860> Failed to load image: playground/data/ocr_vqa/images/135707978.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a595310> Failed to load image: playground/data/ocr_vqa/images/870408712.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a423c70> Failed to load image: playground/data/ocr_vqa/images/962770124.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a6408b0> Failed to load image: playground/data/ocr_vqa/images/B00XLX3W9O.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a6694a0> Failed to load image: playground/data/ocr_vqa/images/962289027.jpg, the dataset is: 
sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f178a60e6d0> Failed to load image: playground/data/ocr_vqa/images/415913756.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fa70a785d60> Failed to load image: playground/data/ocr_vqa/images/893463264.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa70a790680> Failed to load image: playground/data/ocr_vqa/images/B01577TUTC.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa70a7a3f40> Failed to load image: playground/data/ocr_vqa/images/1560443952.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa70a7a39a0> Failed to load image: playground/data/ocr_vqa/images/671766627.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa70a8a57c0> Failed to load image: playground/data/ocr_vqa/images/377570177X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa70a88f590> Failed to load image: playground/data/ocr_vqa/images/1558743030.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa70b1559f0> Failed to load image: playground/data/ocr_vqa/images/4544040604.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fa70b149810> Failed to load image: playground/data/web-celebrity/images/Choi_Min-sik2.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k 
Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7fca6ee6fea0> Failed to load image: playground/data/ocr_vqa/images/014005667X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fca6ee50090> Failed to load image: playground/data/ocr_vqa/images/843129697.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fca6ee87ef0> Failed to load image: playground/data/ocr_vqa/images/155583468X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fca6ee8f2c0> Failed to load image: playground/data/web-celebrity/images/Choi_Min-sik.jpg, the dataset is: sharegpt4v_instruct_gpt4-vision_cap100k cannot identify image file <_io.BytesIO object at 0x7fca6ee62a90> Failed to load image: playground/data/ocr_vqa/images/201624508.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fca6ee874f0> Failed to load image: playground/data/ocr_vqa/images/570035651.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fca6eef19f0> Failed to load image: playground/data/ocr_vqa/images/945397690.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7fca6ed12220> Failed to load image: playground/data/ocr_vqa/images/3575228701.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
cannot identify image file <_io.BytesIO object at 0x7f73e34eae50> Failed to load image: playground/data/ocr_vqa/images/679441662.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e14b83b0> Failed to load image: playground/data/ocr_vqa/images/316051772.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e3482270> Failed to load image: playground/data/ocr_vqa/images/963479903.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e3452db0> Failed to load image: playground/data/ocr_vqa/images/087033512X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e3482090> Failed to load image: playground/data/ocr_vqa/images/471542989.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e3452ae0> Failed to load image: playground/data/ocr_vqa/images/521256771.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e34eaa90> Failed to load image: playground/data/ocr_vqa/images/1558216103.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e34ea630> Failed to load image: playground/data/ocr_vqa/images/051770353X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e34ea450> Failed to load image: playground/data/ocr_vqa/images/811820580.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e24fbdb0> Failed to load image: playground/data/ocr_vqa/images/B001KBBD4A.jpg, the dataset is: 
sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e34523b0> Failed to load image: playground/data/ocr_vqa/images/688121675.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e3421770> Failed to load image: playground/data/ocr_vqa/images/876043082.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f73e34f7cc0> Failed to load image: playground/data/ocr_vqa/images/1560449152.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! cannot identify image file <_io.BytesIO object at 0x7faaa3116ae0> Failed to load image: playground/data/ocr_vqa/images/081182568X.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7faaa3a69b80> Failed to load image: playground/data/ocr_vqa/images/939302349.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7faaa13b8f40> Failed to load image: playground/data/ocr_vqa/images/471523771.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7faaa3a73ef0> Failed to load image: playground/data/ocr_vqa/images/1565540581.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7faaa3116c20> Failed to load image: playground/data/ocr_vqa/images/879801654.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Discovered apex.normalization.FusedRMSNorm - will use it instead of LlamaRMSNorm Replace train sampler!! 
cannot identify image file <_io.BytesIO object at 0x7f5ba6a7cdb0> Failed to load image: playground/data/ocr_vqa/images/840734921.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f5ba698d900> Failed to load image: playground/data/ocr_vqa/images/1561700940.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f5ba6b13360> Failed to load image: playground/data/ocr_vqa/images/891960856.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f5ba69fbc20> Failed to load image: playground/data/ocr_vqa/images/70224889.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f5ba6a918b0> Failed to load image: playground/data/ocr_vqa/images/810943794.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f5ba73b29f0> Failed to load image: playground/data/ocr_vqa/images/933478186.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k cannot identify image file <_io.BytesIO object at 0x7f5ba6a9f360> Failed to load image: playground/data/ocr_vqa/images/1883323460.jpg, the dataset is: sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table fp16_matmul._update_autotune_table() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", 
__class__._4d_kernel) File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table autotune_table = cache_manager.load() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load with open(self.file_path, 'rb') as handle: FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle' Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table fp16_matmul._update_autotune_table() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", __class__._4d_kernel) File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table autotune_table = cache_manager.load() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load with open(self.file_path, 'rb') as handle: FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle' Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table 
fp16_matmul._update_autotune_table() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", __class__._4d_kernel) File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table autotune_table = cache_manager.load() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load with open(self.file_path, 'rb') as handle: FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle' Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 444, in matmul_ext_update_autotune_table fp16_matmul._update_autotune_table() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 422, in _update_autotune_table TritonMatmul._update_autotune_table(__class__.__name__ + "_4d_kernel", __class__._4d_kernel) File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 145, in _update_autotune_table autotune_table = cache_manager.load() File "/mnt/petrelfs/wangwenhai/miniconda3/envs/internvl_s_n/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/triton/matmul_ext.py", line 73, in load with open(self.file_path, 'rb') as handle: FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/wangwenhai/.triton/autotune/Fp16Matmul_4d_kernel.pickle' 
100%|█████████▉| 1723/1726 [30:14:27<03:04, 61.50s/it]
100%|█████████▉| 1724/1726 [30:15:31<02:04, 62.20s/it]
100%|█████████▉| 1725/1726 [30:16:33<01:02, 62.35s/it]
100%|██████████| 1726/1726 [30:17:36<00:00, 62.45s/it]
[INFO|trainer.py:1962] 2024-06-11 06:55:06,153 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 109063.037, 'train_samples_per_second': 16.207, 'train_steps_per_second': 0.016, 'train_loss': 1.2428148501441487, 'epoch': 1.0}
100%|██████████| 1726/1726 [30:17:42<00:00, 62.45s/it]
100%|██████████| 1726/1726 [30:17:42<00:00, 63.19s/it]
[INFO|trainer.py:2936] 2024-06-11 06:55:08,482 >> Saving model checkpoint to work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re
[INFO|configuration_utils.py:473] 2024-06-11 06:55:08,486 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/config.json
[INFO|configuration_utils.py:594] 2024-06-11 06:55:08,491 >> Configuration saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/generation_config.json
[INFO|modeling_utils.py:2493] 2024-06-11 06:55:16,362 >> Model weights saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/model.safetensors
[INFO|tokenization_utils_base.py:2433] 2024-06-11 06:55:16,433 >> tokenizer config file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-11 06:55:16,444 >> Special tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-11 06:55:16,450 >> added tokens file saved in work_dirs/internvl_chat_v1_5_internlm2_1_8b_dynamic_res_finetune_re/added_tokens.json
***** train metrics *****
  epoch                    = 1.0
  train_loss               = 1.2428
  train_runtime            = 1 day, 6:17:43.03
  train_samples            = 1767531
  train_samples_per_second = 16.207
  train_steps_per_second   = 0.016